In late May, Kristin Bagnall, Fivetran’s dbt Package Lead, gave an office hours session focused on how to write and distribute dbt packages for the dbt platform. We’ll look at Kristin’s session and also focus on the numerous Fivetran dbt packages that have been published to date.
What is a dbt Package?
Packages are dbt projects that you can install in your own dbt project.
There are two main use cases for packages.
First, we’ll start with common staging models for a single data source. An example of this is Fivetran’s dbt package for Netsuite. Models are great for one particular source data or for sources that have industry-standard metrics or any code that analysts are re-creating
Secondly, we have packages that serve as macros, such as dbt-utils, which will help you with any complex SQL querying. So for example, in dbt-utils, there’s a macro that you can pivot some data from rows to columns, which is quite helpful.
To see all available dbt packages - be sure to check out the dbt Hub
How We Write Our dbt Packages
Let’s walk through the steps we follow at Fivetran for building a new dbt package from scratch.
Research Customer Needs
The first step is to interview the stakeholders of the source data and find out:
- What they’re doing with their data
- How they’re modeling it
- Any pain points they’ve encountered when modeling data.
In our case, at Fivetran, the whole goal of creating these packages is to make customers’ and analysts’ lives easier so that they can better understand their data and get answers quicker.
Understand source application and schema
Once we’ve gained enough feedback, we try to imagine what can be created with a data source that involves little customization with the intent to generate the query in a way that it will work accurately for everyone, so that’s pretty important.
So generally when we talk to customers, if they have some customization and they’re building this really specific model, it’s not a good candidate for a package model. It’s generally something that they’ll have to write on their own.
Draft Queries and Test
The most time-intensive part of the process is drafting queries and testing those queries. It’s important to QA the queries as much as possible and ensure that resulting data matches what’s in your reporting tool, if it’s relevant to your package and process.
Build The Package
Now, it’s time to convert SQL queries to models. In our case, we use dbt-utils on all of the packages so far for the cross-database macros. All of the packages are compatible across Redshift, BigQuery, and Snowflake. To make this possible, we utilize some of the macros that Claire and the Fishtown team have built.
Lastly, but we add documentation to all of the source tables that are coming in through Fivetran and also the transformed models. In our case, we add descriptions to tables, columns and so on.
Test, Launch and Solicit Feedback
We also add testing, so for example, dbt has a non-null and a unique test you can run on any column of data and we do this for all primary keys. We do that for all the primary keys, and they also have a couple of other tests that are pretty helpful.
Once we’re done, we ping the dbt Slack channel and gain beta testers. Through dialogue and testing, we make revisions and ultimately post it on the dbt Hub.
Looking At Fivetran’s dbt Packages
Fivetran has published numerous dbt packages and will continue to bring additional resources for our customers. Our packages will take care of basic table standardization, add data tests and enable you to start building visualizations immediately.
So far, Fivetran has built and published the following packages:
- Fivetran dbt package for NetSuite
- Fivetran dbt package for Mailchimp
- Fivetran dbt package for Salesforce
- Fivetran dbt package for GitHub
- Fivetran dbt package for Marketo
- Fivetran dbt package for Asana
Get Started Today
To learn more about package building and Fivetran’s approach, watch Kristin’s full dbt Office Hours session.
We welcome you to get going and try these packages for yourself!