An Approach to Building dbt Packages

jasonharris · July 21, 2020, 3:11pm

In late May, Kristin Bagnall, Fivetran’s dbt Package Lead, gave an office hours session focused on how to write and distribute dbt packages for the dbt platform. We’ll look at Kristin’s session and also focus on the numerous Fivetran dbt packages that have been published to date.

What is a dbt Package?

Packages are dbt projects that you can install in your own dbt project.

There are two main use cases for packages.

First, we’ll start with common staging models for a single data source. An example of this is Fivetran’s dbt package for Netsuite. Models are great for one particular source data or for sources that have industry-standard metrics or any code that analysts are re-creating

Secondly, we have packages that serve as macros, such as dbt-utils, which will help you with any complex SQL querying. So for example, in dbt-utils, there’s a macro that you can pivot some data from rows to columns, which is quite helpful.

To see all available dbt packages - be sure to check out the dbt Hub

How We Write Our dbt Packages

Let’s walk through the steps we follow at Fivetran for building a new dbt package from scratch.

Research Customer Needs

The first step is to interview the stakeholders of the source data and find out:

What they’re doing with their data
How they’re modeling it
Any pain points they’ve encountered when modeling data.

In our case, at Fivetran, the whole goal of creating these packages is to make customers’ and analysts’ lives easier so that they can better understand their data and get answers quicker.

Understand source application and schema

Once we’ve gained enough feedback, we try to imagine what can be created with a data source that involves little customization with the intent to generate the query in a way that it will work accurately for everyone, so that’s pretty important.

So generally when we talk to customers, if they have some customization and they’re building this really specific model, it’s not a good candidate for a package model. It’s generally something that they’ll have to write on their own.

Draft Queries and Test

The most time-intensive part of the process is drafting queries and testing those queries. It’s important to QA the queries as much as possible and ensure that resulting data matches what’s in your reporting tool, if it’s relevant to your package and process.

Build The Package

Now, it’s time to convert SQL queries to models. In our case, we use dbt-utils on all of the packages so far for the cross-database macros. All of the packages are compatible across Redshift, BigQuery, and Snowflake. To make this possible, we utilize some of the macros that Claire and the Fishtown team have built.

Add Documentation

Lastly, but we add documentation to all of the source tables that are coming in through Fivetran and also the transformed models. In our case, we add descriptions to tables, columns and so on.

Test, Launch and Solicit Feedback

We also add testing, so for example, dbt has a non-null and a unique test you can run on any column of data and we do this for all primary keys. We do that for all the primary keys, and they also have a couple of other tests that are pretty helpful.

Once we’re done, we ping the dbt Slack channel and gain beta testers. Through dialogue and testing, we make revisions and ultimately post it on the dbt Hub.

Looking At Fivetran’s dbt Packages

Fivetran has published numerous dbt packages and will continue to bring additional resources for our customers. Our packages will take care of basic table standardization, add data tests and enable you to start building visualizations immediately.

So far, Fivetran has built and published the following packages:

Get Started Today

To learn more about package building and Fivetran’s approach, watch Kristin’s full dbt Office Hours session.

We welcome you to get going and try these packages for yourself!

emilie · July 22, 2020, 1:03pm

Just want to piggy back here and highlight a couple of wonderful community packages:

At GitLab, we put together a package to help provide visibility into our Snowflake costs. The package is here: https://gitlab.com/gitlab-data/snowflake_spend You can see it live in our dbt project: https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt @tmurphy and I welcome any MRs to make it better!
@claus has put together a number of awesome dbt packages including:
- dbt-extend: https://github.com/calogica/dbt-extend
- dbt-date: https://github.com/calogica/dbt-date
- dbt-test: https://github.com/calogica/dbt-test
I don’t know the best person to tag, but I’ve heard about people using the dbt_ml package to work with BigQuery with great success: https://github.com/kristeligt-dagblad/dbt_ml (This one is on dbt hub, but extra shoutout since it is community contributed.)

I think one of the best things about an open source project is that the community can contribute to it in multiple ways, including these, which is less about contributing to dbt directly and about contributing to how you use it.

Are there any other dbt packages floating out there that aren’t on dbt hub?

claus · July 22, 2020, 2:20pm

@emilie thanks for reminding me that I need to get these packages cleaned up and better documented. Too many work projects interfering with the side projects!

Topic		Replies	Views
How we (used to) structure our dbt projects Archive	22	196575	September 7, 2022
dbt package for multiple sources Help bigquery , dbt-cloud	2	1660	July 4, 2023
Just started with DBT Cloud, issue with Packages Archive	0	1977	January 11, 2021
Your dbt Project Checklist Archive	3	19560	February 3, 2023
Writing packages when a source table may or may not exist Show and Tell	1	32337	December 28, 2020