Sampled development environments in dbt

sjwhitworth · May 8, 2019, 12:42pm

Does anyone have any design patterns for building sampled development environments in dbt? This is so analysts can test things out quickly against production, without being slowed down by the large size of the dataset in prod. I’m currently just doing something like the following (sample a random 10%, up to a variable amount), but it looks a little messy:

{% if target.name == "dev" %}
    AND MOD(CAST(RAND()*10 AS INT64),10)=0
    LIMIT {{ var("max_results_in_dev") }}
{% endif %}

In general, I’d love to see any guidance or wisdom people have around the intricacies of introducing dbt into teams / production settings, et al.

tristan · May 8, 2019, 2:02pm

I’ve seen companies effectively deploy your strategy @sjwhitworth, but it can require a lot of work. Overall here are my suggestions (feel free to ignore if you find another way that works for you!):

keep all dimension data. don’t leave out certain customers, etc. this is onerous both to write and maintain: it requires a lot of sprawl-y code to make things consistent everywhere. it’s also hard to test.
limit large fact tables to a configurable # of days prior to current_timestamp(). configure via var().
apply all filtering within base models, and probably materialize these base models as tables so that you’re not executing the where clause over and over again. this depends on what your DAG looks like though.
try to implement filtering on top of efficient keys depending on your data store. partitions in bigquery, sorts in redshift, etc.

Our just-released best practices guide shares code on how to implement #2, above.

Hope that’s helpful!

Topic		Replies	Views
DBT Cloud dev, stg, prod configurations Help best-practice , environments , bigquery , dbt-cloud , dbt_project-yml	0	172	October 30, 2024
New Package: dbt-meta-testing Show and Tell	1	3088	December 7, 2020
What are the dbt commands you run in your production deployment of dbt? Archive	12	30159	March 30, 2021
Dbt dev/prod data pipelines In-Depth Discussions data-quality , best-practice , environments , adopting-dbt	8	3396	January 16, 2024
Seeking Advice on Streamlining Data Models in dbt Help	1	122	August 3, 2024

Sampled development environments in dbt

Related topics