My team is currently looking to adopt the unit test feature introduced in dbt 1.8 for our models. However, most of our models are not trivial joins of a few tables: they contain many CTEs and, in most cases, join more than 10 tables. This makes setting up the inputs for a simple unit test that only needs to verify a small portion of the SQL really challenging and time-consuming.

If we followed clean-code principles from software engineering, we would break the models up into smaller ones that each reference only a few tables (fewer than 3?) and a few lines of SQL (fewer than 10?). That would make unit tests much easier to write. Is there a best practice for breaking models up into smaller ones? What is the recommended way to do it?

My first thought was to define those CTEs as ephemeral models, but the dbt documentation warns that overuse of the ephemeral materialization can also make queries harder to debug. The other problem with splitting models into smaller ones is that it would multiply the total number of model files from tens to hundreds. It would also make the code harder to read, since you have to jump through multiple models to understand the complete logic of a data build. We definitely need some best practice for organizing the models. What is your recommendation for this situation, and how does your project address these concerns?
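For context, this is the kind of test setup I mean. A dbt 1.8 unit test mocks every upstream `ref()` of the model under test, so a model with many inputs needs many `given:` blocks even if the test only checks one CTE's logic. A minimal sketch (model and column names here are hypothetical, not from our project):

```yaml
# models/marts/_unit_tests.yml -- hypothetical example
unit_tests:
  - name: test_order_totals_are_summed
    model: fct_orders           # model under test
    given:
      # every upstream ref() must be mocked, even ones
      # irrelevant to the logic this test verifies
      - input: ref('stg_orders')
        rows:
          - {order_id: 1, customer_id: 10}
      - input: ref('stg_payments')
        rows:
          - {order_id: 1, amount: 25}
          - {order_id: 1, amount: 75}
    expect:
      rows:
        - {order_id: 1, customer_id: 10, total_amount: 100}
```

With 10+ upstream tables, the `given:` section grows accordingly, which is exactly what makes these tests expensive to write for large models.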