When working with a Directed Acyclic Graph (DAG) in dbt (Data Build Tool), there are instances when you might want to refactor a part of it. For example, consider tables A and B, where table B references table A. If you decide to change a column name in table A, you’ll also need to update the corresponding references in table B.
Before Refactoring
A.sql
select val1 as foo from source
B.sql
select foo, count(*) n from {{ ref("A") }} group by 1
After Refactoring
A.sql
select val1 as boo from source
B.sql
select boo, count(*) n from {{ ref("A") }} group by 1
The general development process usually involves making these changes in your local workspace, previewing them, and then submitting them for review via Git before deploying through CI (Continuous Integration).
However, one of the challenges with dbt’s dependency resolution is that ref(“A”) points to the production version of table A, not the version in your workspace. This necessitates making changes separately for each environment.
Contrastingly, if you’re using a tool like Looker with its Persistent Derived Tables (PDT), you can achieve this task quite effortlessly by utilizing its DEV mode, which resolves dependencies flexibly based on the context.
I’m interested in learning ways to accomplish something similar in dbt. How can we refactor parts of a DAG more efficiently while still making sure everything integrates smoothly?