Strategies for Efficiently Refactoring Parts of a DAG in dbt

hiu-naoki · September 1, 2023, 5:37am

When working with a Directed Acyclic Graph (DAG) in dbt (Data Build Tool), there are instances when you might want to refactor a part of it. For example, consider tables A and B, where table B references table A. If you decide to change a column name in table A, you’ll also need to update the corresponding references in table B.

Before Refactoring

A.sql

select val1 as foo from source

B.sql

select foo, count(*) n from {{ ref("A") }} group by 1

After Refactoring

A.sql

select val1 as boo from source

B.sql

select boo, count(*) n from {{ ref("A") }} group by 1

The general development process usually involves making these changes in your local workspace, previewing them, and then submitting them for review via Git before deploying through CI (Continuous Integration).

However, one of the challenges with dbt’s dependency resolution is that ref("A") points to the production version of table A, not the version in your workspace. This necessitates making changes separately for each environment.
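
To make the problem concrete, here is a toy Python sketch of the resolution behavior I am describing. It is not dbt's actual implementation: the function `resolve_ref` and the schema names `dbt_dev` and `analytics` are hypothetical, and it assumes a setup where models that are not part of the current build are resolved against production (as happens, for example, when deferral is enabled).

```python
# Toy model of context-dependent ref resolution.
# NOT dbt's code: resolve_ref and the schema names are hypothetical.

def resolve_ref(model, selected, dev_schema="dbt_dev",
                prod_schema="analytics", defer=True):
    """Return the relation that ref(model) points to in this toy model."""
    # Models built in the current run (or any model when deferral is off)
    # resolve to the developer's own schema.
    if model in selected or not defer:
        return f"{dev_schema}.{model}"
    # Everything else falls back to the production relation.
    return f"{prod_schema}.{model}"

# Only B is built in the workspace: ref("A") falls back to production,
# where the renamed column `boo` does not exist yet.
only_b = resolve_ref("A", selected={"B"})

# A and B are built together: ref("A") stays inside the workspace.
both = resolve_ref("A", selected={"A", "B"})

print(only_b)  # analytics.A
print(both)    # dbt_dev.A
```

In this simplified picture, the refactor only works locally when A and B are built together, which in dbt would correspond to something like selecting A along with its children (`dbt run --select A+`). Whether that matches a given project depends on how its targets and deferral are configured.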

By contrast, a tool like Looker handles this easily with its Persistent Derived Tables (PDT): in DEV mode, it resolves dependencies flexibly based on the context.

I’m interested in learning ways to accomplish something similar in dbt. How can we refactor parts of a DAG more efficiently while still making sure everything integrates smoothly?
