Hello,
I’m currently working to convert legacy sql build scripts into dbt models. Most of the scripts start out with a simple CREATE TABLE xxx AS SELECT <columns> FROM table
statement and then run subsequent UPDATE
statements to make changes to the values of the table.
I’m currently handling this in dbt by packaging each update into its own CTE, e.g.:
WITH source AS (
SELECT * FROM ref('source')
),
fix_col1_value AS (
SELECT
{{ dbt_utils.star(ref('source'), except=["col1"]) }},
CASE
WHEN othercol = 'val' THEN 'newval'
ELSE col1
END AS col1
FROM
source
),
fix_col2_value AS (
SELECT
{{ dbt_utils.star(ref('source'), except=["col2"]) }},
CASE
WHEN someothercol = 'val' THEN 'newval2'
ELSE col2
END AS col2
FROM
fix_col1_value
),
...
final AS (
SELECT * FROM fix_colX_value
)
SELECT * FROM final
Is this a good approach for handling something like this? I do like that the individual updates from the old script are still separated out as CTEs rather than being jumbled together into one SELECT
statement.
The code itself feels a bit long with a lot of repeated {{ dbt_utils.star() }}
calls. It also feels potentially error prone if someone adds a new CTE and forgets to update the chain of CTEs.
Thanks,
Eric