The problem I’m having
My org recently migrated from one source data to another. This resulted in V1 and V2 versions of various data pipelines that have slightly different formats. V1 has data before [migration date] and V2 has data on or after [migration date]. I am trying to understand the best way to deprecate the V1 models, while still preserving V1 data and unioning it to the V2 table for reporting.
The context of why I’m trying to do this
My org would still like to do historical reporting before [migration date], ideally all in one table.
What I’ve already tried
We have tried to do the following:
V1.sql (table)
V1_to_v2.sql (table) which contains all the necessary translations of the final output to match V1 and V2.
V2.sql (incremental)
After we have the above tables, we need to union the data. The approach we used was:
Run V1_to_v2, rename the table in BQ to “V2”, then run the incremental V2.sql everyday like normal. The concern is someone will try to do a full-refresh on this table and will return data only on or after [migration date].
I am looking for suggestions on this strategy.
I definitely want to archive the v1.sql. From here, do i need a snapshot of V1_to_v2? Should I set V1_to_v2 as a source in the v2.sql file? If i archive the V1.sql and try to use {{ref(‘V1’)}} in V2.sql, i get an error that the node doesnt exist.
Any recommendations are appreciated!