Hi community!
I am doing some thinking on how to do “production rollbacks”. I’ll define a production rollback as an operation that can be executed to restore a data warehouse to a known correct state quickly.
(Think of a scenario when some data goes out to production, happens to be wrong, someone notices, and the operations team wants to restore the data to a good state quickly, so they can debug the issue in dev and QA environments.)
Some other discussions I’ve found in the community on this topic:
I was speaking with a dbt user today and we came up with this algorithm with dbt Cloud and Snowflake using the Snowflake Time Travel feature:
- In the
on-run-start
hook of a dbt Cloud production job, record the latest query ID or timestamp in a Snowflake table calledrollback
. - Execute the dbt Cloud production job.
- Observe an issue.
- Execute a dbt Cloud rollback job, that, in a run-operation:
a. Reads the latest entry in therollback
table.
b. Clones the historical schema or database (link) of the previous production data.
c. Swaps the cloned schema or database into production.
I haven’t tried this out myself yet, but itching to try it out.
Curious to hear if anybody else has algorithms they use?