Production rollbacks

boxysean · May 14, 2021, 4:23pm

Hi community!

I am doing some thinking on how to do “production rollbacks”. I’ll define a production rollback as an operation that can be executed to restore a data warehouse to a known correct state quickly.

(Think of a scenario when some data goes out to production, happens to be wrong, someone notices, and the operations team wants to restore the data to a good state quickly, so they can debug the issue in dev and QA environments.)

Some other discussions I’ve found in the community on this topic:

I was speaking with a dbt user today and we came up with this algorithm with dbt Cloud and Snowflake using the Snowflake Time Travel feature:

In the on-run-start hook of a dbt Cloud production job, record the latest query ID or timestamp in a Snowflake table called rollback.
Execute the dbt Cloud production job.
Observe an issue.
Execute a dbt Cloud rollback job, that, in a run-operation:
a. Reads the latest entry in the rollback table.
b. Clones the historical schema or database (link) of the previous production data.
c. Swaps the cloned schema or database into production.

I haven’t tried this out myself yet, but itching to try it out.

Curious to hear if anybody else has algorithms they use?

boxysean · July 18, 2021, 8:37pm

Alright, I was noodling on this with an esteemed colleague at dbt Labs and they think the following algorithm is a winning strategy for Snowflake:

create a production job which:

clones the production database/schema into a new schema (run-operation)

runs the dbt run against the cloned schema

swaps the new schema with the old production schema (run-operation)

If anything goes awry, they can run another job to swap the old and new databases/schemas and effectively perform a rollback.

matt.winkler · July 21, 2021, 10:44pm

@boxysean If I’m understanding these workflows correctly, your original proposal is more of a table / view level approach, and the second proposal is a schema / database level swap. Feels like the second approach is more scalable with how dbt jobs are designed to populate many tables in a schema / database in a single run. Do you have any additional thoughts?

Topic		Replies	Views
(near) zero downtime deployments of updated models (an alternative to virtual data environments?) In-Depth Discussions	0	1351	April 26, 2023
dbt build rollback In-Depth Discussions testing , orchestration-and-deployment , cli , dbt-core	2	3441	August 7, 2024
Understanding how DBT handles tables for no down time Help	4	1996	October 31, 2022
Dbt jobs Export Archive	4	3742	December 25, 2021
Clean your warehouse of old and deprecated models Show and Tell	15	29969	November 27, 2023

Production rollbacks

Related topics