I’m working on a project for (near) zero-downtime dbt deployments of updated models. Curious to know if anyone has worked on a similar idea!
current problem
we run dbt builds every hour, but some of our model changes require full refreshes and other expensive operations that can take >1 hour
to avoid data freshness problems, we need a way to build the new version of a model while the old version keeps serving queries, and then swap the new version in for the old one once it’s fully materialized
solution idea
(this is slightly inspired by the tobiko data article about virtual data environments)
we’re trying to leverage snowflake’s zero-copy cloning to create “data branches” containing clones of the modified models and all of their child models, with deferred references for the unmodified parent models pointing to prod
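to make the branching step concrete, here’s a rough sketch of what the clone could look like in snowflake (the schema/table names `analytics.prod`, `analytics.branch_123`, `orders`, `order_facts` are just hypothetical placeholders):

```sql
-- hypothetical names: analytics.prod is the live schema,
-- analytics.branch_123 is the data branch for this change
create schema if not exists analytics.branch_123;

-- zero-copy clone only the modified models and their downstream children;
-- unmodified parents are not cloned, the branch just defers to prod for them
create or replace table analytics.branch_123.orders
  clone analytics.prod.orders;
create or replace table analytics.branch_123.order_facts
  clone analytics.prod.order_facts;
```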
the ideal workflow looks like:
- edit models on dev
- create a branch of the edited models and their children, i.e. the selection from `dbt ls --models edited_models+ --defer prod`, using snowflake zero-copy cloning
- run dbt full refreshes on the new branch to update the models
- “merge” the new branch into prod by dropping the prod tables and renaming the new branch to the prod namespace (in the same transaction)
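one open question on the merge step: as far as I know snowflake DDL statements commit implicitly, so I’m not sure a drop + rename can actually run inside a single transaction; an alternative is `alter table ... swap with`, which exchanges two tables atomically in one statement. a rough sketch, using the same hypothetical names as above:

```sql
-- "merge" the branch into prod, table by table
-- swap with exchanges the two tables atomically, so readers always see
-- either the old prod table or the newly built one
alter table analytics.prod.orders swap with analytics.branch_123.orders;
alter table analytics.prod.order_facts swap with analytics.branch_123.order_facts;

-- drop the branch schema (now holding the old prod tables) once the swap is done
drop schema if exists analytics.branch_123;
```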