Hi!
I would like to know the correct way to copy production data into a staging environment which can be used for all downstream tasks.
We are using GCP & Bigquery and we have the two environments as separate GCP projects. Currently I have a conditional in my sources, which copies all the prod data over.
I recently saw dbt clone and would like to leverage it instead, but I cannot find a complete guide on how best to set this up. Ideally this setup would also cover how to execute CI tests against any new PRs to the dbt models.
Thanks in advance!
Best,
D
Hi, I was wondering how to do the same thing but couldn't find an example either.
From the following blog post, I believe this is the scenario to try:
- Testing staging datasets in BI

In this scenario, we want to:
- Make a copy of our production dataset available in our downstream BI tool
- Safely iterate on this copy without breaking production datasets
Therefore, we should use clone in this scenario.
But there is no information on the step-by-step breakdown of how to do this. Any help would be appreciated!
Hello,
A few suggestions. If you need to copy raw source data, the BigQuery Data Transfer Service or scheduled queries can automate that and replace your current conditional method. For the dbt models themselves, note that `dbt clone` is a built-in command (dbt Core 1.6+), not a package you install: you point it at the artifacts from a production run with `--state`, run it against a staging target defined in your profiles.yml, and it creates zero-copy clones of the production tables in the staging project. For CI, set up a pipeline with a tool like GitHub Actions or GitLab CI that runs `dbt build` / `dbt test` against the staging environment for any new PRs.
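Here is a minimal sketch of the clone step, assuming your production job uploads its `target/manifest.json` to a bucket (the bucket path, target names, and model names below are placeholders) and that the staging credentials can read the prod project:

```bash
# Download the artifacts from the last production run (bucket path is a placeholder)
gsutil cp gs://my-dbt-artifacts/prod/manifest.json prod-artifacts/manifest.json

# Clone every model recorded in the prod manifest into the staging project.
# On BigQuery this uses zero-copy table clones, so it is cheap and fast.
dbt clone --state prod-artifacts --target staging

# Or clone just a subset, e.g. one mart and everything downstream of it
dbt clone --select my_mart+ --state prod-artifacts --target staging
```

And a sketch of the CI side with GitHub Actions (secrets, bucket, and target names are again placeholders; GitLab CI would follow the same shape). It clones prod into the staging project, then builds and tests only the models the PR changed:

```yaml
# .github/workflows/dbt-ci.yml -- illustrative; secrets and paths are placeholders
name: dbt CI
on:
  pull_request:

env:
  # assumes a profiles.yml with a `staging` target is checked into the repo root
  DBT_PROFILES_DIR: .

jobs:
  dbt-ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - run: pip install dbt-bigquery

      # Authenticate to GCP; the secret holds a service-account key JSON
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - uses: google-github-actions/setup-gcloud@v2

      # Fetch the manifest from the last production run
      - run: gsutil cp gs://my-dbt-artifacts/prod/manifest.json prod-artifacts/manifest.json

      # Zero-copy clone the prod models into the staging project
      - run: dbt clone --state prod-artifacts --target staging

      # Build and test only the models changed in this PR, plus their descendants
      - run: dbt build --select state:modified+ --state prod-artifacts --target staging
```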