If your organization has a large dbt project and you need to run dbt against a new target schema (for example, a new employee has joined and wants to start contributing, or you borked your own target schema and want to start from scratch), here’s a handy macro to help you out.
Note: it only works on Snowflake, and assumes that you don’t use multiple schemata.
{% macro clone_prod_to_target(from) %}
{% set sql -%}
create schema if not exists {{ target.database }}.{{ target.schema }} clone {{ from }};
{%- endset %}
{{ dbt_utils.log_info("Cloning schema " ~ from ~ " into target schema.") }}
{% do run_query(sql) %}
{{ dbt_utils.log_info("Cloned schema " ~ from ~ " into target schema.") }}
{% endmacro %}
{% macro destroy_current_env() %}
{% set sql -%}
drop schema if exists {{ target.database }}.{{ target.schema }} cascade;
{%- endset %}
{{ dbt_utils.log_info("Dropping target schema.") }}
{% do run_query(sql) %}
{{ dbt_utils.log_info("Dropped target schema.") }}
{% endmacro %}
{% macro reset_dev_env(from) %}
{#-
This macro destroys your current development environment, and recreates it by cloning from prod.
To run it:
$ dbt run-operation reset_dev_env --args '{from: analytics}'
-#}
{% if target.name == 'dev' %}
{{ destroy_current_env() }}
{{ clone_prod_to_target(from) }}
{% else %}
{{ dbt_utils.log_info("No-op: your current target is " ~ target.name ~ ". This macro only works for a dev target.", info=True) }}
{% endif %}
{% endmacro %}
Then, you can go about working on the models you want to develop without having to rebuild from scratch.
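For example, a typical dev loop might look something like this (the analytics schema name and the model selection are just placeholders):
$ dbt run-operation reset_dev_env --args '{from: analytics}'
$ dbt run --select my_model+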
We also use the clone_prod_to_target macro as an operation in our CI runs to make them faster – by cloning our prod schema, our incremental models only process new data.
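As a rough sketch, that CI step might look something like this, assuming the CI target points at a disposable schema and analytics is the prod schema:
$ dbt run-operation clone_prod_to_target --args '{from: analytics}'
$ dbt run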
@claire What would be the best way to drop the schema at the beginning of the run, but then clone the schema at the end of the run (from within the macro)? Is that even possible or practical?
I’m not quite sure what you’re trying to achieve here. Can you let me know why you’re looking to implement this pattern? (It might help me give better advice!) On the surface, this is certainly possible through the use of hooks and/or operations.
@claire Sorry for the confusion. I wanted to implement the same process you described, but with a different pattern where destroying the dev env and the zero-copy clone happen separately. This is for a staging environment we’re creating, and I wanted to make it as modular and dynamic as possible to ensure portability and reusability. I decided to just call the two macros separately at the start and end of the run, which removed the need for the third macro that runs the other two from the dbt CLI. I think that’s probably the best way for us at the moment, but it seemed like one of the only ways.
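For reference, if that means two separate run-operation calls around whatever builds the environment, it could look something like this (the analytics schema name is a placeholder, and the exact ordering depends on your pipeline):
$ dbt run-operation destroy_current_env
$ dbt run
$ dbt run-operation clone_prod_to_target --args '{from: analytics}'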
This is a great idea. I created a macro based on @claire’s code. It would not run since I don’t have the dbt_utils package installed (I have not been able to get it installed), so I removed the dbt_utils references. I was able to execute the dbt run-operation; however, it did not create the clone. There are no errors, yet the clone does not get created in Snowflake. Am I missing something?
/*
This macro creates a clone of the DEV_FIVETRAN database inside Snowflake.
WAREHOUSE defaults to DBT_TRANSFORM_COMPUTE_XS
USER_ROLE defaults to DATA_ENGINEER_ROLE
Clone name defaults to DEV_FIVETRAN_<developer initials>
Any previous clones with this name will be replaced.
dbt run-operation developer_clone_creation --args '{INITIALS: WMM}'
*/
{% macro developer_clone_creation (INITIALS) %}
USE WAREHOUSE DBT_TRANSFORM_COMPUTE_XS;
USE ROLE DATA_ENGINEER_ROLE; -- or whatever roles need to use the clone
CREATE OR REPLACE DATABASE DEV_FIVETRAN_{{[INITIALS]}} CLONE DEV_FIVETRAN;
--Special grants for clone access
GRANT USAGE ON DATABASE DEV_FIVETRAN_{{[INITIALS]}} TO ROLE DATA_ENGINEER_ROLE ;
{%endmacro %}
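For what it’s worth, a likely reason nothing shows up is that dbt run-operation doesn’t execute SQL that a macro merely renders as text; each statement has to be run explicitly, the way clone_prod_to_target does with run_query (note too that {{[INITIALS]}} renders as a Python list, ['WMM'], rather than the bare initials). A minimal sketch of the same macro with the statements actually executed, keeping the warehouse, role, and database names from the post, might look like:
{% macro developer_clone_creation(INITIALS) %}
    {# Snowflake runs one statement per query, so execute each statement separately #}
    {% do run_query("use role DATA_ENGINEER_ROLE") %}
    {% do run_query("use warehouse DBT_TRANSFORM_COMPUTE_XS") %}
    {% do run_query("create or replace database DEV_FIVETRAN_" ~ INITIALS ~ " clone DEV_FIVETRAN") %}
    {% do run_query("grant usage on database DEV_FIVETRAN_" ~ INITIALS ~ " to role DATA_ENGINEER_ROLE") %}
{% endmacro %}
It can then be invoked the same way:
$ dbt run-operation developer_clone_creation --args '{INITIALS: WMM}'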