How do you zero-copy clone a Snowflake schema when the target uses a custom schema name?

Hello all. I’m just a couple of weeks into dbt and I’m really impressed.

I have a question for the group.

At my company we run our CI/CD through on-prem Azure DevOps and I’m setting up the pipeline. We use Snowflake and I’ve noticed some people on this forum mentioning their use of zero-copy clones in their Dev or QA environments.

I’m also using custom schemas so that each user can create their own schemas in development and QA.

I’m not sure of the best way to do a zero-copy clone of my production BI schema, or where to put the logic. I’d like to do something like the example below, but I can’t quite get the syntax down. I’m also not sure whether on-run-start is the appropriate place for it, or whether I can call generate_schema_name from dbt_project.yml.

on-run-start:
    - "if target.name != 'prod' then create or replace schema {{ generate_schema_name() }} clone bi_db.production_bi_schema;"

I assume that once I get the logic down I can put it into a macro, but any help would be appreciated. Thanks so much! Great community.

I figured out a way of doing it:

on-run-start:
    - '{{ clone_schema("bi_db.production_bi_schema") if var("reclone",false) == true }}'

When I run dbt from the command line, I pass reclone as a var indicating whether I want to re-clone this schema.
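For example, an invocation might look like this (the var names match the clone_schema macro; the "my_dev" value is just a made-up custom schema suffix):

```shell
# Re-clone the production schema into my custom dev schema, then run dbt
dbt run --vars '{"reclone": true, "custom_schema": "my_dev"}'

# Skip the re-clone; the Jinja condition falls through when reclone is false
dbt run --vars '{"custom_schema": "my_dev"}'
```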

The clone_schema macro looks like this:

{% macro clone_schema(schema_to_clone) -%}

    {# Resolve the target schema name once, using the project's custom-schema logic #}
    {% set target_schema = generate_schema_name(var("custom_schema")) %}

    {% set all_tables_query %}
        show tables in schema {{ schema_to_clone }}
    {% endset %}

    {{ "create or replace schema " ~ target_schema ~ ";" }}

    {% if execute %}
        {% set results = run_query(all_tables_query) %}
        {# Column 1 of the "show tables" output is the table name #}
        {% for result_row in results %}
            {% set clone_sql = "create table " ~ target_schema ~ "." ~ result_row[1] ~ " clone " ~ schema_to_clone ~ "." ~ result_row[1] ~ ";" %}
            {{ log(clone_sql) }}
            {{ clone_sql }}
        {% endfor %}
    {% endif %}
{%- endmacro %}

You could also clone views, but I don’t use them. Because we decide whether to clone via a command-line var, I can set it appropriately in my build pipeline.

I actually started out just cloning the entire schema in one statement. The problem is that the role doing the cloning becomes the owner of the schema, but not of any of the tables, which retain their original privileges. But if you clone each item one-by-one, then the role you’re running under is the owner of all the objects.
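For illustration, the difference looks roughly like this in raw Snowflake SQL (the dev_schema name here is made up):

```sql
-- Schema-level clone: the running role owns dev_schema itself,
-- but the cloned tables keep the grants of the source tables.
create or replace schema dev_schema clone bi_db.production_bi_schema;

-- Table-level clones: the running role owns each cloned table.
create or replace schema dev_schema;
create table dev_schema.my_table
    clone bi_db.production_bi_schema.my_table;
```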

It sounds like you’re unblocked, @trevor, but I wanted to offer a couple of additional resources showing how we do this at GitLab.

This is how we clone the database: https://gitlab.com/gitlab-data/analytics/-/blob/master/orchestration/manage_snowflake.py

Then in our CI Process, we have relevant clone jobs: https://gitlab.com/gitlab-data/analytics/-/blob/master/.gitlab-ci.yml#L65

By default, the clone_analytics job runs on every Merge Request. This allows MRs to run on raw data on the analytics clone.

The name of the clone comes from the name of the branch. If the branch is named apple_banana, the database name is APPLE_BANANA. Then instead of outputting to analytics_database.analytics_schema.magical_table, the MR outputs to apple_banana.analytics_schema.magical_table.

Hope that is useful!