dbt models persist in your production database even after they’re deleted from your project. This adds clutter to the warehouse and can slow down operations like database or schema cloning, which are used frequently in CI/CD for snowflake dbt projects.
I wrote a macro that executes DROP statements for all the tables and views in your target schema that don’t exist in your project. How does it work? First, it fetches all the models, seeds, and snapshots in your project. Then, it queries the information schema for all the models in your target schema excluding the ones fetched above. Finally, it formats (and then executes) DROP statements for each table/view that isn’t in your project.
To invoke from the command line, use: dbt run-operation drop_old_relations
Warning: This will delete everything in your target schema that isn’t in your dbt project.
Warning: This does not cover aliases
I strongly recommend running dbt run-operation drop_old_relations --args '{"dryrun": True}'
first to see the DROP commands
Thanks a lot to Randy of Hashmap for providing much of the code! You can find his original version here, which drops models based on the last_altered date in the info schema using a pre-set cutoff
{% macro drop_old_relations(dryrun=False) %}
{% if execute %}
{% set current_models=[] %}
{% for node in graph.nodes.values()
| selectattr("resource_type", "in", ["model", "seed", "snapshot"])%}
{% do current_models.append(node.name) %}
{% endfor %}
{% endif %}
{% set cleanup_query %}
WITH MODELS_TO_DROP AS (
SELECT
CASE
WHEN TABLE_TYPE = 'BASE TABLE' THEN 'TABLE'
WHEN TABLE_TYPE = 'VIEW' THEN 'VIEW'
END AS RELATION_TYPE,
CONCAT_WS('.', TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME) AS RELATION_NAME
FROM
{{ target.database }}.INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = '{{ target.schema }}'
AND TABLE_NAME NOT IN
({%- for model in current_models -%}
'{{ model.upper() }}'
{%- if not loop.last -%}
,
{% endif %}
{%- endfor -%}))
SELECT
'DROP ' || RELATION_TYPE || ' ' || RELATION_NAME || ';' as DROP_COMMANDS
FROM
MODELS_TO_DROP
{% endset %}
{% do log(cleanup_query, info=True) %}
{% set drop_commands = run_query(cleanup_query).columns[0].values() %}
{% if drop_commands %}
{% if dryrun | as_bool == False %}
{% do log('Executing DROP commands...', True) %}
{% else %}
{% do log('Printing DROP commands...', True) %}
{% endif %}
{% for drop_command in drop_commands %}
{% do log(drop_command, True) %}
{% if dryrun | as_bool == False %}
{% do run_query(drop_command) %}
{% endif %}
{% endfor %}
{% else %}
{% do log('No relations to clean.', True) %}
{% endif %}
{%- endmacro -%}
EDIT: Added dryrun feature