FAQ: Cleaning up removed models from your production schema

I just went through a similar process, with a focus on dropping source tables that we'd added to our sync job in the past but weren't actually using.

I approached this with a combo of the Jinja graph variable, the information_schema and the venerable spreadsheet.

To build a list of the sources we did use, I started with the compiled results of this snippet:

{% for node in graph.sources.values() -%}
  select '{{ node.schema }}' as s_name, '{{ node.name }}' as t_name
  {%- if not loop.last %} union all {% endif %}
{% endfor %}
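If you're not sure where to run this, one option is to save it in your analyses folder and run dbt compile; the rendered SQL lands under target/compiled/. It renders to a union of one-row selects, one per source defined in the project:

select 'users' as s_name, 'users' as t_name union all 
select 'users' as s_name, 'contactroles' as t_name union all 
...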

And pasted those results into this query:

with defined_sources as (
    select 'users' as s_name, 'users' as t_name union all
    select 'users' as s_name, 'contactroles' as t_name union all
    ...
    select 'ft_hubspot' as s_name, 'ticket' as t_name
),
     
relevant_tables as (
    select 
        table_schema as s_name, 
        table_name as t_name
    from information_schema.tables
    where table_schema not ilike 'dev_%' -- our dbt dev schemas
    and table_schema not ilike 'analytics%' -- our output schemas
    -- Redshift internal schemas, plus schemas whose tables we query with
    -- get_relations_by_pattern instead of defining each one as a source:
    and table_schema not in ('pg_catalog', 'information_schema', 'admin', 'public', 'census', 'inbound__timesheets_and_quality_control')
)

select * from relevant_tables
except
select * from defined_sources

This gave me all the tables in the warehouse that weren't defined in dbt, meaning we weren't using them and could consider dropping them to save disk space and reduce load on the source databases.
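From there, you can even have the warehouse generate the cleanup statements for you. A minimal sketch (the CTEs are abbreviated here, and of course review every statement before running anything):

with defined_sources as (
    select 'users' as s_name, 'users' as t_name -- pasted rows, as above
),

relevant_tables as (
    select table_schema as s_name, table_name as t_name
    from information_schema.tables
    -- same schema filters as above
),

unused_tables as (
    select * from relevant_tables
    except
    select * from defined_sources
)

-- assumes no identifiers need quoting
select 'drop table if exists ' || s_name || '.' || t_name || ';' as ddl
from unused_tables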

To focus on the biggest tables first, I wanted to combine that list with the metadata in SVV_TABLE_INFO:

select
    schema as s_name,
    "table" as t_name, -- quoted because table is a reserved word
    size as size_mb, -- SVV_TABLE_INFO reports size in 1 MB blocks
    tbl_rows as total_rows
from SVV_TABLE_INFO
order by size_mb desc

However, I found that information_schema queries always run against the leader node, which meant I couldn't join the results of this query with my first one. Instead, I dumped both result sets to a spreadsheet and worked it out there.
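In hindsight, one way to sidestep the leader-node restriction might be to cut information_schema out of the loop entirely: the defined_sources CTE is just hard-coded literals, so it could plausibly be joined straight to SVV_TABLE_INFO. A sketch, untested:

with defined_sources as (
    select 'users' as s_name, 'users' as t_name -- pasted rows, as above
)

select
    ti.schema as s_name,
    ti."table" as t_name,
    ti.size as size_mb,
    ti.tbl_rows as total_rows
from SVV_TABLE_INFO as ti
left join defined_sources as ds
    on ds.s_name = ti.schema
    and ds.t_name = ti."table"
where ds.t_name is null -- keep only tables with no matching source definition
order by size_mb desc

(You'd still want to re-apply the schema filters from relevant_tables to exclude dev and output schemas.)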