FAQ: Cleaning up removed models from your production schema


#1

If you remove a model from your dbt project, dbt does not automatically drop the relation (table/view) from your schema. This means that users can end up with extra objects in their production schema, which can reduce the reliability of your prod data.

As a result, we’ve seen a few questions about how to clean up these objects in your production schema (here, here, and here)

Option 1: Periodically drop and rebuild the entire schema.
dbt is built on the assumption that everything can be rebuilt at any given time, so this is a really easy way to remove objects.
However, it’s not for the faint of heart, and may cause some downtime

Option 2: Query the information schema to find extra objects in prod.
I have the following query in my analysis directory. When run against my database, it finds objects (tables, views, functions) that exist in a prod schema but does not exist in the related dev schema.
Note that this works for me because I routinely drop my dev database so don’t have any extra objects in it. This query has been tested against Redshift and Postgres.

-- set pairs of schemas up as a cte for maximum flexibility
with dbt_schemas as (
  select
  'analytics' as prod_schema,
  'analytics_claire' as dev_schema
  union all
  select
  'foo' as prod_schema,
  'foo_claire' as dev_schema
)

, objects as (
  select
  lower(coalesce(nullif(table_type, 'BASE TABLE'), 'table')) as obj_type,
  table_schema as obj_schema,
  table_name as obj_name
  from information_schema.tables

  union all

  select
  'function' as obj_type,
  specific_schema as obj_schema,
  routine_name as obj_name
  from information_schema.routines
  where routine_type = 'FUNCTION'
)
, prod_objects as (
  select
  objects.*
  , dbt_schemas.dev_schema as expected_dev_schema
  from objects
  inner join dbt_schemas on dbt_schemas.prod_schema = objects.obj_schema
)

, prod_only_objects as (
  select prod_objects.*
  from prod_objects
  left join objects on objects.obj_schema = prod_objects.expected_dev_schema
    and objects.obj_name = prod_objects.obj_name
    and objects.obj_type = prod_objects.obj_type
  where objects.obj_name is null
)
select
  '-- drop ' ||  obj_type || ' if exists "' || obj_schema || '"."' || obj_name || '" cascade;' as drop_statement
from prod_only_objects

The query returns a set of sql statements that can then be copied to a sql console and executed to drop the extra objects.

Ideally I’d return the first part of the query as a dbt statement and output the drop statements themselves to a sql file in compiled/analysis/. However, analyses are built when you execute dbt compile, and dbt statements are only executed on run (rather than compile). As such, it’s not possible to return the results of the dbt statement to your Jinja context.

Option 3: ???
There’s probably also a third way to do this which would involve iterating through your dbt project to find the expected objects, and comparing it to the state of your prod schemas. Let me know if anyone investigates this option!