If you remove a model from your dbt project, dbt does not automatically drop the relation (table/view) from your schema. This means that users can end up with extra objects in their production schema, which can reduce the reliability of your prod data.
Option 1: Periodically drop and rebuild the entire schema.
dbt is built on the assumption that everything can be rebuilt at any given time, so this is a really easy way to remove objects.
However, it’s not for the faint of heart, and may cause some downtime
Option 2: Query the information schema to find extra objects in prod.
I have the following query in my
analysis directory. When run against my database, it finds objects (tables, views, functions) that exist in a prod schema but does not exist in the related dev schema.
Note that this works for me because I routinely drop my dev database so don’t have any extra objects in it. This query has been tested against Redshift and Postgres.
-- set pairs of schemas up as a cte for maximum flexibility with dbt_schemas as ( select 'analytics' as prod_schema, 'analytics_claire' as dev_schema union all select 'foo' as prod_schema, 'foo_claire' as dev_schema ) , objects as ( select lower(coalesce(nullif(table_type, 'BASE TABLE'), 'table')) as obj_type, table_schema as obj_schema, table_name as obj_name from information_schema.tables union all select 'function' as obj_type, specific_schema as obj_schema, routine_name as obj_name from information_schema.routines where routine_type = 'FUNCTION' ) , prod_objects as ( select objects.* , dbt_schemas.dev_schema as expected_dev_schema from objects inner join dbt_schemas on dbt_schemas.prod_schema = objects.obj_schema ) , prod_only_objects as ( select prod_objects.* from prod_objects left join objects on objects.obj_schema = prod_objects.expected_dev_schema and objects.obj_name = prod_objects.obj_name and objects.obj_type = prod_objects.obj_type where objects.obj_name is null ) select '-- drop ' || obj_type || ' if exists "' || obj_schema || '"."' || obj_name || '" cascade;' as drop_statement from prod_only_objects
The query returns a set of sql statements that can then be copied to a sql console and executed to drop the extra objects.
Ideally I’d return the first part of the query as a dbt statement and output the drop statements themselves to a sql file in
compiled/analysis/. However, analyses are built when you execute
dbt compile, and dbt statements are only executed on
run (rather than
compile). As such, it’s not possible to return the results of the dbt statement to your Jinja context.
Option 3: ???
There’s probably also a third way to do this which would involve iterating through your dbt project to find the expected objects, and comparing it to the state of your prod schemas. Let me know if anyone investigates this option!