How to prevent views from constantly breaking due to underlying table changes

The problem I’m having

We populate our raw database using Fivetran. Since dbt automatically hardcodes the columns in a view definition, our staging views break every time a new column is added to the underlying table, with errors like:
View definition for 'ANALYTICS_DB.STG_SCHEMA.STG_VIEW' declared 16 column(s), but view query produces 19 column(s).

The context of why I’m trying to do this

I’m trying to figure out a way to avoid re-running all staging models every time Fivetran finishes syncing, just to keep these views from breaking.

What I’ve already tried

I’ve been looking at dbt-coves and dbt-osmosis. I think I could get things working with a decent amount of modification and combination of the two, but (a) it seems like far more work than something like this should require, and (b) the performance is terrible.

Some example code or error messages

Model ANALYTICS_DB.STG_SCHEMA.STG_VIEW is defined as

SELECT
	*

FROM
	FIVETRAN_DB.SCHEMA.TABLE

But gets compiled to

CREATE OR REPLACE VIEW ANALYTICS_DB.STG_SCHEMA.STG_VIEW AS
(
	SELECT
		COL1
	,	COL2
	,	COL3

	FROM
		FIVETRAN_DB.SCHEMA.TABLE
);

What adapter are you using? I ask because when I look in target/run for my test (duckdb) project, I don’t see it enumerate the columns the way your example shows.

Thanks @jaypeedevlin for the quick response!

I’m using Snowflake. I just assumed the behavior would be consistent across adapters.

I had a play with Snowflake, and this is not actually dbt doing this; it’s how Snowflake stores the resulting view. I did the same thing as above, and the code in /target/run includes the select *. I pointed it at a test table, then added a column to that table and got the error you’re seeing.

What dbt ran:

  create or replace view db.schema.test as (
      select
          *
      from db.scratch.jd_table
  );

What Snowflake stores as the definition of the view:

  create or replace view DB.SCHEMA.TEST(
      FOO,
      BAR
  ) as (
      select
          *
      from db.scratch.jd_table
  );

So when a column is added to the underlying table, the view fails.
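You can reproduce the failure directly in Snowflake (sticking with the made-up names above):

  -- add a column to the table underneath the view
  alter table db.scratch.jd_table add column baz int;

  -- now querying the view fails with the error from the original post:
  select * from db.schema.test;
  -- View definition for 'DB.SCHEMA.TEST' declared 2 column(s), but view query produces 3 column(s).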

You can automate enumerating the columns using dbt_utils.star(), which means your views won’t fail when columns are added; they’ll pick up the new columns on your next dbt run without you having to touch the code.
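For example, a staging model using the macro might look like this (the fivetran source and my_table names are made up; this assumes the source is declared in your sources.yml):

  -- stg_my_table.sql (hypothetical names)
  SELECT
      {{ dbt_utils.star(from=source('fivetran', 'my_table')) }}
  FROM
      {{ source('fivetran', 'my_table') }}

Because the macro expands to the current column list each time the model compiles, the view definition is refreshed with any new columns on every dbt run.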

All of that said, my subjective opinion is that staging models are for renaming columns, casting types, and so on: essentially creating an endorsed view of your source data. With that framing, I prefer to enumerate the columns manually (using codegen to help initially). Otherwise, you may as well not have staging models at all and just select directly from the source.
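For example, a hand-enumerated staging model might look like this (all column, source, and table names made up):

  -- stg_orders.sql (hypothetical names)
  SELECT
      id                             AS order_id,
      CAST(created_at AS timestamp)  AS ordered_at,
      CAST(amount AS numeric(18, 2)) AS order_amount
  FROM
      {{ source('fivetran', 'orders') }}

codegen’s generate_base_model can write the first pass for you, e.g. dbt run-operation generate_base_model --args '{"source_name": "fivetran", "table_name": "orders"}'.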


That was awesome of you to dive in there. I should’ve looked at the compiled code myself, instead of just assuming that it was dbt doing this rather than Snowflake.

You make a great point at the end, and I agree.

Thanks again!

