The result of adapters.get_columns_in_relation seems to include only columns that exist in the current version of the table. This is fine except when the relation is allowed to change columns, and the point where get_columns_in_relation is called is downstream of the relation that has changed.
When I actually run the process, it works fine; however, on compilation, the downstream model gets the wrong column list containing only the columns that are currently in the model in the database, not what is in the model after the upstream models have run. I would like a version of the macro get_columns_in_relation that understands when upstream models have changed.
I am using the snowflake adapter.
The context of why I’m trying to do this
I have a model that is expected to change schema (in particular, it will add columns). I have models downstream of this that use get_columns_in_relation to understand what columns are in the schema. They use this to select which columns to query.
This process works when the project runs, but do not have the desired columns on a compile. The result of compile only produces the columns in the model that exist on the current table.
What I’ve already tried
I tried to use get_columns_in_relation, but it only returned the columns in the current state of the relation. I am not aware of any other macro to do the same that is aware of schema changes.
When you say that it works “when you actually run the process”, do you mean once you have merged the changes to main and deployed the changes to production?
but do not have the desired columns on a compile
when are you only running a compile? Is this when you are just editing files in your dev environment?
You will need to run the upstream model in your dev environment before get_columns_in_relation will work. It doesn’t look at your dbt project’s code, it looks at the current state of the table in the database.
Yes, when the entire pipeline runs, it would create the columns that were missing in the upstream tables, and then the result from get_columns_in_relation would match what is expected from the models.
In our development process, we want to run compile and then do some checking on the compiled SQL to make sure there are no breaking changes in the pipeline before it runs. In particular, a breaking change can include referencing a column that does not exist in an upstream model. We want these checks to run on production.
In a few instances, our checks have failed because the models using get_column_in_relation to understand what columns are in the upstream tables fails because it does not actually include the columns in those models. I would like to change that behavior, so that it will check using the columns defined in the model rather than what is in the database at this moment.
I’m with you right up until “We want these checks to run on production”. Right now it sounds like you are manually verifying your queries which is why you’re getting unexpected behaviours.
You should be running these validations inside of a separate CI environment, which will build modified nodes in a short-lived schema that is dropped when your PR is merged. (Unmodified nodes will still defer to production - this is called “Slim CI”)