I have been having a discussion about the static_analysis
config in the dbt Fusion engine on the #dbt-fusion-engine Slack channel but I want to bring the discussion to a wider audience. I will link to the thread but it contains a few different topics so I will summarize the discussion so far. It should be noted that my viewpoint is from a dbt Core only project and is not using any of the additional features of the dbt Platform.
Current State
The dbt Fusion engine attempts to compile and validate the SQL in a project before any SQL is executed, static_analysis:on
. If a model contains an introspective query, meaning it needs to get information about another object via a SQL query, or the schema of the source model is not known then the Fusion engine will, as a default, skip the compile and validation of that model, and all downstream objects, until it is run where it is compiled and executed but not validated, static_analysis:off
. The default can be overridden by setting static_analysis
config to unsafe
witch changes the behavior to delay the compile and SQL check to happen just before a model is executed instead of before any SQL is executed.
This default behavior is designed to ensure that if the SQL is reported an valid that the validation will remain true even if the run is separated in time from the compile. And, if the compilation of a model is independent of any other model or SQL queries then the compilation can be parallelized to increase performance of the compile process.
Concerns with the Current State
The fist concern I have with the current state is that is assumes that the database schema is volatile and changes in an unknowable fashion. In our production environment only dbt is changing the database schema and is orchestrated is a way that minimizes any overlap. And in our development environment our database are isolated at the developer level and can only be modified by the developer, typically using dbt. So automatically marking models with introspective queries as static_analysis:off
makes an evaluation of risk for me with out understanding the environment.
The second concern I have is with the language, unsafe
, used as the default override configuration. Introspective queries have been a regular, if not promoted, part of dbt model development for a long time and suddenly telling developers that they are doing something āwrongā by using them feels heavy handed and contrary to previous assumptions. Common utilities like date_spine
and get_column_values
fall in to the unsafe
category.
My third concern is that automatically tunning off the validation of SQL has an outsized impact on the development process where it would be needed the most and the risk of the run SQL not matching the validated SQL is the least. And the fact that the validation to tunned off silently for all models down stream of a model with an introspective query can be confusion when a SQL error is reported from the database and the developer assumed that it should have been checked before a query was sent.
Proposal
I propose that the language be changed to better describe the functionally of the static_analysis
feature and move away from the ācorrectnessā of the behavior. The labels auto
always
and never
could replace the the current labels of on
unsafe
and off
without changing the functionally but more clearly communicating what would happen to a model with the config.
I do not personally like the default for static_analysis
to be auto
because it is may change the behavior of the models in a way that is not obvious to the developer, but I can also see it as a good starting point in some projects. In our project I am reasonably sure that we will set it up to always
at the project level and then manage exceptions to that on a model by model basis.
Questions for the Community
Iād love to hear from others in the community about their experiences with the Fusion engineās static_analysis
behavior:
- How would the current
static_analysis
configuration affected your workflows? - Do you find the
unsafe
terminology problematic or confusing? - What would be your ideal behavior for handling introspective queries in the compilation process?
- Are there specific use cases where the current behavior would caused issues or benefits?
Looking forward to the discussion and hopefully finding a path forward that better serves the diverse needs of the dbt community.