dbt Fusion and `static_analysis`

I have been having a discussion about the static_analysis config in the dbt Fusion engine on the #dbt-fusion-engine Slack channel but I want to bring the discussion to a wider audience. I will link to the thread but it contains a few different topics so I will summarize the discussion so far. It should be noted that my viewpoint is from a dbt Core only project and is not using any of the additional features of the dbt Platform.

Slack Thread

Current State

The dbt Fusion engine attempts to compile and validate the SQL in a project before any SQL is executed, static_analysis:on. If a model contains an introspective query, meaning it needs to get information about another object via a SQL query, or the schema of the source model is not known then the Fusion engine will, as a default, skip the compile and validation of that model, and all downstream objects, until it is run where it is compiled and executed but not validated, static_analysis:off. The default can be overridden by setting static_analysis config to unsafe witch changes the behavior to delay the compile and SQL check to happen just before a model is executed instead of before any SQL is executed.

This default behavior is designed to ensure that if the SQL is reported an valid that the validation will remain true even if the run is separated in time from the compile. And, if the compilation of a model is independent of any other model or SQL queries then the compilation can be parallelized to increase performance of the compile process.

Concerns with the Current State

The fist concern I have with the current state is that is assumes that the database schema is volatile and changes in an unknowable fashion. In our production environment only dbt is changing the database schema and is orchestrated is a way that minimizes any overlap. And in our development environment our database are isolated at the developer level and can only be modified by the developer, typically using dbt. So automatically marking models with introspective queries as static_analysis:off makes an evaluation of risk for me with out understanding the environment.

The second concern I have is with the language, unsafe, used as the default override configuration. Introspective queries have been a regular, if not promoted, part of dbt model development for a long time and suddenly telling developers that they are doing something ā€˜wrong’ by using them feels heavy handed and contrary to previous assumptions. Common utilities like date_spine and get_column_values fall in to the unsafe category.

My third concern is that automatically tunning off the validation of SQL has an outsized impact on the development process where it would be needed the most and the risk of the run SQL not matching the validated SQL is the least. And the fact that the validation to tunned off silently for all models down stream of a model with an introspective query can be confusion when a SQL error is reported from the database and the developer assumed that it should have been checked before a query was sent.

Proposal

I propose that the language be changed to better describe the functionally of the static_analysis feature and move away from the ā€˜correctness’ of the behavior. The labels auto always and never could replace the the current labels of on unsafe and off without changing the functionally but more clearly communicating what would happen to a model with the config.

I do not personally like the default for static_analysis to be auto because it is may change the behavior of the models in a way that is not obvious to the developer, but I can also see it as a good starting point in some projects. In our project I am reasonably sure that we will set it up to always at the project level and then manage exceptions to that on a model by model basis.

Questions for the Community

I’d love to hear from others in the community about their experiences with the Fusion engine’s static_analysis behavior:

  • How would the current static_analysis configuration affected your workflows?
  • Do you find the unsafe terminology problematic or confusing?
  • What would be your ideal behavior for handling introspective queries in the compilation process?
  • Are there specific use cases where the current behavior would caused issues or benefits?

Looking forward to the discussion and hopefully finding a path forward that better serves the diverse needs of the dbt community.

1 Like

Cross posting my 2 cents from (Slack) for visibility.

I think a big issue here is:

  • Traditionally dbt evangelized how jinja, macros, and dbt packages (like dbt_utils ) were great tools to save developers time by accomplishing a lot with few characters. Even if the complexity went up and readability went down, it was a worthwhile trade.
  • dbtf has the same intention of saving developers time, but it is in complete contrast with the established and encouraged dbt patterns of jinja, macros, and packages should be embrace. dbtf seems to heavily discourage their use.

FWIW - I much prefer the labels auto always and never that @pempey suggests.

1 Like

Great points here, thank you both - I have been chewing on this for a couple of weeks but figured I should get some quick notes down!

A couple of high level points:

  • I don’t think we’ve done a good enough job of explaining the difference between Just In Time compilation[1] vs Ahead Of Time compilation[2] and how they interact with static analysis[3]. I have a note to self to flesh this out. In both JIT and AOT compilation, the models are still statically analysed (as long as it hasn’t been turned off) - it’s just a function of when that validation happens.

  • I’m somewhat sympathetic to the argument that unsafe/always mode should be the default. But given that it only takes one line of config in your dbt_project.yml to change the default, I don’t have super strong feelings about this.

  • I’m extremely sympathetic to the idea that there are common macros which currently require introspection, and that it feels icky. My thesis is that a not-insignificant number of utils-style queries may not need to be introspective in the long term. To address a couple:

    • dbt.date_spine has no business needing an introspective query; that’s just a hangover from the late 2010s. We’re gonna rustify that so it’s static
    • dbt_utils.star is redundant for most warehouses at this point - we should consider returning native select * except(...) where possible. Even when the target warehouse doesn’t support it, the dbt Fusion engine now knows the column names and types of models after its initial caching step, so it should be able to be done statically.

Quoting Alex from an internal thread:

if you do a lot of introspection, your SQL is not guaranteed to be idempotent upon execution especially if the introspection happens across dependencies (i.e. not upstream) or against external tables / sources. Turning static analysis off is our way of saying, ā€œwe can’t guarantee this,ā€ and marking it as unsafe is your way of saying ā€œI know, give it to me anywayā€.

I don’t know whether giving a :v: dangerous :v: config a less-scary name does anything useful - it is kinda a feature that you get prompted to think ā€œdo I really need thisā€? For example, last year I worked on a model which did a ton of pivoting. I used get_column_values liberally, but looking back, I actually could have defined them in a Jinja dict and it would have been clearer and more resilient (I had a downstream Looker model which depended on those exact columns existing, and no others).

Appreciate you both taking the time to write this up and keen to hear what others think!


  1. dbt Core’s only option, how the Fusion engine handles introspective queries ā†©ļøŽ

  2. The Fusion engine’s default behaviour ā†©ļøŽ

  3. Checking that the rendered code is syntactically valid ā†©ļøŽ