dbt Fusion and Schema Awareness

Intro

I would like to talk about dbt Fusion's new schema awareness. For context, we run a dbt Core only project. When I first read about schema awareness, it felt like a natural extension of the dbt Core manifest.json (or maybe catalog.json) artifact, and it seemed reasonable that, with the proper configuration, I should be able to defer to a specific state of the database. However, this is not the case: dbt Fusion always attempts to fetch the current state of the schema in the referenced database, whether local or deferred. This has caused problems for me while testing Fusion, and I expect the same problems would exist in normal development.

The Problem

The first problem I ran into is that I do not have access to all of the tables (sources or models) that are defined in the dbt project. This means I get loads of missing-schema errors whenever I run a dbt command or try to use the VS Code extension. Each error disables static analysis on the models downstream of it, and when the errors are on sources they quickly cascade into significant portions of the project not getting analyzed.

The second problem is that the production database, which we regularly defer to, is always changing. This shows up as errors, such as missing columns, in models that I am not working on in my branch. These errors are ‘correct’, but they are ‘noisy’ and dilute the importance of the errors that matter during development. I assume they can be resolved by merging the changes into the branch I am working on, but that is not our standard practice, and I do not see this as a strong enough reason to change how our team works.

Proposed Solution

What I would like is to be able to define, through configuration, a ‘static’ or past database state to work against. If I have a way to generate that state and get it into my local environment, I could then use it seamlessly, just like the defer option works now. I feel this would solve both of the problems I am seeing and generally give me more control over how the project is evaluated.
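To make the idea of a pinned state a bit more concrete, here is a rough Python sketch of what I have in mind. It is not a Fusion feature and Fusion does not read this format as far as I know. It only assumes the usual catalog.json layout from dbt Core (top-level `nodes` and `sources`, each entry carrying a `columns` mapping with a `type`). The function names are hypothetical, made up for illustration.

```python
import json
from pathlib import Path


def snapshot_from_catalog(catalog):
    """Reduce a parsed catalog.json to {unique_id: {column_name: data_type}}."""
    snapshot = {}
    for section in ("nodes", "sources"):
        for unique_id, entry in catalog.get(section, {}).items():
            columns = entry.get("columns", {})
            # Lower-case the names so that warehouses which upper-case
            # identifiers compare equal to ones that do not.
            snapshot[unique_id] = {
                name.lower(): column.get("type")
                for name, column in columns.items()
            }
    return snapshot


def load_snapshot(path):
    """Read a catalog.json file from disk and reduce it to a snapshot."""
    return snapshot_from_catalog(json.loads(Path(path).read_text()))


def diff_snapshots(pinned, current):
    """Report what the pinned state has that the current state lacks."""
    missing_relations = sorted(set(pinned) - set(current))
    missing_columns = {}
    for unique_id in sorted(set(pinned) & set(current)):
        gone = sorted(set(pinned[unique_id]) - set(current[unique_id]))
        if gone:
            missing_columns[unique_id] = gone
    return {
        "missing_relations": missing_relations,
        "missing_columns": missing_columns,
    }
```

With something like this, a catalog.json generated at a known-good point could be shipped to a local environment, loaded with `load_snapshot`, and compared against a fresh one using `diff_snapshots` to see exactly which relations and columns have drifted. What I am really asking for is the step after that: a way to tell Fusion to analyze against the pinned snapshot instead of the live database.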

Are others having these problems? Is there some other way to deal with this that I am missing? What are the thoughts of the broader community?

Related Issues