Here’s the scenario - you created a PR - a CI job kicks off in dbt Cloud and then you see your “modified” model (bar
) error as it tries to select from another model (foo
) (foo
is ref’d in bar
) in a database or schema that you did not expect. The error looks something like this:
09:14:18 Database Error in model bar (models/bar.sql)
002003 (42S02): SQL compilation error:
Object 'DEVELOPMENT_JYEO.FAKE_SCHEMA.FOO' does not exist or not authorized.
compiled Code at target/run/models/bar.sql
Here - the schema fake_schema
is not something that actually exist in our database let alone model foo
being built into it.
Let’s see how something like this can come to be.
dbt project setup
-- models/foo.sql
select 1 as id
-- models/bar.sql
select * from {{ ref('foo') }}
# dbt_project.yml
name: my_dbt_project
config-version: 2
version: "1.0.0"
models:
my_dbt_project:
+materialized: table
+schema: marts
-- macros/gsn.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- set default_schema = target.schema -%}
{%- if target.name == 'prod' -%}
{{ custom_schema_name }}
{%- else -%}
{{ default_schema }}
{%- endif -%}
{%- endmacro %}
dbt Cloud job setup
We’re going to create a dbt Cloud environment where we will be creating our dbt Cloud jobs:
Note the above “Schema” config is
fake_schema
- some users put a schema that they will never actually use here. The reason for that is they are confident that theirgenerate_schema_name()
logic ++schema
configs will resolve schemas correctly for them.
Now we will be creating 3 jobs:
- The “Daily Production” job is just
dbt build
but importantly the target name is set toprod
:
Your normal prod job run.
- The “Compile Only” job is just
dbt compile
but we’re leaving the target name as is - i.e.default
:
Some users have these “compile” only jobs or “docs generate” only jobs which don’t actually materialize models but are there for some specific function.
- The “CI Job” job is a stock standard
dbt build --select state:modified+
but it’s target is also to be left alone asdefault
.
All these target names will affect how the schemas of our models are resolved.
Let’s see what happens when our Daily Production job (1) runs
Our models are built into the marts
schema - as expected.
-- macros/gsn.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- set default_schema = target.schema -%}
{%- if target.name == 'prod' -%} <- Recall the target name of this job fulfils this criteria.
{{ custom_schema_name }} <- "marts" due to +schema config.
{%- else -%}
{{ default_schema }}
{%- endif -%}
{%- endmacro %}
Let’s see what happens when our Compile Only job (2) runs
Essentially nothing happens here - however the “magic” sauce is actually what is contained within the manifest.json
file that was created out of this run. If we download the manifest.json
(using the “Artifacts” tab) and go through it - we would actually see something like:
"nodes": {
"model.my_dbt_project.foo": {
"database": "development_jyeo",
"schema": "fake_schema",
"name": "foo",
...
},
"model.my_dbt_project.bar": {
"database": "development_jyeo",
"schema": "fake_schema",
"name": "bar",
...
The takeaways for this run is:
- A manifest / state has been generated for this run.
- The schemas for
foo
andbar
were resolved tofake_schema
due to ourgenerate_schema_name
logic.
-- macros/gsn.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- set default_schema = target.schema -%} <- Recall here we set it to `fake_schema` in the environment.
{%- if target.name == 'prod' -%}
{{ custom_schema_name }}
{%- else -%} <- Recall the target name of this job fulfils this criteria since the target name is 'default' and not 'prod'.
{{ default_schema }} <- "fake_schema"
{%- endif -%}
{%- endmacro %}
NO actual models were being built into fake_schema
- it’s just that the “state” of the project (which is stored in the manifest.json
file) had resolved those models to have the schema fake_schema
.
Let’s see what happens when our CI Job (3) runs
- Make a small change to our
bar
model so that we can trigger a CI job run that builds that model.
-- models/bar.sql
select id as not_id from {{ ref('foo') }}
- Create the PR:
Notice that a CI job kicks off but with a failure:
The error message of which is exactly as stated in the opening of this thread. We can inspect closer into the debug logs:
create or replace transient table development_jyeo.dbt_cloud_pr_524058_17.bar
as
(-- models/bar.sql
select id as not_id from development_jyeo.fake_schema.foo
);
^ Model bar
was being rebuilt in the CI run (cause it was the one modified no surprises here) but it was selecting from development_jyeo.fake_schema.foo
which does not actually exist.
Now the astute readers will already kinda know what’s going on here (since I described the setting of the project state in the run of the job “Compile Only” just prior to this CI job run) but let’s see exactly whats happening here.
In our CI job run, model foo
itself was not selected to be rebuilt. So when dbt goes to resolve the schema of ref('foo')
- it takes it from the project state that it deferred to.
This is all documented here: Defer | dbt Developer Hub
When the
--defer
flag is provided, dbt will resolveref
calls differently depending on two criteria:
- Is the referenced node included in the model selection criteria of the current run?
- Does the reference node exist as a database object in the current environment?
If the answer to both is no—a node is not included and it does not exist as a database object in the current environment—references to it will use the other namespace instead, provided by the state manifest.
And here is a loom where I show the above: