How do CI jobs resolve ref's for models that are not selected to be run in the CI job itself?

Here’s the scenario - you created a PR - a CI job kicks off in dbt Cloud and then you see your “modified” model (bar) error as it tries to select from another model (foo) (foo is ref’d in bar) in a database or schema that you did not expect. The error looks something like this:

09:14:18    Database Error in model bar (models/bar.sql)
  002003 (42S02): SQL compilation error:
  Object 'DEVELOPMENT_JYEO.FAKE_SCHEMA.FOO' does not exist or not authorized.
  compiled Code at target/run/models/bar.sql

Here - the schema fake_schema is not something that actually exist in our database let alone model foo being built into it.

Let’s see how something like this can come to be.

dbt project setup

-- models/foo.sql
select 1 as id

-- models/bar.sql
select * from {{ ref('foo') }}
# dbt_project.yml
name: my_dbt_project
config-version: 2
version: "1.0.0"

models:
  my_dbt_project:
    +materialized: table
    +schema: marts
-- macros/gsn.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if target.name == 'prod' -%}
        {{ custom_schema_name }}
    {%- else -%}
        {{ default_schema }}
    {%- endif -%}
{%- endmacro %}

dbt Cloud job setup

We’re going to create a dbt Cloud environment where we will be creating our dbt Cloud jobs:

Note the above “Schema” config is fake_schema - some users put a schema that they will never actually use here. The reason for that is they are confident that their generate_schema_name() logic + +schema configs will resolve schemas correctly for them.


Now we will be creating 3 jobs:

  1. The “Daily Production” job is just dbt build but importantly the target name is set to prod:
    image

Your normal prod job run.

  1. The “Compile Only” job is just dbt compile but we’re leaving the target name as is - i.e. default:
    image

Some users have these “compile” only jobs or “docs generate” only jobs which don’t actually materialize models but are there for some specific function.

  1. The “CI Job” job is a stock standard dbt build --select state:modified+ but it’s target is also to be left alone as default.

All these target names will affect how the schemas of our models are resolved.

Let’s see what happens when our Daily Production job (1) runs

Our models are built into the marts schema - as expected.

-- macros/gsn.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if target.name == 'prod' -%} <- Recall the target name of this job fulfils this criteria.
        {{ custom_schema_name }}     <- "marts" due to +schema config.
    {%- else -%}
        {{ default_schema }}
    {%- endif -%}
{%- endmacro %}

Let’s see what happens when our Compile Only job (2) runs

Essentially nothing happens here - however the “magic” sauce is actually what is contained within the manifest.json file that was created out of this run. If we download the manifest.json (using the “Artifacts” tab) and go through it - we would actually see something like:

  "nodes": {
    "model.my_dbt_project.foo": {
      "database": "development_jyeo",
      "schema": "fake_schema",
      "name": "foo",
      ...
    },
    "model.my_dbt_project.bar": {
      "database": "development_jyeo",
      "schema": "fake_schema",
      "name": "bar",
      ...

The takeaways for this run is:

  1. A manifest / state has been generated for this run.
  2. The schemas for foo and bar were resolved to fake_schema due to our generate_schema_name logic.
-- macros/gsn.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%} <- Recall here we set it to `fake_schema` in the environment.
    {%- if target.name == 'prod' -%}
        {{ custom_schema_name }}     
    {%- else -%} <- Recall the target name of this job fulfils this criteria since the target name is 'default' and not 'prod'.
        {{ default_schema }} <- "fake_schema"
    {%- endif -%}
{%- endmacro %}

NO actual models were being built into fake_schema - it’s just that the “state” of the project (which is stored in the manifest.json file) had resolved those models to have the schema fake_schema.

Let’s see what happens when our CI Job (3) runs

  1. Make a small change to our bar model so that we can trigger a CI job run that builds that model.
-- models/bar.sql
select id as not_id from {{ ref('foo') }}
  1. Create the PR:

Notice that a CI job kicks off but with a failure:

The error message of which is exactly as stated in the opening of this thread. We can inspect closer into the debug logs:

create or replace transient table development_jyeo.dbt_cloud_pr_524058_17.bar
         as
        (-- models/bar.sql
select id as not_id from development_jyeo.fake_schema.foo
        );

^ Model bar was being rebuilt in the CI run (cause it was the one modified no surprises here) but it was selecting from development_jyeo.fake_schema.foo which does not actually exist.

Now the astute readers will already kinda know what’s going on here (since I described the setting of the project state in the run of the job “Compile Only” just prior to this CI job run) but let’s see exactly whats happening here.

In our CI job run, model foo itself was not selected to be rebuilt. So when dbt goes to resolve the schema of ref('foo') - it takes it from the project state that it deferred to.

This is all documented here: Defer | dbt Developer Hub

When the --defer flag is provided, dbt will resolve ref calls differently depending on two criteria:

  1. Is the referenced node included in the model selection criteria of the current run?
  2. Does the reference node exist as a database object in the current environment?

If the answer to both is no—a node is not included and it does not exist as a database object in the current environment—references to it will use the other namespace instead, provided by the state manifest.

And here is a loom where I show the above: