Change Database Name when using --defer and --state

Hi community,

I think the error I am having below maybe basic but I tried a few things that haven’t worked, so I would like some insights from you.

The problem I’m having

I have two databases: BUILD, which is the dev environment, and PROD, which is the production. I want to user --defer for developing my new models and also for running the CI.
Two problems:

  1. The dbt compile does not work when I use --defer and --state arguments. It is always referring to the dev database and schema.
  2. Even when I use dbt run with --defer and --state, the model I am referencing it gets the correct schema that is from PROD. However, the database is wrong. Is there a way to fix it?

The context of why I’m trying to do this

I am using this in the dbt Cloud. As I said, I have BUILD and PROD databases, and to save some time and costs when I am building a new model referencing a model that already exists, I would like to read it from PROD.

What I’ve already tried

I have already tried a few things in the generate database name macro and also in the ref but with no success.

I really appreciate the help!

Hi there,

I ended up doing the below change. I’d appreciate any thoughts on this and comments on this. Is there a better way of doing it.

{% macro ref(model_name) %}
    {% set relation = builtins.ref(model_name).include(database=false) %}
    {% if target.name == 'dev' and target.schema not in relation %}
          {% do return(var('prod_db') ~ '.' ~ relation) %}
    {% else %}
          {% do return( target.database ~ '.' ~ relation) %}
    {% endif %}
{% endmacro %}

I added more details here:

Goal

The goal is to build a slim Development (dev) and Continuous Integration (CI) environment that runs in different databases.

The challenge

Currently, dbt allows us to configure a sandbox environment using the --defer and --state arguments when compiling, running or testing models. The article Defer | dbt Developer Hub explains what Defer is, its benefits and its challenges.

So, let’s dive into a scenario:

Production database has one schema called to transform, one source table source_a and three models, model_a, model_b and model_c, as demonstrated below.

Their SQL definition is:

-- Model A -- model_a.sql
Select * from {{ source('source_a') }}

-- Model B -- model_b.sql
Select * from {{ ref('model_a') }}

-- Model C -- model_c.sql
Select * from {{ ref('model_b') }}


-- Compiled results

-- Model A -- model_a.sql
Select * from prod.transform.source_a

-- Model B -- model_b.sql
Select * from prod.transform.model_a

-- Model C -- model_c.sql
Select * from prod.transform.model_b

Let’s suppose that we want to make a change in the model_b. However, the model_a is a huge table that takes a long time to rebuild and has computing and storage costs. So, we want to consume the model_a from the production that makes our changes only on model_b.

dbt allows us to do that using the --defer and --state. So, I make my changes in model_b and run the command that looks like this:

dbt run --select model_b --defer --state ./path_to_last_prod_manifest_artifact.

The compiled code of model_b would be something like this:

-- Modified and compiled Model B -- model_b.sql
create view prod.dbt_rafael_transform.model_b as
Select * from prod.transform.model_a

So, model_b would be created in the production database but in the dev_rafael schema. Now, we have something link this:

The best benefit of this is we can run tests and execute our code in a slim way saving costs and improving the velocity of our development. This process is also used for the CI, which means CI will only build the changed models.

However, I see only one issue with this solution, imagine now that we have three or four people working on changes, and it means we’ll have three or four more new schemas in the databases. The schemas that are going to be processed and modified alongside the production schema which is not safe, as everyone only should have read access to the production database.

So the expected behaviour I would like to see when using --defer and --state arguments are as below:

So, when I run dbt run --select model_b --defer --state ./path_to_last_prod_manifest_artifact, the compiled code would be something like:

-- Modified and compiled Model B -- model_b.sql
create view dev.dbt_rafael_transform.model_b as
Select * from prod.transform.model_a

So now, the database and schemas are dev keeping the production database clean and secure.

The question is: how can I achieve it?

Do I need to change the ref or generate_database_name macros? Should it be handled by --defer and state? Is this a bug in dbt?

I generated a new manifest file from my production environment, which fixed the issue. Somehow the production database and the build/dev database were mixed in the manifest, which may be due to the tests. I fixed it, and now the --defer with --state works fine.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.