I want to use different databases in my model creation. I want my model to be created in my dev database but use the data from the production environment. I can produce this behaviour manually (I'm on BigQuery) but I fail at automating it.
To do that, I tried to leverage the {{ this }} and {{ selected_resources }} Jinja variables, but it fails because one is empty at compilation and the other at execution.
On the other hand, if you need to create upstream models in dev, you could avoid it using the defer flag (Defer | dbt Developer Hub). Missing upstream models will be resolved from the prod environment.
In this case, you should specify a database in the configuration. You would specify that modelA should be created in project1 and modelB in project2, and dbt will handle everything else.
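For example, something like this in dbt_project.yml should do it (my_project and the project IDs are placeholders, and I'm assuming both models sit at the root of the models/ folder; on BigQuery, the database config maps to the GCP project):

models:
  my_project:
    modelA:
      +database: project1
    modelB:
      +database: project2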
I think it’s a bit more complicated than that. @wefo
I want some logic that chooses the database based on whether a model is selected in the dbt run.
So if I do:
dbt run -s modelA => modelA is created in project1 (database1):
CREATE TABLE database1.table_A
AS
SELECT * FROM database2.table_B
dbt run -s modelB => modelB is created in project1 (database1):
CREATE TABLE database1.table_B
AS
SELECT * FROM database2.table_A
As we can see, the “database” in the configuration is not fixed; it depends on whether the model is part of {{ selected_resources }}.
Based on your initial post, you want to run some models in the development environment but avoid recreating required models and instead use production data.
In this case, use the defer flag (Defer | dbt Developer Hub) and the state flag pointing to the production manifest.json.
When creating model_A, dbt will check whether model_B exists in the development environment and refer to it; if not, dbt will take model_B from production.
P.S. model_A and model_B cannot refer to each other because you will get a loop in your DAG and the dbt run will fail.
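For example (the state path here is just illustrative):

dbt run -s model_A --defer --state prod_run_artifacts

With this, the {{ ref('model_B') }} inside model_A resolves to the production relation recorded in the manifest, as long as model_B is neither selected nor already built in dev.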
Lol, I can see that, because it's still not what I want. I will share the code that doesn't work properly; maybe it will help clarify.
This is my model configuration:
{{ config(
    materialized = 'table',
    schema = 'mart',
    database = get_mart_database(this)
) }}
and this is my macro:
{% macro get_mart_database(model) %}
    {# debug output: check what is available at parse time vs execution #}
    {{ print(model) }}
    {{ print(selected_resources) }}
    {%- if model.identifier in selected_resources -%}
        {%- do log(model.identifier ~ " is included based on the current selection", info=true) -%}
        databaseA
    {%- else -%}
        {%- do log(model.identifier ~ " is not included based on the current selection", info=true) -%}
        databaseB
    {%- endif -%}
{% endmacro %}
This code doesn't work because one of the two variables is empty at compilation and the other at execution.
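For what it's worth, my reading of the dbt docs is that selected_resources is only populated during execution (when execute is true); at parse time, when configs are rendered, it is an empty list, which would explain the behaviour. Also, as far as I can tell, it contains unique IDs like model.my_project.model_A rather than bare identifiers. Here is a guarded sketch (still not a fix, since, as far as I understand, the database config is taken from the parse-time render):

{% macro get_mart_database(model) %}
    {#- selected_resources holds unique IDs such as 'model.my_project.model_A',
       so we compare against the last segment of each ID -#}
    {%- set selected = [] -%}
    {%- if execute -%}
        {%- for res in selected_resources -%}
            {%- do selected.append(res.split('.')[-1]) -%}
        {%- endfor -%}
    {%- endif -%}
    {%- if model.identifier in selected -%}
        databaseA
    {%- else -%}
        {# parse time, or model not selected: fall back to the default #}
        databaseB
    {%- endif -%}
{% endmacro %}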
Any idea how I can make it work? @wefo
Ok, now I see what kind of technical problem you are facing. Could you also provide more context on why you are doing it and what result you want to achieve?
Why do I want to do it?
Some team members need to create datasets; to do so, we allow them to contribute to our dbt project.
They perform all of their tests in dev, and once they're happy with their model they open a Pull Request to our repo. They have read-only rights on my production environment and create rights on my development environment.
So they build models in development, but data consistency in dev is not always perfect. And since they build complex models, we allow them to build dev tables based on production data for an easier feedback loop.
That's why I need a way to have my model created in dev while all the {{ ref }} calls inside my model refer to my production environment. I thought the way to do it was the macro I wrote, but as you can see it's not working.
I don't want to write an entire article about dev and prod, but all my posts and hints were guiding towards the solution. Let's go step by step, because I have implemented it.
Here is an example of lineage: model_A → model_B → model_C.
You should have a compiled manifest.json file. You could place it somewhere accessible for your developers.
You give your developers the BigQuery Data Editor and BigQuery Job User permissions in the development project, and the BigQuery Data Viewer permission in the production project.
The developer writes or modifies a model and runs the following command: dbt run -s model_B --target dev --defer --state folder_path_to_production_manifest
Read this article carefully to understand what the defer and state flags do (Defer | dbt Developer Hub).
Simply put, it will create model_B in the development project under the dataset dev_alice, using data from model_A in the production project.
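Roughly, the compiled SQL would look like this (dev_project and prod_project are illustrative names for the two GCP projects):

CREATE TABLE dev_project.dev_alice.model_B
AS
SELECT * FROM prod_project.prod_dataset.model_A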
Furthermore, if the developer runs another command: dbt run -s model_C --target dev --defer --state folder_path_to_production_manifest
Given that model_B already exists in the dev_alice dataset, it will refer to this modified model in the dev environment.
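And here the compiled SQL would point at dev instead (again, names are illustrative):

CREATE TABLE dev_project.dev_alice.model_C
AS
SELECT * FROM dev_project.dev_alice.model_B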
This workflow allows us to go further through the pipeline and implement the required code changes without modifying production data, and to test whether the changes work with other models.
When the development part is over, the developer pushes code and creates a pull request.
Summary
The main point is that in most cases you should handle the environment using the target flag and define the target project and dataset in the profiles.yml file.
Regarding production data, you should use the defer and state flags to dynamically redirect references to production models when they do not exist in the development dataset.
NOTES:
You should regularly update the compiled manifest.json file so that developers always point to the newest version (see the sketch after this list).
Each developer defines their development dataset in the profiles.yml file. It is possible to write a more advanced schema name macro if there is a need to change the dataset based on a Jira task, for example (Custom schemas | dbt Developer Hub).
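A sketch of how that manifest refresh could be automated (the bucket path is a placeholder; any shared location your developers can read works):

# scheduled CI step after each production deployment
dbt compile --target prod
gsutil cp target/manifest.json gs://my-company-dbt/prod-artifacts/manifest.json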
Thanks for the time you took to give a well-structured answer!
I have been working with --defer and --state previously, and the reason why I think this is not a solution in my case is that I don't want this behaviour: “Given that model_B already exists in the dev_alice dataset, it will refer to this modified model in the dev environment.”
Let's say I want to build model_C in dev. I want that, whether or not model_A and model_B exist in my current environment (dev), I will still be using production data (because we don't know if model_A has been built with production data or dev data, or if the data in dev is up to date).
And if I want to build model_B, then model_A should point to the prod environment. So there has to be a macro deciding which environment to use, based on whether the model is part of {{ selected_resources }}, no?
To reinforce the fact that the --defer flag is not my solution: this mention in the --defer documentation is incompatible with what I want to produce =>
When the --defer flag is provided, dbt will resolve ref calls differently depending on two criteria:
Is the referenced node included in the model selection criteria of the current run?
Does the referenced node exist as a database object in the current environment?
If the answer to both is no—a node is not included and it does not exist as a database object in the current environment—references to it will use the other namespace instead, provided by the state manifest.
Do you see why I think the --defer flag is not a solution in my particular case?
Yes, I get your situation, but it’s very strange anyway.
Have you thought about misalignment? If I add a column to model_B and test it in dev, and after that I want to adjust model_C to include this column, I have to take into account that the column exists, so I cannot take production data; I should use the model_B that I already created. This workflow ensures that there will be no discrepancies between related models when I push all changes to production.
On the other hand, maybe you work on entirely different tasks and want to make independent adjustments to model_B and model_C. And as you mention, in this case model_B should take model_A from prod, and model_C should take model_B from prod.
I have solved this using a custom schema name: my developers do not work on a dedicated dataset (which we could set in the profiles.yml file); instead, they pass an additional argument (a Jira task ID) to the dbt run command. Then we have a defined custom schema generator (Custom schemas | dbt Developer Hub) which takes that argument, e.g.
dbt run --select model_B --target dev --vars 'task: JIRA_ID'
this compiles to dev_project.JIRA_ID.model_B
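A minimal sketch of such a generator, done by overriding dbt's built-in generate_schema_name macro (the task var name mirrors the command above; the fallback branches reproduce dbt's default behaviour):

{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set task_id = var('task', none) -%}
    {%- if target.name == 'dev' and task_id is not none -%}
        {# route the whole run into a per-task dataset, e.g. JIRA_ID #}
        {{ task_id | trim }}
    {%- elif custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ target.schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}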
Obviously, you should remember not to create model_A in your dev environment so that defer can do its job; otherwise it will not work.
What I like about it is that during a pull request we can check not only the code, but also how the data looks in the dev environment. Furthermore, a developer can work on different tasks simultaneously without mixing different code changes in the same dataset. After the pull request passes all tests, we delete the associated dataset.