Is there a way to have two different models coexist that is writing to same table?

Is there a way to have two different models coexist that is writing to same table?

The reason for this is because I am trying to find a way to run a copy of my main model using a different dbt job.

Here are the solutions that I’ve tried so far:

  1. Put the model copy into a different folder
    • throws an error that models have the same name as the original model
  2. Renamed the model copy
    • throws dbt cannot create two resources with identical database representations
  3. Tried using --exclude to exclude the folder containing the original model
    • same error as #2

Any thoughts on this is very much appreciated. Thank you.

How are these variants of the model different from one another? Why do they need to be two different queries going into the same table?

The standard way to do this in dbt would be to have each model do a single thing, and then have another model which is the union of those two tables. Even that might not be necessary though depending on what your goal is - maybe you could do it all in one model, and have an additional column on the end describing its origin

Unless it’s built as an incremental model, this would overwrite the table’s contents every time.

1 Like

There’s no difference in terms of what each of the models does but they differ in terms of when they are executed. The original model/job runs on schedule while the copy runs when needed. I’m trying to have a generic job that can execute the models in a specific folder. But that’s the error that I’m getting.

If it’s the same code, you should use the dbt model selection syntax to pick the specific models, not make a copy of them. For example you could add a tag and then run dbt run --select tag:my_tag as an adhoc job that is different to your regularly scheduled dbt run command.

Put another way, your proposed approach conflates the orchestration logic and the business logic of your files. You should build the models that you need to see in your database, and then attach extra metadata to them to make them runnable on the schedule you need, not have your project structure driven by your orchestration logic.

1 Like

I am assuming, the models differ in terms of where clause then. I had similar requirements. The way I modelled it is by creating a variable e.g. is_daily_run in the project.yml file and the default value is set to true which will be the case for regular run. When the variable is false, it will run as copy mode.

Now in the model, you will play with the where condition checking the value of the is_daily_run variable.

{% if var("is_daily_run") == 'True' %}

          AND load_date>= last_run_date

{% else %}
        AND 1=1 -- basically run for full data or insert your rules for copy
        
{% endif %}

You can also play with the variable to switch the source if the source is different for daily vs full load jobs etc.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.