I have created a model that I only want to run one time per month (lets say Model Month), but the other models need to run twice a day (lets say Model Day). I want to exclude the first model from the dbt run without using every time the --exclude function.
This is because we don’t want other colleagues to add the --exclude command to every dbt run. So we prefer to have a solution where we can run the ‘dbt run’ but without Model Month. Ones a month we would like to run the Model Month without changing any settings.
I tried to set the ‘enabled’ to false in the config and that works fine. But when I want to run Model Month, it says ‘The selection criterion ‘Model Month’ does not match any nodes’. This is because it is set to enabled false.
Is there an option to have by default Model Month disabled, but when I want it to run, I don’t need to change the settings from enabled false to true?
Can you explain more about why you want to skip this most of the time? Your dbt project shouldn’t depend on nodes running an exact number of times – or on specific days – for it to be correct; it should be built in an idempotent manner.
Also keep in mind that when a node is disabled, it also can’t be ref’d by downstream models, so it would always have to be the final model in your DAG.
Your colleagues should be working in their own development environments, so this should only impact your production jobs which you should be able to set up once and then leave alone for the most part.
With all that said, you could use a YAML selector and set the default property. This means that when you do a standard dbt run or dbt build without any other selection commands, it will exclude the monthly model. You would then have a separate job configured which is dbt run --select monthly_only_model.
We would like to skip this most of the time, because the data only needs to be refreshed ones per month. We don’t get new data in the intermediate period. In the model is a lot of complex sql code, so for the performance we would like to skip this file in the normal dbt run.
I am not sure if the selectors can fix the problem, because you still have to call the command dbt run --selector everything_except_the_monthly. I want to call the command dbt run and by default it excludes the monthly_model instead of excluding the the model in the command. Is that possible?
Have you considered an incremental model? This is a perfect use case - you can put a check in for new data, something like
with upstream as (
select * from {{ ref('upstream_complex_table') }}
)
select
-- complex transformations here...
from upstream
{% if is_incremental() %}
where upstream.loaded_at > (select max(this.loaded_at) from {{ this }} as this)
{% endif %}
It is possible, but you need to set the default property described above. When you do you’ll see this output:
$ dbt run
Running with dbt=1.4.1
[...]
Using default selector everything_except_the_monthly
Would tags work?
Tag everything with ‘monthly’, then tag everything excepting the ‘model month’ model with ‘daily’. then in the daily run, call tag:daily, and on the monthly run call tag:monthly