Prevent users from running the whole project

brunoszdl · July 19, 2023, 9:32pm

Hi there!

I was looking for a way to prevent users from accidentally running the whole project. In very large projects, an accidental dbt run can try to build thousands of models, which can be very costly.

Of course, you can interrupt it with Ctrl-C, but I wanted something safer (the user might start the command and go grab a coffee), so I created this very simple macro, which can be used with the on-run-start hook.

macros/check_select_arg.sql

{% macro check_select_arg() %}

    {% if not invocation_args_dict.select and target.name != 'prod' %}

        {{ exceptions.raise_compiler_error("Error: You must provide at least one select argument") }}

    {% endif %}

{% endmacro %}

dbt_project.yml

on-run-start:
  - "{{ check_select_arg() }}"

It checks whether the user invoked the command with a select argument (-s/--select). If they didn't, it raises a compilation error.

You can exempt specific targets from this restriction, for example to make production jobs easier to configure.
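To make the logic explicit, here is a plain-Python model of the condition the macro evaluates. It is only a sketch, assuming invocation_args_dict behaves like a dict whose select entry is missing or empty when no -s/--select flag was passed. The names should_block and EXEMPT_TARGETS are hypothetical and not part of dbt; the exemption is written as a list so more targets can be added.

```python
from typing import Any, Iterable, Mapping

# Hypothetical: targets allowed to run without a selector (e.g. production jobs).
EXEMPT_TARGETS = ("prod",)


def should_block(
    invocation_args: Mapping[str, Any],
    target_name: str,
    exempt_targets: Iterable[str] = EXEMPT_TARGETS,
) -> bool:
    """Return True when the macro would raise its compilation error.

    Assumes a missing or empty 'select' entry means no -s/--select was given,
    matching the Jinja test `not invocation_args_dict.select`.
    """
    has_selector = bool(invocation_args.get("select"))
    return not has_selector and target_name not in exempt_targets


# A bare run against dev is blocked; a selected run or a prod run is not.
print(should_block({"select": []}, "dev"))            # True
print(should_block({"select": ["my_model"]}, "dev"))  # False
print(should_block({}, "prod"))                       # False
```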

Feel free to customize the macro however makes the most sense for you.

mickaelandrieu · August 3, 2023, 8:09pm

Hi Bruno,

For some unexpected reason, this macro is only executed once.

Let me explain:

dbt run => BOOM => exception

If I run dbt run again, the macro isn't entered at all (I tried emitting a log message at the highest available level).

Is this intended? How can we make sure this check is executed every time we execute a run? (dbt 1.4.5)

brunoszdl · August 3, 2023, 8:37pm

Hi @mickaelandrieu, I'm not sure I follow.

I tried running it twice here and it worked.

Can you give me more details? Did you add the macro to on-run-start?

mickaelandrieu · August 3, 2023, 9:00pm

Yes, like this:

# dbt_project.yml
on-run-start: "{{ check_select_args() }}"

Here is “my” French version:

{% macro check_select_args() %}
    {% do log('ON START ✅', True) %}
    {% if not invocation_args_dict.select and target.name != 'prod' %}
        {# EN: "You must specify the file(s) to rebuild with the -s argument: dbt run -s tag:jobready" #}
        {{ exceptions.raise_compiler_error("❌ Tu dois spécifier le ou les fichiers à reconstruire à l'aide de l'argument -s : dbt run -s tag:jobready ") }}
    {% endif %}
{% endmacro %}

I don’t know why it isn’t triggered at all; even the log message doesn’t appear:

❯ dbt run
21:00:50  Running with dbt=1.4.5
21:00:50  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 2 unused configuration paths:
- models.yyy
- models.xxx
21:00:50  Found 161 models, 132 tests, 3 snapshots, 0 analyses, 560 macros, 0 operations, 18 seed files, 73 sources, 9 exposures, 0 metrics
21:00:51  
21:00:52  Concurrency: 4 threads (target='dev')
21:00:52  
21:00:52  1 of 161 START sql table model [Can't add more info, enterprise project]

I also renamed the macro to add an “s”, both to the file name (not needed) and to the macro name (required, since the hook calls check_select_args()).
I’ll sleep on it; maybe I’ll have the answer tomorrow.

brunoszdl · August 3, 2023, 9:17pm

That’s weird, I’ve tested your version and it worked.

Is your macro inside the macros/ folder?

mickaelandrieu · August 3, 2023, 9:23pm

Yes!

Funny thing: I just upgraded to the latest 1.5 (1.5.4) and… it works. Meh… I don’t like it when I don’t understand why.

However, the latest 1.6 produces an error, so I can’t upgrade at the moment: “cannot import name ‘POLLING_PREDICATE’ from ‘google.api_core.future.polling’”

But that’s an issue for another post! Thanks for your help, Bruno, and I’m honored to meet you (even if only virtually): I’m a happy reader of your tips on LinkedIn.

adiamaan92 · August 4, 2023, 5:34pm

This was very helpful. We also noticed that the on-run-start hook fires on dbt compile too. Compile is usually a cheap operation that can be used to test massive refactors, so this check would prevent a project-wide compile.

If you want the check to skip compile and only apply to dbt run and dbt build, you can do this:

{% macro check_select_args() %}
    {% if not invocation_args_dict.select and (target.name != 'prod') and (invocation_args_dict.which in ['run', 'build'])%}
        {{ exceptions.raise_compiler_error("run/build should have select argument. ") }}
    {% endif %}
{% endmacro %}

brunoszdl · August 4, 2023, 6:44pm

Awesome addition to the macro, @adiamaan92!! Loved it.

This way the admin can choose which commands follow this rule. I would add test, source and snapshot, and we can also use a list for the target names:

{% macro check_select_arg() %}

    {% if
        not invocation_args_dict.select
        and target.name not in ['prod']
        and invocation_args_dict.which in ['build', 'run', 'test', 'source', 'snapshot']
    %}

        {{ exceptions.raise_compiler_error("Error: You must provide at least one select argument") }}

    {% endif %}

{% endmacro %}
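This version adds a third condition on invocation_args_dict.which, the name of the subcommand being run. Extending the same kind of plain-Python sketch shows why dbt compile now passes while a bare dbt run is still blocked. It assumes which holds the subcommand name as a string; should_block, GUARDED_COMMANDS and EXEMPT_TARGETS are hypothetical names mirroring the macro's two lists, not a dbt API.

```python
from typing import Any, Collection, Mapping

# Hypothetical names mirroring the two lists in the macro above.
EXEMPT_TARGETS = ("prod",)
GUARDED_COMMANDS = ("build", "run", "test", "source", "snapshot")


def should_block(
    invocation_args: Mapping[str, Any],
    target_name: str,
    exempt_targets: Collection[str] = EXEMPT_TARGETS,
    guarded_commands: Collection[str] = GUARDED_COMMANDS,
) -> bool:
    """True when the macro would raise: no selector, a non-exempt target,
    and a subcommand (the 'which' entry) that is on the guarded list."""
    no_selector = not invocation_args.get("select")
    guarded = invocation_args.get("which") in guarded_commands
    return no_selector and target_name not in exempt_targets and guarded


for args in ({"which": "run"}, {"which": "compile"}, {"which": "run", "select": ["tag:daily"]}):
    print(args, should_block(args, "dev"))
```

With these defaults, a bare run on dev is blocked, compile on dev passes, and a selected run passes. Keeping both lists as parameters lets an admin change the guarded commands or exempt targets without touching the condition itself.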

kt12 · March 18, 2024, 2:32pm

@adiamaan92 and @brunoszdl, thanks for creating this macro.

I’m using a variation of your macro with BigQuery, and the error it raises does not stop the adapter from running the pipeline. Is there another exception or alternative action I can take to cancel all jobs?

There are related discussions here about getting the BQ adapter and dbt-core to cancel all queries. Thank you!

UPDATE/EDIT: I’m getting strange behavior from this macro. The first time I ran it, it raised the exception, but it seems the jobs had already been sent to BQ and they completed. On the second and subsequent runs, a compilation error was raised and nothing was executed. The only difference was that I lowered the max threads count in my ~/.dbt/profiles.yml below the number of models that would have been run on the second and subsequent runs.

Second update: when automatically generating a model from a source file in VS Code with dbt Power User, the macro actually prevents the model from being generated.

fillipo-balseretti · May 29, 2024, 10:00am

Interesting: on my side I’m running dbt Core 1.7.13, and the macro only works when the target folder is empty. As soon as I run another dbt command for a specific model and then try dbt run again, it no longer raises the compilation error.

rulyanf · July 17, 2024, 2:19am

It’s an issue with on-run-start hooks: they don’t block the run when a compilation failure is triggered. You can see in the run’s log that a Database Error occurred. See the issue here: [CT-2427] [Feature] [Investigate] Gracefully handle failures in on-run-* hooks, and skip subsequent hooks/nodes · Issue #7387 · dbt-labs/dbt-core · GitHub

I found a workaround: override the default source macro and execute the check there. Create a sources.sql macro:

{% macro source(source_name, table_name) -%}

    {{ check_select_arg() }}
    {{ return(builtins.source(source_name, table_name)) }}

{%- endmacro %}

This prevents most nodes from building, except seeds and models that only reference seeds. You can do the same by overriding the ref() macro in your project, which completely blocks a dbt run or dbt build without selectors.
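The workaround works because every model that calls source() (or ref(), once that is overridden as well) runs the check while it is being compiled, instead of relying on the on-run-start hook to stop the run. The wrap-then-delegate idea can be sketched in plain Python. This is an analogy, not dbt code: guarded is a hypothetical decorator standing in for the Jinja override, builtin_ref is a made-up stand-in for builtins.ref, SelectorMissing plays the role of raise_compiler_error, and the target exemption is omitted for brevity.

```python
from functools import wraps
from typing import Any, Callable, Mapping


class SelectorMissing(Exception):
    """Stand-in for the compilation error raised by check_select_arg()."""


def guarded(invocation_args: Mapping[str, Any]) -> Callable:
    """Run the selector check before delegating to the wrapped builtin,
    the way the overridden source()/ref() macros call builtins.*."""
    def decorator(builtin: Callable[..., str]) -> Callable[..., str]:
        @wraps(builtin)
        def wrapper(*args: Any, **kwargs: Any) -> str:
            if not invocation_args.get("select"):
                raise SelectorMissing("You must provide at least one select argument")
            return builtin(*args, **kwargs)
        return wrapper
    return decorator


def builtin_ref(*names: str) -> str:
    """Hypothetical stand-in for builtins.ref: just joins the names."""
    return ".".join(names)


# Every call site of the wrapped ref() fails fast when no selector was passed.
ref_without_select = guarded({"select": []})(builtin_ref)
ref_with_select = guarded({"select": ["orders"]})(builtin_ref)

print(ref_with_select("my_package", "orders"))
try:
    ref_without_select("orders")
except SelectorMissing as err:
    print("blocked:", err)
```

Because the check lives inside the function every node calls, a node cannot compile without passing it, which is why this approach does not depend on how hook failures are handled.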

fillipo-balseretti · July 18, 2024, 8:25am

This is awesome! I’ve tested it and also added a ref() macro that overrides the builtin, and it’s working.

prathmeshphalke_bsc · December 13, 2024, 7:20pm

Hi, I am trying to do the same thing in dbt Cloud, but I get an “undefined” error for my macro.

Command failed
Compilation Error in operation udp_gscda_distribution-on-run-start-0 (./dbt_project.yml)
‘check_args’ is undefined. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with “dbt deps”.
Any thoughts?
