Prevent users from running the whole project

Hi there!

I was searching for a way to prevent users from accidentally running the whole project. For example, in a very large project, if someone accidentally runs dbt run without a selector, dbt can try to run thousands of models, which can be very costly.

Of course, you can interrupt it with Ctrl-C, but I wanted something safer (the user might just run the command and go get a coffee), so I created this very simple macro, which can be used with the on-run-start hook.

macros/check_select_arg.sql

{% macro check_select_arg() %}

    {% if not invocation_args_dict.select and target.name != 'prod' %}

        {{ exceptions.raise_compiler_error("Error: You must provide at least one select argument") }}

    {% endif %}

{% endmacro %}

dbt_project.yml

on-run-start:
  - "{{ check_select_arg() }}"

It checks whether the user invoked the command with the select argument; if they didn't, an error is raised.

You can exempt specific targets from this restriction, for example to make production jobs easier to configure.

You can customize this macro in whatever way makes the most sense for you.

Hi Bruno,

For some unexpected reason, this macro is executed only once.

Let me explain:

dbt run => BOOM => exception

If I run dbt run again, this time the macro is not entered at all (I've tried emitting a log message at the highest available level).

Is this intended? How can we ensure this check is executed every time we execute a run (dbt 1.4.5)?

Hi @mickaelandrieu, not sure I got it.

I tried running it twice here and it worked.

Can you give me more details? Did you add the macro to on-run-start?

Yes,

this way:

# dbt_project.yml
on-run-start: "{{ check_select_args() }}"

Here is "my" French version:

{% macro check_select_args() %}
    {% do log('ON START ✅', True) %}
    {% if not invocation_args_dict.select and target.name != 'prod' %}
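        {# English: "You must specify the file(s) to rebuild using the -s argument: dbt run -s tag:jobready" #}
        {{ exceptions.raise_compiler_error("❌ Tu dois spécifier le ou les fichiers à reconstruire à l'aide de l'argument -s : dbt run -s tag:jobready ") }}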
    {% endif %}
{% endmacro %}

I don't know why it's not triggered at all, not even the log message:

❯ dbt run
21:00:50  Running with dbt=1.4.5
21:00:50  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 2 unused configuration paths:
- models.yyy
- models.xxx
21:00:50  Found 161 models, 132 tests, 3 snapshots, 0 analyses, 560 macros, 0 operations, 
18 seed files, 73 sources, 9 exposures, 0 metrics
21:00:51  
21:00:52  Concurrency: 4 threads (target='dev')
21:00:52  
21:00:52  1 of 161 START sql table model [Cant add more info, enterprise project]

And I've renamed the macro to add an "s" to both the file name (unnecessary) and the macro name (required).
I'll sleep on it and maybe tomorrow I'll have the answer 🙂

That's weird, I've tested your version and it worked.

Is your macro inside the macros/ folder?

Yes!

Funny thing: I just upgraded to the latest 1.5 (1.5.4) and... it works. Meh... 😕 I don't like it when I don't understand.

However, the latest 1.6 produces an error, so I can't upgrade at the moment => "cannot import name 'POLLING_PREDICATE' from 'google.api_core.future.polling'"

But this is another issue for another post! Thanks for your help Bruno, and honored to meet you (even if it's virtual): I'm a happy reader of your tips on LinkedIn 🙂

This was very helpful. We also noticed that dbt compile triggers the on-run-start hook as well. Compile is usually a cheap operation that can be used to test massive refactors, and this check would block a project-wide compile.

If you want the check to skip compile and apply only to dbt run and dbt build, you can do this:

{% macro check_select_args() %}
    {% if not invocation_args_dict.select and (target.name != 'prod') and (invocation_args_dict.which in ['run', 'build']) %}
        {{ exceptions.raise_compiler_error("run/build should have select argument. ") }}
    {% endif %}
{% endmacro %}

Awesome addition to the macro, @adiamaan92! Loved it.

This way the admin can choose which commands follow this rule. I would add test, source and snapshot, and we can also use a list for the target names:

{% macro check_select_arg() %}

    {% if
        not invocation_args_dict.select
        and target.name not in ['prod']
        and invocation_args_dict.which in ['build', 'run', 'test', 'source', 'snapshot']
    %}

        {{ exceptions.raise_compiler_error("Error: You must provide at least one select argument") }}

    {% endif %}

{% endmacro %}

@adiamaan92 and @brunoszdl thanks for creating this macro.

I'm using a variation of your macro with BigQuery, and interrupting does not stop the adapter from running the pipeline. Is there another exception or alternative action I can take to cancel all jobs?

There are related discussions about getting the BQ adapter and dbt-core to cancel all queries here. Thank you!

UPDATE/EDIT - I'm getting strange behavior from this macro: the first time I ran it, it raised the exception, but it seems the jobs had already been sent to BQ and they completed. The second and subsequent times, a compilation error was raised and nothing was executed. The only difference was that I lowered the max threads count in my ~/.dbt/profiles.yml to be below the number of models that would have been run on the second and subsequent runs.

Second Update - when trying to automatically generate a model from a source file in VSCode using dbt Power User, the macro actually prevents the model from being generated.

It's interesting because on my side, I'm running dbt Core (version 1.7.13) and that macro only works when the target folder is empty. As soon as I run another dbt command to run a specific model and then try to execute dbt run again, it no longer gives the compilation error.

It's an issue with on-run-start, which doesn't block the run when a compilation failure is triggered. You can see in the run log that a Database Error occurred. You can look at the issue here: [CT-2427] [Feature] [Investigate] Gracefully handle failures in on-run-* hooks, and skip subsequent hooks/nodes · Issue #7387 · dbt-labs/dbt-core · GitHub

I found a workaround: override the default behaviour of the source macro and execute the check there. Create a sources.sql macro:

{% macro source(source_name, table_name) -%}

    {{ check_select_arg() }}
    {{ return(builtins.source(source_name, table_name)) }}

{%- endmacro %}

This will prevent most things from building, with the exception of seeds and models that reference those seeds. You can do the same thing by creating and editing the ref() macro in your project, which then completely blocks a dbt run or dbt build without selectors.
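
For reference, a minimal sketch of what such a ref() override could look like, mirroring the source() override above. It assumes the check_select_arg() macro from earlier in this thread is defined in the project, and that models only use the single-argument form ref('model_name'); two-argument (package) or versioned refs would need extra handling that is not shown here.

```jinja
{# Hypothetical ref() override: runs the selector check before resolving the ref. #}
{# Assumption: only single-argument ref('model_name') calls are used in the project. #}
{% macro ref(model_name) -%}

    {{ check_select_arg() }}
    {{ return(builtins.ref(model_name)) }}

{%- endmacro %}
```

Like the source() override, the file name is arbitrary as long as the file sits in the macros/ folder. Because the check runs whenever a ref is resolved, it may also affect commands or tools that only parse or compile the project, so restricting it with invocation_args_dict.which as shown earlier is probably worth keeping.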

This is awesome! I've tested it and also added a ref() macro to override the builtin macro, and it is working.