Limiting dev runs with a dynamic date range

Hey folks,

So far I’ve been limiting the date range on base model view files when doing dev runs as follows:

{% if target.name == 'dev' %}
where date(date) >= date_add(current_date(), interval -3 day)
{% endif %}

This works fine, but with many models it’s not so DRY what with all this repeated code across the base model files. Thus I’d like to make some form of ‘dev_limit_date’ macro.

To note:

  • I would like to set the date interval (i.e. ‘- 3 days’) in the macro, and not in the separate model files, because I may change this ad-hoc depending on what is being tested in dev. I only want to state it once.

  • ‘date(date)’ is subject to change across model files, depending on the underlying source date field to be filtered. So I may need to set the date field to be filtered in the model file, i.e. with a {% set filter_date = 'date' %}

I’ve previously tested this in a number of ways (e.g. here: https://getdbt.slack.com/archives/C0VLZPLAE/p1562678315423500?thread_ts=1562671748.420900&cid=C0VLZPLAE) but I cannot seem to get it running nicely. Anyone else tackled this problem before and have it up & running in their production environment?

Much appreciated,

Lewis

Hey there - check out this issue - sounds like there’s some overlap with what you’re looking to accomplish here. https://github.com/fishtown-analytics/dbt/issues/1592

Would love to further the discussion either in this thread, or on that issue!

Hey @drew,

Thanks for your suggestion - thats for sure an interesting read and I’ve taken some points made in it into consideration. I’ll continue the ‘dev_limit’ macro discussion here however.

What I’m now thinking is more of a macro that adds specified LIMIT where called in models, i.e. base for a limit on source tables. I’ve managed to successfully compose a macro that takes an argument that sets a specified LIMIT in terms of result row count (or takes a default limit) if target is dev - (I am checking in the target run file after running). But what I am struggling to do is add a disabling feature on this macro (that works) - i.e. via a second macro that can disable the first.

I’m setting the row count limit via the cli when running models in dev, i.e. as follows:

dbt run --vars 'row_count: 100' (sets a dev limit row count of 100, overwriting the default)
dbt run(runs with the default dev limit row count (i.e. here 10,000))
dbt run --vars 'dev_disable: true'(this would ideally disable the dev_limit_rows macro)

So, here are my files:

(model file) model.sql :

select *
from {{ source('data', 'source') }}
{{ dev_limit(row_count, dev_disable) }}

(macro) dev_limit_rows.sql :

{% macro dev_limit_rows(row_count, dev_disable) -%}

{{ config(tags=["dev_limit"]) }}

{# set row count limit on base tables for dev runs - '10000' is default value if not explicitly defined #}

{% if target.name == 'dev' %}

  limit {{ var("row_count", "10000") }}

{% endif %}

{%- endmacro %}

(macro) dev_limit.sql :

{% macro dev_limit(row_count, dev_disable) -%}


  {%- if dev_disable -%}

    --no op

  {%- else -%}

    {{ dev_limit_rows(row_count, dev_disable) }}

  {%- endif -%}

{%- endmacro %}

I feel like I am close to cracking it, but have tried enough alterations now :sweat_smile: Something in the dev_limit.sql must be off. Any help/advice/direction is GREATLY appreciated!

Hey @lewis! This is cool :slight_smile:

The thing that jumps out at me here is your model SQL:

select *
from {{ source('data', 'source') }}
{{ dev_limit(row_count, dev_disable) }}

Should this instead be

select *
from {{ source('data', 'source') }}
{{ dev_limit(var('row_count'), var('dev_disable')) }}

Maybe one approach is to push the limit/disable logic into the dev_limit macro, so:

dev_limit.sql

{% macro dev_limit() -%}
  {%- if var('dev_disable', default=false) -%}
    --no op
  {%- else -%}
    {{ dev_limit_rows(var('row_count')) }}
  {%- endif -%}
{%- endmacro %}

dev_limit_rows.sql

{% macro dev_limit_rows() -%}
  {{ config(tags=["dev_limit"]) }}
  {# set row count limit on base tables for dev runs - '10000' is default value if not explicitly defined #}

  {% if target.name == 'dev' %}
    limit {{ var("row_count", "10000") }}
  {% endif %}
{%- endmacro %}

model.sql

select *
from {{ source('data', 'source') }}
{{ dev_limit() }}

Would something like that work?

Hey @drew,

Thanks! Given your suggestions and a few tweaks I have got it running for the following use cases:

Running in dev with a default row limit (set in macro to 10,000):
dbt run
Running in dev with a specified limit via cli:
dbt run --vars 'row_count: 100'
Running in dev disabling the ‘dev_limit’ via cli:
dbt run --vars 'dev_disable: true'

I’m sharing macro files and an abbreviated model file below, hopefully this is helpful for others. Also maybe there is still yet a neater of doing this; someone might want to pitch-in, continuing this thread.

model.sql

select *
from {{ source('data', 'source') }}
{{ dev_limit() }}

dev_limit.sql

{% macro dev_limit(row_count, dev_disable) -%}


  {%- if var('dev_disable', default=false) -%}

    --no op

  {%- else -%}

    {{ dev_limit_rows(row_count) }}

  {%- endif -%}

{%- endmacro %}

dev_limit_rows.sql

{% macro dev_limit_rows(row_count) -%}

{{ config(tags=["dev_limit"]) }}

{# set row count limit on base tables for dev runs - '10000' is default value if not explicitly defined #}

{% if target.name == 'dev' %}

  limit {{ var("row_count", "10000") }}

{% endif %}

{%- endmacro %}

That’s really great! Thanks for sharing the finished product :slight_smile: