What do I want to achieve?
Query existing tables to find dates that can be used as date (= partition) filters in dbt models. For the filtering I could do joins in each and every dbt model, but that turns out to execute slowly. Instead, I want to find the relevant dates first and then substitute them as variables into the dbt models.
How would I like to achieve this?
I would like to
- run a query before any model execution. The query yields a table with a single row, like:
  date_1: 2020-01-01, date_2: 2020-03-01, date_3: 2020-09-01, date_4: 2020-12-01
- store this output in variables (ideally a dictionary), and then
- reference those values in a dbt model, e.g. in
  where date_column = '{{ fetched_dates['date_1'] }}'
  so that after compilation the SQL contains hardcoded values: where date_column = '2020-01-01'
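Concretely, the intended pattern would look something like this (the variable name fetched_dates and the model name some_source_model are placeholders for illustration):

```sql
-- models/my_model.sql (sketch of the desired pattern)
select *
from {{ ref('some_source_model') }}
-- fetched_dates should be resolved at compile time,
-- so the compiled SQL contains a literal date
where date_column = '{{ fetched_dates["date_1"] }}'
```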
What I tried:
I used on-run-start in dbt_project.yml to execute a macro. The macro uses run_query to execute a SQL statement and stores its single-row, multi-column output in a dictionary via set fetched_dates = sql_result.rows[0]. This seems to work. But when I later reference this variable in a dbt model, as in where date_column = '{{ fetched_dates['date_1'] }}', it fails with:
'fetched_dates' is undefined. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with "dbt deps"
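For reference, the attempted setup looks roughly like this (the macro name fetch_dates and the table name my_dates_table are placeholders; this is the failing approach as described, not a working solution):

```yaml
# dbt_project.yml
on-run-start:
  - "{{ fetch_dates() }}"
```

```sql
-- macros/fetch_dates.sql
{% macro fetch_dates() %}
    {% set sql_result = run_query("select date_1, date_2, date_3, date_4 from my_dates_table") %}
    -- first (and only) row as a dict-like object
    {% set fetched_dates = sql_result.rows[0] %}
{% endmacro %}
```

As far as I understand, a Jinja set inside a macro is scoped to that macro invocation (here, the on-run-start context), which is presumably why fetched_dates is undefined when the models are compiled later.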
Any advice on how to proceed? Do I have to execute my date-fetching query outside of dbt and then pass the values to dbt as project variables? I would much rather rely on logic within dbt itself.