We are in dbt Cloud. Many times (especially for incremental models) I would like to run a model only one time with specific set of parameters, which I don’t want to push into the main branch. I would like to test a dbt run in development environment and then I would like to run the same code in production. Same, meaning that all the ref() functions and so on use the production schema instead.
Can you explain why? dbt’s core premise is one of idempotency - when you run the same transformation you will get the same results. Why do you want to have some changes applied to your table only once and not have them remain if you rebuilt the model in the future?
As I mentoined, it’s an incremental model, where I need to calculate the whole history. I’m doing a daily customer snapshot. I loop day by day and UNION with previous result. It’s a lot of data. I don’t want to run it every day.
I know that I can create a project variable, based on which I will define a time span, for which the data will be calculated. I’m doing this in another model. But also here I have to merge branch to repository then go to Deploy → Jobs, then update the setting of a manual job. I find this a bit redundant.
Isn’t there a possibility to dbt run with a flag, which would tell which schema to use?
Your instincts are correct - this doesn’t sound right. Can you share the code of the model you’re trying to build?
If your incremental model is set up in line with best practices, you shouldn’t need to be building it in development every day and then redeploying.
It should be something like
with upstream_model as (
select * from {{ ref('upstream_model') }}
),
final as (
select
all,
the
transformations,
modified_date
from upstream_model
{% if is_incremental() %}
where upstream_model.modified_date > (select max(this.modified_date) from {{ this }} as this)
{% endif %}
So that you’re only processing net-new data and you’re only processing it in the production environment.
The reason I’m pushing back here is that it sounds like you are a manual part of your orchestration process, by having to manually update the query and run it every time you want new data loaded; you shouldn’t need to be doing that.
I’m doing the manual update only once upon a time. For example when some logic changes or mostly due to adding a new column.
In most cases we are trying to follow the best practices.
Short description for what this model should be doing:
creates a weekly customer table, containing some specific business important metrics
each day the current and last week (defined in mondays_query) is recalculated and a snapshot for that given week is created
for given week we always look at user activities on the web in the past 90 days (roughly) (see WHERE clause)
I iterate over weeks (2 in standard run) and select from quite a large table. The result inserts or overwrites data in the destination table
but once upon a time I need to recalculate the whole thing. The compiled code is then quite large and I guess it uses a lot of resources for computation…
Here is the code (shortened for readability):
-- depends_on: {{ ref('ga_events') }}
{{
config(
materialized = 'incremental',
unique_key = ['week_monday', 'user_id']
)
}}
{%- set mondays_query -%}
SELECT
DISTINCT CAST(DATE_TRUNC(generated_day, ISOWEEK) AS STRING) AS week_monday
FROM UNNEST(GENERATE_DATE_ARRAY({{ get_week_start_date(-7) }}, {{ get_week_start_date() }})) as generated_day
{%- endset -%}
{%- set rows = run_query(mondays_query) -%}
{%- if execute -%}
{%- set mondays_list = rows.columns[0].values() -%}
{%- else -%}
{%- set mondays_list = [] -%}
{%- endif -%}
{% for monday in mondays_list %}
SELECT
CAST(EXTRACT(YEAR FROM DATE_TRUNC('{{ monday }}', ISOWEEK)) AS STRING) || '_' || LPAD(CAST(EXTRACT(WEEK FROM DATE_TRUNC('{{ monday }}', ISOWEEK)) AS STRING), 2, '0') AS week,
some_other_columns
FROM {{ref('ga_events')}}
WHERE event_date BETWEEN DATE_ADD('{{ monday }}', INTERVAL -91 DAY) AND DATE_ADD('{{ monday }}', INTERVAL 6 DAY)
GROUP BY user_id
{% if not loop.last %} UNION ALL {% endif %}
{%- endfor %}
We don’t use the snapshot models because they are difficult to join. That’s why we do snapshots in this fashion.