Complex loop calc in dbt+postgres

Hello everyone,

I need to calculate data month by month in a loop, with intermediate saves to the database, because each period is calculated and saved based on the saved results of the previous period.

Details:

  1. We have data based on which a month’s calculations are made.
  2. The next month is calculated based on these results, and so on until all missing months are calculated and saved.
  3. Each monthly calculation involves 4-5 complex views.

In another stack (not dbt + airflow), this is solved as follows:

  • Calculate the month based on initial data.
  • Save the results to a technical table.
  • Calculate the next month based on the technical table.
  • Repeat until completion.

In dbt, this can theoretically be implemented in several ways:

  • Write a custom materialization.
  • Adapt the experimental insert-by-period.
  • Use templates to generate the entire calculation through Jinja in one pass (though this could be cumbersome and resource-intensive).
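The third option above (generating the whole chain through Jinja in one pass) could look roughly like this. This is only a sketch: the model name, the `base_data` upstream model, and the fixed 12-month window are all assumptions, and the placeholder CTE body would be replaced by the real monthly logic from the 4-5 views:

```sql
-- models/monthly_rollforward.sql (hypothetical model name)
-- One-pass sketch: each month's CTE reads from the previous month's CTE,
-- so dbt compiles the whole chain into a single statement.

{% set months = range(1, 13) %}  {# assumed fixed 12-month window #}

with month_0 as (
    select * from {{ ref('base_data') }}  -- assumed starting-data model
)
{% for m in months %}
, month_{{ m }} as (
    -- placeholder: the real 4-5 views' logic goes here,
    -- reading the previous period's results
    select * from month_{{ m - 1 }}
)
{% endfor %}

select * from month_{{ months | last }}
```

As the original post notes, for a long history this compiles into one very large statement, which is where the "cumbersome and resource-intensive" concern comes from.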

Another option considered is to create simple models for the initial and n-th monthly calculations and orchestrate multiple runs via Airflow.

However, the customer has limited access to Airflow, which makes debugging the DAG inconvenient.
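For the multiple-runs variant, one common shape (a sketch, with hypothetical names: `monthly_calc`, `base_data`, `calc_month`) is an incremental model driven by a `run_month` variable, which the external scheduler passes once per missing month, oldest first, e.g. `dbt run --select monthly_calc --vars '{"run_month": "2024-02-01"}'`:

```sql
-- models/monthly_calc.sql (hypothetical model name)
{{ config(materialized='incremental', unique_key='calc_id') }}

select
    calc_id,                                            -- assumed key column
    '{{ var("run_month") }}'::date as calc_month,
    balance * 1.01 as balance                           -- placeholder for the real monthly logic
{% if is_incremental() %}
from {{ this }}  -- previous month's results already saved in this table
where calc_month = '{{ var("run_month") }}'::date - interval '1 month'
{% else %}
from {{ ref('base_data') }}  -- first run: start from the initial data
{% endif %}
```

This keeps each run simple at the cost of pushing the loop out to whatever calls dbt.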

When attempting to use insert-by-period, the following error occurred:

column "__period_filter__" does not exist
LINE 15: where __PERIOD_FILTER__
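For reference, `insert_by_period` expects the model to be configured with that materialization and to contain the literal `__PERIOD_FILTER__` placeholder, which the materialization rewrites into an actual period filter at run time. If the model runs under a different materialization, or under an adapter the package does not support (the original dbt_utils version targeted Redshift), Postgres parses `__PERIOD_FILTER__` as a column name and fails exactly as shown. A sketch of the expected shape, with hypothetical model and column names:

```sql
-- models/monthly_by_period.sql (hypothetical model name)
{{
    config(
        materialized = 'insert_by_period',
        period = 'month',
        timestamp_field = 'calc_month',  -- assumed date column
        start_date = '2023-01-01',
        stop_date = '2024-01-01'
    )
}}

select *
from {{ ref('base_data') }}  -- assumed upstream model
where __PERIOD_FILTER__      -- literal placeholder; replaced by the materialization
```

Note, however, that `insert_by_period` only splits one model's work into period-sized chunks; each chunk still reads from the model's upstream refs, not from the previously inserted periods, so it may not express the "next month depends on the previous month's saved results" recursion by itself.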

Does anyone have any ideas on how this can be correctly implemented in dbt? Is there a proper way to use insert-by-period or another approach?

Thank you in advance!

Are the months static once calculated, or does the whole process have to run daily?

Note: @NickP originally posted this reply in Slack. It might not have transferred perfectly.

After “closing the month,” the data for that month does not change.

Data can be updated (added to) in the current month.

However, new business objects may be added for past months, and a full calculation needs to be performed for those months where data is “missing” for these entities.

The calculation logic may also change, in which case the entire history or part of it needs to be recalculated.

This does sound overly complex.
In the job scheduler you can use `--select` with the model and call it n times to get your recursion.
From what you describe, you have a monthly process and something that makes each month final.
Could you recalculate the current month as a full refresh, populate a history incremental model once it's final, and use an anti-join so each month is only inserted once?
It sounds like this involves moving an existing old-school database to the cloud; rebuilding in full may be cheaper than the data engineer's time, and more accurate.
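The full-refresh-plus-history pattern suggested above could be sketched like this (hypothetical model and column names; the anti-join keeps already-finalized months from being inserted twice):

```sql
-- models/history.sql (hypothetical incremental history model)
{{ config(materialized='incremental') }}

select cur.*
from {{ ref('current_month_full_refresh') }} as cur  -- assumed full-refresh model
{% if is_incremental() %}
-- anti-join: only insert months not already present in history
where not exists (
    select 1
    from {{ this }} as hist
    where hist.calc_month = cur.calc_month
)
{% endif %}
```

On each run the current-month model is fully rebuilt, and the history model only picks up months it has not seen yet.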

You could also try inserting the current month every day and use an analytic function to select the most recent data, although it doesn't sound like that would fit your scenario.

Note: @NickP originally posted this reply in Slack. It might not have transferred perfectly.