Complex loop calc in dbt+postgres

Hello everyone,

I need to calculate data month by month in a loop, with intermediate saves to the database, because each period is calculated and saved based on the saved results of the previous period.

Details:

  1. We have data based on which a month’s calculations are made.
  2. The next month is calculated based on these results, and so on until all missing months are calculated and saved.
  3. Each monthly calculation involves 4-5 complex views.

In another stack (not dbt + airflow), this is solved as follows:

  • Calculate the month based on initial data.
  • Save the results to a technical table.
  • Calculate the next month based on the technical table.
  • Repeat until completion.

In dbt, this can theoretically be implemented in several ways:

  • Write a custom materialization.
  • Adapt the experimental insert-by-period.
  • Use templates to generate the entire calculation through Jinja in one pass (though this could be cumbersome and resource-intensive).
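The third option above (generating the whole chain through Jinja in one pass) could look roughly like this. This is only a sketch: the model name, the `base_data` upstream model, and the fixed 12-month window are all assumptions, and the placeholder CTE body would be replaced by the real monthly logic from the 4-5 views:

```sql
-- models/monthly_rollforward.sql (hypothetical model name)
-- One-pass sketch: each month's CTE reads from the previous month's CTE,
-- so dbt compiles the whole chain into a single statement.

{% set months = range(1, 13) %}  {# assumed fixed 12-month window #}

with month_0 as (
    select * from {{ ref('base_data') }}  -- assumed starting-data model
)
{% for m in months %}
, month_{{ m }} as (
    -- placeholder: the real 4-5 views' logic goes here,
    -- reading the previous period's results
    select * from month_{{ m - 1 }}
)
{% endfor %}

select * from month_{{ months | last }}
```

As the original post notes, for a long history this compiles into one very large statement, which is where the "cumbersome and resource-intensive" concern comes from.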

Another option considered is to create simple models for the initial and n-th monthly calculations and orchestrate multiple runs via Airflow.

However, the customer has limited access to Airflow, which makes debugging the DAG inconvenient.
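For the multiple-runs variant, one common shape (a sketch, with hypothetical names: `monthly_calc`, `base_data`, `calc_month`) is an incremental model driven by a `run_month` variable, which the external scheduler passes once per missing month, oldest first, e.g. `dbt run --select monthly_calc --vars '{"run_month": "2024-02-01"}'`:

```sql
-- models/monthly_calc.sql (hypothetical model name)
{{ config(materialized='incremental', unique_key='calc_id') }}

select
    calc_id,                                            -- assumed key column
    '{{ var("run_month") }}'::date as calc_month,
    balance * 1.01 as balance                           -- placeholder for the real monthly logic
{% if is_incremental() %}
from {{ this }}  -- previous month's results already saved in this table
where calc_month = '{{ var("run_month") }}'::date - interval '1 month'
{% else %}
from {{ ref('base_data') }}  -- first run: start from the initial data
{% endif %}
```

This keeps each run simple at the cost of pushing the loop out to whatever calls dbt.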

When attempting to use insert-by-period, the following error occurred:

column "__period_filter__" does not exist
LINE 15: where __PERIOD_FILTER__
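For reference, `insert_by_period` expects the model to be configured with that materialization and to contain the literal `__PERIOD_FILTER__` placeholder, which the materialization rewrites into an actual period filter at run time. If the model runs under a different materialization, or under an adapter the package does not support (the original dbt_utils version targeted Redshift), Postgres parses `__PERIOD_FILTER__` as a column name and fails exactly as shown. A sketch of the expected shape, with hypothetical model and column names:

```sql
-- models/monthly_by_period.sql (hypothetical model name)
{{
    config(
        materialized = 'insert_by_period',
        period = 'month',
        timestamp_field = 'calc_month',  -- assumed date column
        start_date = '2023-01-01',
        stop_date = '2024-01-01'
    )
}}

select *
from {{ ref('base_data') }}  -- assumed upstream model
where __PERIOD_FILTER__      -- literal placeholder; replaced by the materialization
```

Note, however, that `insert_by_period` only splits one model's work into period-sized chunks; each chunk still reads from the model's upstream refs, not from the previously inserted periods, so it may not express the "next month depends on the previous month's saved results" recursion by itself.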

Does anyone have any ideas on how this can be correctly implemented in dbt? Is there a proper way to use insert-by-period or another approach?

Thank you in advance!

Are the months static once calculated, or does the whole process have to run daily?

Note: @NickP originally posted this reply in Slack. It might not have transferred perfectly.

After “closing the month,” the data for that month does not change.

Data can be updated (added to) in the current month.

However, new business objects may be added for past months, and a full calculation needs to be performed for those months where data is “missing” for these entities.

The calculation logic may also change, in which case the entire history or part of it needs to be recalculated.

This does sound overly complex.
In the job scheduler you can use `--select` with the model and call it n times to get your recursion.
From what you describe, you have a monthly process and something that makes each month final.
Could you recalculate the current month as a full refresh, populate a history incremental model once it's final, and use an anti-join so each month is only inserted once?
It sounds like this involves moving an existing old-school database to the cloud; rebuilding in full may be cheaper than the data engineer's time, and more accurate.
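The full-refresh-plus-history pattern suggested above could be sketched like this (hypothetical model and column names; the anti-join keeps already-finalized months from being inserted twice):

```sql
-- models/history.sql (hypothetical incremental history model)
{{ config(materialized='incremental') }}

select cur.*
from {{ ref('current_month_full_refresh') }} as cur  -- assumed full-refresh model
{% if is_incremental() %}
-- anti-join: only insert months not already present in history
where not exists (
    select 1
    from {{ this }} as hist
    where hist.calc_month = cur.calc_month
)
{% endif %}
```

On each run the current-month model is fully rebuilt, and the history model only picks up months it has not seen yet.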

You could also try inserting the current month every day and use an analytic function to select the most recent data, although it doesn't sound like that would fit your scenario.

Note: @NickP originally posted this reply in Slack. It might not have transferred perfectly.