Hello everyone,
I’m working on a monthly reporting process for multiple clients, each of which has its own models: 3–4 staging tables, 5 main tables, and 5 KQI tables per client (most of the main models pull from multiple staging tables). These models all depend on a single macro that queries a small Snowflake table for readiness status and other critical details, such as group_key and reporting month. This macro is called 10–20 times per client, since it’s used for:
- Updating logs via pre/post hooks
- Inserting group_key and date info into each table/model
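
To make that concrete, the macro is shaped roughly like this (the table and column names below are simplified placeholders, not my real schema):

```sql
{# Simplified sketch of the readiness macro; real table/column names differ. #}
{% macro get_readiness_info(client_id) %}
    {% set query %}
        select readiness_status, group_key, reporting_month
        from control.client_readiness  -- small lookup table
        where client_id = '{{ client_id }}'
    {% endset %}
    {% if execute %}
        {% set results = run_query(query) %}
        {{ return(results.rows[0]) }}
    {% endif %}
{% endmacro %}
```

Every model (and every pre/post hook) that renders this macro fires that query again.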
Although the table isn’t large (it just contains client IDs, readiness status, group_key, etc.), I suspect that repeatedly calling the macro might be causing concurrency issues in Dagster whenever multiple jobs run simultaneously. The table’s contents don’t change until the end of the entire process, so it feels wasteful to keep hitting the database over and over.
My questions:
- Is there a recommended way in dbt to call a macro/query once and store its results in memory (or otherwise persist them) so that every model doesn’t re-run the same query? (One idea I’ve been toying with is sketched right after this list.)
- Have others come across concurrency or performance issues when multiple dbt jobs are running in parallel against small tables, and how did you address them?
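
For the first question, the idea I’ve been toying with (names and shapes below are hypothetical, not tested) is to have Dagster run the readiness query once per job and pass the row into dbt via --vars, with the macro only falling back to Snowflake when the var wasn’t supplied:

```sql
{# Hypothetical variant: prefer a value injected via --vars, fall back to querying. #}
{# Dagster would invoke dbt with something like:                                  #}
{#   --vars '{"readiness_info": {"readiness_status": "ready",                     #}
{#            "group_key": "g1", "reporting_month": "2025-01"}}'                  #}
{% macro get_readiness_info(client_id) %}
    {% set cached = var('readiness_info', none) %}
    {% if cached is not none %}
        {{ return(cached) }}
    {% endif %}
    {# Fallback: hit the lookup table as we do today #}
    {% set query %}
        select readiness_status, group_key, reporting_month
        from control.client_readiness
        where client_id = '{{ client_id }}'
    {% endset %}
    {% if execute %}
        {% set results = run_query(query) %}
        {{ return(results.rows[0]) }}
    {% endif %}
{% endmacro %}
```

That would turn 10–20 queries per client into one per job, but I’m not sure it’s the idiomatic dbt way, and callers would need to treat the injected dict and the agate row the same way (both support ['column_name'] access, as far as I know).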
I’ve considered refactoring the models to reduce the number of calls, but I’m hoping there’s a simpler approach that still lets me keep the readiness-check logic in a single place. Any insights on how to optimize or store the macro’s result would be greatly appreciated.
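
In case it helps frame answers: the refactor I’ve been weighing would materialize the lookup once per run as a tiny model and have downstream models ref() it, so the per-model reads become joins instead of separate queries (model and column names hypothetical):

```sql
-- models/staging/stg_readiness.sql (hypothetical)
-- Built once per run; downstream models select from ref('stg_readiness')
-- instead of each re-querying the lookup table through the macro.
{{ config(materialized='table') }}

select
    client_id,
    readiness_status,
    group_key,
    reporting_month
from control.client_readiness
```

That still wouldn’t cover the pre/post-hook logging calls, though, which is part of why I’m hoping there’s a cleaner way to cache the macro’s result. Thanks in advance!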