How can I avoid repeatedly querying a status table via a macro and store its results for reuse?

Hello everyone,

I’m working on a monthly reporting process for multiple clients, each of which has its own models: 3–4 staging tables (most of the main models pull from multiple staging tables), 5 main tables, and 5 KQI tables per client. These models all depend on a single macro that queries a small Snowflake table for readiness status and other critical details (such as group_key and reporting month). This macro is called 10–20 times per client, as it’s used for:

  • Updating logs via pre/post hook

  • Inserting group_key and date info into each table/model

Although the table isn’t large (it just contains client IDs, readiness status, group_key, etc.), I suspect repeatedly calling the macro might be causing concurrency issues in Dagster whenever multiple jobs run simultaneously. The table results don’t change until the end of the entire process, so it feels wasteful to keep hitting the database over and over.

My questions:

  1. Is there a recommended way in dbt to call a macro/query once and store its results in memory (or otherwise persist them) so every model doesn’t re-run the same query?

  2. Have others come across concurrency or performance issues when multiple dbt jobs are running in parallel against small tables, and how did you address it?

I’ve considered refactoring the models to reduce the number of calls, but I’m hoping there’s a simpler approach that still lets me keep the readiness-check logic in a single place. Any insights on how to optimize or store the macro’s result would be greatly appreciated. Thanks in advance!

You mentioned you only want the macro to run once. Do you want to run once per dbt run or once per day? What is the frequency that you’re seeking for that one table to be refreshed?

Just once per run. Only one client is run per request, which gets kicked off by a Dagster sensor. The status table gets updated twice per run: once at the beginning and once at the end of the process. I’m coming from the Microsoft world (SSIS/MSSQL), so I’m still new to this, but now that I think about it I may need to look more at Dagster for storing data in memory. I’m just trying to figure out how to stop querying my rev_date_log table (which is my status table) so many times, and whether I could store the results in memory for each model to read from.
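One idea I’m toying with: have Dagster query rev_date_log once at the start of the run and pass the values into dbt as `--vars`, so the macro only falls back to querying the table when the vars aren’t supplied. A rough sketch (macro name, column names, and the `client_id` var are placeholders, not my real code):

```sql
-- macros/get_readiness.sql (sketch; assumes Dagster invokes dbt with
-- --vars '{group_key: ..., reporting_month: ..., client_id: ...}')
{% macro get_readiness() %}
    {% if var('group_key', none) is not none %}
        {# Values were supplied at invocation time, so no query is needed. #}
        {{ return({'group_key': var('group_key'),
                   'reporting_month': var('reporting_month')}) }}
    {% elif execute %}
        {# Fallback: hit the status table as before. #}
        {% set rows = run_query(
            "select group_key, reporting_month from rev_date_log where client_id = '"
            ~ var('client_id') ~ "'"
        ).rows %}
        {{ return({'group_key': rows[0][0], 'reporting_month': rows[0][1]}) }}
    {% endif %}
{% endmacro %}
```

That way the 10–20 macro calls per client become dictionary lookups instead of warehouse queries, without moving the logic out of one place. Does that seem like a reasonable pattern, or am I fighting dbt here?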

I am not familiar with Dagster and I’m not sure I fully follow your problem, but it does sound like this could be made more efficient. It’s a bit challenging to give advice without seeing the code, though.

If you have multiple clients, I would think you could have a clients table and join against that to produce the results you need.
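As a rough sketch of that join (assuming a `clients` table and your `rev_date_log` status table share a `client_id` column; the source name and columns here are guesses):

```sql
-- models/client_readiness.sql (sketch; adjust source/column names to your schema)
select
    c.client_id,
    s.group_key,
    s.reporting_month,
    s.readiness_status
from {{ ref('clients') }} as c
join {{ source('ops', 'rev_date_log') }} as s
    on c.client_id = s.client_id
```

Downstream models would then join to this model rather than each calling the macro.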

A macro’s query results can be materialized into a table once per run, and downstream models can then read from that table instead of re-running the query.
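Concretely, that might look like a small snapshot model that dbt builds once per run, with every downstream model selecting from it via `ref` instead of invoking the macro. A sketch (model and column names are illustrative):

```sql
-- models/status_snapshot.sql (sketch: built once per dbt run)
{{ config(materialized='table') }}
select client_id, group_key, reporting_month, readiness_status
from {{ source('ops', 'rev_date_log') }}

-- models/some_main_model.sql (sketch: downstream usage)
-- select m.*, s.group_key, s.reporting_month
-- from {{ ref('stg_something') }} as m
-- join {{ ref('status_snapshot') }} as s
--     on m.client_id = s.client_id
```

Because the snapshot is a `ref` dependency, dbt’s DAG guarantees it’s built before anything that reads it, and the live status table is only hit once per run.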

Additionally, you mentioned concurrency (like you’re kicking off the job multiple times), but you also said you want it to run once per job. Maybe you could run the status query as a separate step and only let the concurrent jobs fire off after it’s done.