Can I dynamically add things to project-level state/config in on-run-start?

I would like to be able to do something like

{% do config.set('my-custom-param', 'my-custom-value') %}

and later do

{% set mv = config.get('my-custom-param') %}

somewhere else. I’ve already discovered that you can do this within a particular model, even though it is not a documented feature :slight_smile:, but that only alters what I assume is a local copy of the config.

Why would I want to do this? Because I want to get a “batch-id” from the database in on-run-start, then use that batch-id to tell the database that the batch is finished in on-run-end. I cannot think of how else to carry that context from one end of the dbt process to the other.

An alternative is to put this information in log messages, then make sure our logs are being saved appropriately (i.e. a solution outside of both dbt and the database). But I like the idea of using the database, as it is one less moving part and will automatically be independent for each dev environment.

I don’t know the answer to your question directly, but an easier way to get the latest batch ID is to query the database.

Check out the run_query macro:
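For reference, a macro wrapping run_query might look roughly like this (a minimal sketch; the batch table and column names here are made up):

{% macro get_latest_batch_id() %}
    {# run a query against the warehouse and grab the single result cell #}
    {% set results = run_query("select max(batch_id) as batch_id from analytics.batch_log") %}
    {% if execute %}
        {{ return(results.columns[0].values()[0]) }}
    {% endif %}
{% endmacro %}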

Let me know if that helps you. If not, I’ll try again.

Thanks Jeff, I think you are correct that running it as a separate query is the right way to go. I’ve ended up with a solution that looks like this:

batch_id=$(dbt run-operation start_batch | awk '/batch-id: / {print $NF}')
dbt run --vars "batch_id: $batch_id"
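(The start_batch macro itself isn’t shown above. A version along these lines would produce the “batch-id:” log line that the awk pattern picks out; the open_batch stored procedure name is a made-up example:)

{% macro start_batch() %}
    {# call a stored procedure that opens a batch and returns its id (procedure name is hypothetical) #}
    {% set results = run_query("call analytics.open_batch()") %}
    {% if execute %}
        {# log in exactly the format the awk pattern matches #}
        {% do log("batch-id: " ~ results.columns[0].values()[0], info=True) %}
    {% endif %}
{% endmacro %}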

It would be kind of cool if I could do this without the shell script, but this works and I’m happy with it.

Oh, what you came up with was quite different from what I was thinking. I had in mind an approach that only uses dbt, with no extra shell script, and that would therefore work in dbt Cloud (or any other cloud-based way of running dbt).

From within the model itself you can call the run_query macro, save the value of your next sequence number to a variable, and then use that variable in the model.
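Inside a model that could look something like this (a sketch, assuming a Snowflake sequence named analytics.batch_seq):

{% set batch_id = none %}
{% if execute %}
    {# fetch the next value from the sequence at run time #}
    {% set results = run_query("select analytics.batch_seq.nextval as batch_id") %}
    {% set batch_id = results.columns[0].values()[0] %}
{% endif %}

select
    {{ batch_id }} as batch_id,
    current_timestamp() as loaded_at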

I get what you are saying about using the value within a model, but can I do that across multiple models? I mean, of course I could query the database in every model, but in my case that actually wouldn’t help me, since the batch id is coming from a Snowflake sequence, which gives you a different value each time you call it. I really do need to be able to hold on to the specific value.

However, if you mean saving to a var, so that a subsequent call to var('batch_id') would return the value across the entire run, that would be interesting.
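(For reference, that is exactly what --vars in the shell script above does: the value is visible to every model in the run via var(), so each model reads the same batch id rather than calling the sequence again. A minimal sketch; the default of -1 just keeps compilation happy when no value is passed:)

select
    {{ var('batch_id', -1) }} as batch_id,
    current_timestamp() as loaded_at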

Okay, we are slowly learning more about what you are trying to do. Now I understand it needs to be used across multiple models and that you’re using a Snowflake sequence.

Can you post the entire concept of what you’re trying to do? Then I can help propose a solution.

The concept is pretty simple: there is a batch table in the database that keeps track of batch runs. One stored procedure opens a batch and returns the new batch id, while another procedure closes the batch when the run is complete.

I am happy with the shell script solution, but as a newcomer to dbt, I’m of course interested in learning whether there are better ways to set it up.

thanks

Hi Denise,

The dbt way of thinking would be to eliminate all stored procedures when possible. Sometimes we need custom materializations to replace the sprocs, but often we don’t.

From what I know about this, the basic “dbt only” workflow could be something like the following (sketched in code after the list):

  1. An incremental model that inserts the next sequence value into a table, so the max sequence is fetchable and reusable. (A table with a single column sequence_id; each time dbt runs, one row, the next sequence value, is added to it.)
  2. A view that returns the max sequence number. (This can be skipped if you prefer a CTE in step 3.)
  3. Join that view to anything that needs the max sequence ID.
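Sketched out, with a made-up sequence name (analytics.batch_seq), the two models might be:

-- models/batch_registry.sql: appends one row with the next sequence value on each run
{{ config(materialized='incremental') }}

select
    analytics.batch_seq.nextval as sequence_id,
    current_timestamp() as created_at

-- models/current_batch.sql: exposes the latest sequence id to join against
{{ config(materialized='view') }}

select max(sequence_id) as batch_id
from {{ ref('batch_registry') }}

Anything downstream can then ref('current_batch') and cross join it to pick up the current batch id.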

Again - not sure of all of your details, but that’s what goes through my head when reading your response.