Can I dynamically add things to project-level state/config in on-run-start?

I would like to be able to do something like

{% do config.set('my-custom-param', 'my-custom-value') %}

and later do

{% set mv = config.get('my-custom-param') %}

somewhere else. I’ve already discovered that you can do this within a particular model, even though it is not a documented feature :slight_smile:, but that only alters what I assume is a local copy of the config.

Why would I want to do this? Because I want to get a “batch-id” from the database in on-run-start, then use that batch-id to tell the database that the batch is finished in on-run-end. I cannot think of how else to carry that context from one end of the dbt process to the other.

An alternative is to put this information in log messages, then make sure our logs are being saved appropriately (i.e. a solution outside of both dbt and the database). But I like the idea of using the database, as it is one less moving part and will automatically be independent for each dev environment.

I don’t know the answer to your question directly, but an easier way to get the latest batch ID is to query the database.

Check out the run_query macro:
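For reference, a macro wrapping run_query might look roughly like this (a minimal sketch; the batch table and column names here are made up):

{% macro get_latest_batch_id() %}
    {# run a query against the warehouse and grab the single result cell #}
    {% set results = run_query("select max(batch_id) as batch_id from analytics.batch_log") %}
    {% if execute %}
        {{ return(results.columns[0].values()[0]) }}
    {% endif %}
{% endmacro %}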

Let me know if that helps you. If not, I’ll try again.

Thanks Jeff, I think you are correct that running it as a separate query is the right way to go. I’ve ended up with a solution that looks like this:

batch_id=$(dbt run-operation start_batch | awk '/batch-id: / {print $NF}')
dbt run --vars "batch_id: $batch_id"
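(The start_batch macro itself isn’t shown above. A version along these lines would produce the “batch-id:” log line that the awk pattern picks out; the open_batch stored procedure name is a made-up example:)

{% macro start_batch() %}
    {# call a stored procedure that opens a batch and returns its id (procedure name is hypothetical) #}
    {% set results = run_query("call analytics.open_batch()") %}
    {% if execute %}
        {# log in exactly the format the awk pattern matches #}
        {% do log("batch-id: " ~ results.columns[0].values()[0], info=True) %}
    {% endif %}
{% endmacro %}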

It would be kind of cool if I could do this without the shell script, but this works and I’m happy with it.

Oh, what you came up with was quite different from what I was thinking. I had in mind an approach that only uses dbt, with no extra shell script, and that would therefore work in dbt Cloud (or any other cloud-based way of running dbt).

From within the model itself you can call the run_query macro, save the value of your next sequence number to a variable, and then use that variable in the model.
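Inside a model that could look something like this (a sketch, assuming a Snowflake sequence named analytics.batch_seq):

{% set batch_id = none %}
{% if execute %}
    {# fetch the next value from the sequence at run time #}
    {% set results = run_query("select analytics.batch_seq.nextval as batch_id") %}
    {% set batch_id = results.columns[0].values()[0] %}
{% endif %}

select
    {{ batch_id }} as batch_id,
    current_timestamp() as loaded_at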

I get what you are saying about using the value within a model, but can I do that across multiple models? I mean, of course I could query the database in every model, but in my case that actually wouldn’t help me, since the batch id is coming from a Snowflake sequence, which gives you a different value each time you call it. I really do need to be able to hold on to the specific value.

However, if you mean saving to a var, so that a subsequent call to var('batch_id') would return the value across the entire run, that would be interesting.
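(For reference, that is exactly what --vars in the shell script above does: the value is visible to every model in the run via var(), so each model reads the same batch id rather than calling the sequence again. A minimal sketch; the default of -1 just keeps compilation happy when no value is passed:)

select
    {{ var('batch_id', -1) }} as batch_id,
    current_timestamp() as loaded_at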

Okay, we are slowly learning more about what you are trying to do. Now I understand it needs to be used across multiple models and that you’re using a Snowflake sequence.

Can you post the entire concept of what you’re trying to do? Then I can help propose a solution.

The concept is pretty simple: there is a batch table in the database that keeps track of batch runs. One stored procedure opens a batch and returns the new batch id, while another procedure closes the batch when the run is complete.

I am happy with the shell script solution, but as a newcomer to dbt, I’m of course interested in learning whether there are better ways to set it up.

thanks

Hi Denise,

The dbt way of thinking would be to eliminate all stored procedures when possible. Sometimes we need custom materializations to replace the sprocs, but often we don’t.

From what I know about this, the basic “dbt only” workflow could be something like the following (sketched in code after the list):

  1. An incremental model that inserts the next sequence value into a table, so the max sequence is fetchable and reusable. (A table with a single column sequence_id; each time dbt runs, one row, the next sequence value, is added to it.)
  2. A view that returns the max sequence number. (This can be skipped if you prefer a CTE in step 3.)
  3. Join that view to anything that needs the max sequence ID.
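Sketched out, with a made-up sequence name (analytics.batch_seq), the two models might be:

-- models/batch_registry.sql: appends one row with the next sequence value on each run
{{ config(materialized='incremental') }}

select
    analytics.batch_seq.nextval as sequence_id,
    current_timestamp() as created_at

-- models/current_batch.sql: exposes the latest sequence id to join against
{{ config(materialized='view') }}

select max(sequence_id) as batch_id
from {{ ref('batch_registry') }}

Anything downstream can then ref('current_batch') and cross join it to pick up the current batch id.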

Again - not sure of all of your details, but that’s what goes through my head when reading your response.