dbt build on fresh data

andrewboyle-dbt · November 9, 2023, 4:30pm

The problem I’m having

I’m looking to create more of an event-based build to reduce wasteful runs on data. I came across this functionality:

Command step order

dbt source freshness
dbt build --select "source_status:fresher+"

but when I run dbt build --select "source_status:fresher+" I’m met with an error:
"No previous state comparison freshness results in sources.json"

From my understanding, each run of ‘dbt source freshness’ overwrites sources.json and so it has no other copy to compare itself to.

I’m looking for help implementing the section below, since I feel this is what I’m missing.

You can also set the DBT_ARTIFACT_STATE_PATH environment variable instead of the --state flag.

dbt source freshness # must be run again to compare current to previous state
dbt build --select "source_status:fresher+" --state path/to/prod/artifacts

The context of why I’m trying to do this

By running this on, for example, a 15-minute interval, we can regularly check for fresh data sources and run only the models that depend on them.

What I’ve already tried

I’ve tried pointing to an older version of sources.json for it to compare itself against but the same error returns.

Some example code or error messages

Internal Error
  No previous state comparison freshness results in sources.json

andrewboyle-dbt · November 9, 2023, 4:48pm

I’ve figured it out: I was using the wrong syntax when specifying the state. The correct format is:
dbt build --select source_status:fresher+ --state ./folder-old-sources-file-is-in
Note that --state points to the folder containing the old sources.json, not to the file itself.

Next step will likely just be to override the default sources path by using
dbt source freshness -o ./path
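
For anyone landing here later, the sequence above can be sketched as a small shell function. This is only a sketch under assumptions: the ./state and ./target folder names and the run_fresh_build function name are made up for illustration, and it assumes dbt writes sources.json to its default target folder. On the very first run there is no previous sources.json, so the build is skipped rather than hitting the "No previous state comparison freshness results" error.

```shell
#!/bin/sh
set -eu

# Hypothetical folder names; adjust to your project.
STATE_DIR="${STATE_DIR:-./state}"     # holds the previous run's sources.json
TARGET_DIR="${TARGET_DIR:-./target}"  # assumed to be dbt's artifact folder

run_fresh_build() {
  mkdir -p "$STATE_DIR"

  # 1. Record the current freshness results in $TARGET_DIR/sources.json.
  dbt source freshness

  # 2. Build only nodes downstream of sources that are fresher than the
  #    previous snapshot. Skipped when no previous snapshot exists yet.
  if [ -f "$STATE_DIR/sources.json" ]; then
    dbt build --select "source_status:fresher+" --state "$STATE_DIR"
  fi

  # 3. Promote the snapshot taken in step 1 so the next run compares against it.
  cp "$TARGET_DIR/sources.json" "$STATE_DIR/sources.json"
}
```

Calling run_fresh_build from a scheduler performs one cycle. Because of set -e, any failing dbt command stops the script before step 3, so the old snapshot is kept and the next run compares against the same state again.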

maria · March 12, 2024, 4:30am

Hi @andrewboyle-dbt,

Could you please share the exact commands you run in your dbt job? I have got to the same point you were at and still cannot make it work.

maria · March 20, 2024, 5:45am

OK, I can sort of make it work by running the commands manually, but it’s unclear how to make it work in a dbt Cloud job.

I need to have both current and previous states for a job to run correctly.

Step 1: dbt source freshness # check the current state
Step 2: dbt build --select source_status:fresher+ --state ./folder-old-state # compare against the previous state
Step 3: Here I need to copy the current state to the old-state folder, and I cannot do that in dbt Cloud. Running dbt source freshness -o ./folder-old-state is not good enough, as the state could change between step 1 and step 3.

Does anyone have any suggestions?

andrewboyle-dbt · March 20, 2024, 3:17pm

Hey Maria, it’s been a while since I’ve looked into this, but this is what worked. Keep in mind that this runs from a KubernetesPodOperator using the dbt CLI, not Cloud.

I’m not sure whether Cloud allows this, but one way would be to use an AWS S3 bucket to host your files (manifests, sources, freshness tests, etc.) and then use aws s3 sync to copy them. That way, in each DAG run you can copy down the previous file, run dbt source freshness, store the new freshness results somewhere for subsequent runs, and then run dbt build if the sources are fresher.
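
A rough sketch of that S3 round trip, for illustration only. The bucket URI, folder names, and the run_fresh_build_s3 function name are hypothetical, and the AWS CLI is assumed to be installed and authenticated in the container. The freshness file that gets uploaded is the one produced before the build, so nothing is re-measured between the comparison and the save.

```shell
#!/bin/sh
set -eu

# Hypothetical bucket and prefix; replace with your own.
ARTIFACT_URI="${ARTIFACT_URI:-s3://my-dbt-artifacts/prod}"
STATE_DIR="${STATE_DIR:-./state}"     # local copy of the previous artifacts
TARGET_DIR="${TARGET_DIR:-./target}"  # assumed to be dbt's artifact folder

run_fresh_build_s3() {
  mkdir -p "$STATE_DIR"

  # Pull the artifacts stored by the previous run (a no-op on the first run).
  aws s3 sync "$ARTIFACT_URI" "$STATE_DIR"

  # Record the current freshness results in $TARGET_DIR/sources.json.
  dbt source freshness

  # Build only what is downstream of fresher sources, if a baseline exists.
  if [ -f "$STATE_DIR/sources.json" ]; then
    dbt build --select "source_status:fresher+" --state "$STATE_DIR"
  fi

  # Store this run's freshness results as the baseline for the next run.
  aws s3 cp "$TARGET_DIR/sources.json" "$ARTIFACT_URI/sources.json"
}
```

Uploading only after a successful build is a design choice: if the build fails, the baseline in S3 is left unchanged, so the next run still sees the same sources as fresher and retries them.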

maria · March 21, 2024, 6:08am

Thank you for your response!

I asked this question in the dbt Slack, and apparently dbt jobs now have a new feature in the advanced settings (called “compare changes against”) that can be configured to compare against the previous run.

But what you suggest sounds like it will work for dbt-core users too.
