but when I run dbt build --select "source_status:fresher+" I'm met with an error:
"No previous state comparison freshness results in sources.json"
From my understanding, each run of dbt source freshness overwrites sources.json, so there is no previous copy for it to compare against.
Looking to get some help implementing the section below, since I feel this is what I'm missing.
You can also set the DBT_ARTIFACT_STATE_PATH environment variable instead of using the --state flag.
dbt source freshness # must be run again to compare current to previous state
dbt build --select "source_status:fresher+" --state path/to/prod/artifacts
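For reference, my understanding is that the env-var variant of the same workflow would look roughly like this (the path is just a placeholder):
export DBT_ARTIFACT_STATE_PATH=path/to/prod/artifacts # equivalent to passing --state on each command
dbt source freshness # writes the current sources.json to ./target
dbt build --select "source_status:fresher+"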
The context of why I’m trying to do this
By running this on a 15-minute interval, for example, we can consistently check for fresh data sources and run only the models downstream of them.
What I’ve already tried
I’ve tried pointing to an older version of sources.json for it to compare itself against but the same error returns.
Some example code or error messages
Internal Error
No previous state comparison freshness results in sources.json
I've figured it out: I was using the wrong syntax when specifying the state.
dbt build --select source_status:fresher+ --state ./folder-old-sources-file-is-in is the correct format.
The next step will likely just be to override the default sources output path using
dbt source freshness -o ./path
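Concretely, I'm picturing something like this per scheduled run (the folder name is just an example; the final freshness run is what refreshes the "previous" state for the next run):
dbt source freshness # current state -> ./target/sources.json
dbt build --select source_status:fresher+ --state ./previous-state # compare against the last run
dbt source freshness -o ./previous-state/sources.json # save state for the next run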
OK,
I can sort of make it work (running the commands manually), but it's unclear how to make it work in a dbt Cloud job.
I need to have both current and previous states for a job to run correctly.
Step 1: dbt source freshness # check the current state
Step 2: dbt build --select source_status:fresher+ --state ./folder-old-state # run against the previous state
Step 3: here I need to copy the current state to the old-state folder, and I cannot do that in dbt Cloud. Running dbt source freshness -o ./folder-old-state is not good enough, as there could be state changes between step 1 and step 3.
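For reference, when running the commands manually outside of Cloud, the loop that avoids that gap looks roughly like this (assuming freshness writes to the default ./target location):
dbt source freshness # step 1: current state -> ./target/sources.json
dbt build --select source_status:fresher+ --state ./folder-old-state # step 2: compare against the previous run
cp ./target/sources.json ./folder-old-state/sources.json # step 3: reuse the exact step-1 artifact as the next run's previous state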
Hey Maria, it's been a while since I've looked into this, but this is what worked. Keep in mind this is running from a KubernetesPodOperator using the dbt CLI, not Cloud.
I'm not sure if Cloud allows this, but one way would be to use an AWS S3 bucket to host your files for manifests, sources, freshness tests, etc., and then use aws s3 sync to copy them. That way, in each DAG run you can pull down the previous file, run dbt source freshness, store the new freshness results somewhere for subsequent runs, and then dbt build if the sources are fresher.
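A rough sketch of that loop (the bucket name and paths are made up, and I'm mixing sync and cp just for illustration):
aws s3 sync s3://my-dbt-artifacts/state ./old-state # pull the previous run's sources.json (and manifest.json if you also use state:modified)
dbt source freshness # current freshness -> ./target/sources.json
dbt build --select source_status:fresher+ --state ./old-state # build only models downstream of sources that got fresher
aws s3 cp ./target/sources.json s3://my-dbt-artifacts/state/sources.json # store this run's results for the next run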
I have asked this question in dbt Slack, and apparently dbt Cloud jobs now have a new feature in the advanced settings (called "compare changes against") that can be configured to compare against the previous run.
But what you suggest sounds like it will work for dbt Core users too.