Parent/child nodes ARE modified (whether on purpose or not) but have been excluded (via --exclude)
This is another scenario that may trip folks up, as you have to understand not only “state” and the “plus” (+) graph operator but also their interaction with node exclusions at the same time.
Let’s set up a toy project like so (copied from the previous post):
# ~/.dbt/profiles.yml
snowflake:
  target: prod
  outputs:
    prod: &sf-creds
      type: snowflake
      ...
    ci: *sf-creds
# dbt_project.yml
name: my_dbt_project
profile: snowflake
config-version: 2
version: 1.0
models:
  my_dbt_project:
    +materialized: table
And some models like:
-- models/foo.sql
select 1 as id
-- models/bar.sql
select 1 as id
-- models/staging/stg.sql
{{ config(materialized='incremental', unique_key='id') }}
select * from {{ ref('foo') }}
Assume for a second that we don’t want models in the staging folder to run during Slim CI jobs, perhaps because they contain or process a lot of data (not the case in this toy example, but just imagine it); we want to test them manually instead. This means you will most likely be running CI jobs with a command similar to:
dbt run --select ... --exclude staging
Replace ... above with some variation of state:modified.
Okay, now let’s do our first “production” run.
$ dbt run --full-refresh
01:12:06 Running with dbt=1.4.5
01:12:07 Found 3 models, 0 tests, 0 snapshots, 0 analyses, 308 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
01:12:07
01:12:13 Concurrency: 1 threads (target='default')
01:12:13
01:12:13 1 of 3 START sql table model dbt_jyeo.bar ...................................... [RUN]
01:12:17 1 of 3 OK created sql table model dbt_jyeo.bar ................................. [SUCCESS 1 in 3.64s]
01:12:17 2 of 3 START sql table model dbt_jyeo.foo ...................................... [RUN]
01:12:20 2 of 3 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 3.52s]
01:12:20 3 of 3 START sql incremental model dbt_jyeo.stg ................................ [RUN]
01:12:24 3 of 3 OK created sql incremental model dbt_jyeo.stg ........................... [SUCCESS 1 in 4.00s]
01:12:24
01:12:24 Finished running 2 table models, 1 incremental model in 0 hours 0 minutes and 16.96 seconds (16.96s).
01:12:24
01:12:24 Completed successfully
01:12:24
01:12:24 Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
Nothing surprising here - all 3 models are run as expected. Now let’s move our target folder as we did previously so we can defer to the manifest.json that was generated above.
$ mv target target_old
And then let’s make two changes.
(1) Let’s modify model bar:
-- models/bar.sql
select 2 as id
(2) Let’s modify our dbt_project.yml like so:
# dbt_project.yml
name: my_dbt_project
profile: snowflake
config-version: 2
version: 1.0
models:
  my_dbt_project:
    +materialized: table
    staging:
      +incremental_strategy: "delete+insert"
    # ^ These last 2 lines are newly added to the file ^ #
Note that in very large projects with many models, you MAY not even realize that by adding those 2 lines you have inadvertently caused the incremental model stg to be “modified”, since model configs can be set in many places (the model’s own config() block, the dbt_project.yml file as we did here, or even its properties YAML file).
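To make the "configs can be set in many places" point concrete, here is a minimal Python sketch of layered config resolution for stg. It is a simplified mental model and not dbt's actual implementation: the helper `resolve_config` and the "later layer wins" merge rule are assumptions made purely for illustration.

```python
# Simplified, hypothetical model of layered config resolution; not dbt's real code.

def resolve_config(*layers):
    """Merge config layers in order; later layers override earlier ones."""
    resolved = {}
    for layer in layers:
        resolved.update(layer)
    return resolved

# Config layers that apply to models/staging/stg.sql, lowest precedence first.
project_level = {"materialized": "table"}  # my_dbt_project: +materialized: table
model_level = {"materialized": "incremental", "unique_key": "id"}  # config() in stg.sql

# Before the change: nothing is configured at the staging folder level.
before = resolve_config(project_level, {}, model_level)

# After the change: the staging folder contributes incremental_strategy.
staging_level = {"incremental_strategy": "delete+insert"}
after = resolve_config(project_level, staging_level, model_level)

# Keys whose resolved value differs between the two project states.
changed_keys = {k for k in before.keys() | after.keys() if before.get(k) != after.get(k)}
is_modified = before != after

print(sorted(changed_keys))  # ['incremental_strategy']
print(is_modified)           # True
```

The file stg.sql itself never changed, yet its resolved config did, which is why it can show up as modified without anyone touching the model.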
Okay, now let’s do a CI run:
dbt run -s +state:modified+ --exclude models/staging --defer --state target_old
This command says to:
- Build anything that’s modified.
- Build anything that’s modified and all its child / downstream nodes.
- Build anything that’s modified and all its parent / upstream nodes.
- Do not build anything that is excluded - which is simply the model stg.
Let’s see what happens:
$ dbt run -s +state:modified+ --exclude models/staging --defer --state target_old
01:13:37 Running with dbt=1.4.5
01:13:39 Found 3 models, 0 tests, 0 snapshots, 0 analyses, 308 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
01:13:39
01:13:45 Concurrency: 1 threads (target='default')
01:13:45
01:13:45 1 of 2 START sql table model dbt_jyeo.bar ...................................... [RUN]
01:13:48 1 of 2 OK created sql table model dbt_jyeo.bar ................................. [SUCCESS 1 in 3.67s]
01:13:48 2 of 2 START sql table model dbt_jyeo.foo ...................................... [RUN]
01:13:52 2 of 2 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 3.25s]
01:13:52
01:13:52 Finished running 2 table models in 0 hours 0 minutes and 12.75 seconds (12.75s).
01:13:52
01:13:52 Completed successfully
01:13:52
01:13:52 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
As we can see above, we didn’t modify foo, but foo was included in the +state:modified+ selection. Why is that? Because we modified its child stg’s configuration (by adding an incremental_strategy config where there wasn’t one before). Thus stg is modified, and upstream of stg is foo.
The exclusion --exclude models/staging simply means “exclude the node itself from running” and nothing more: it does not mean “exclude the node itself plus any of its parent/child nodes from running”.
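The behaviour above can be pictured as plain set arithmetic. The following Python sketch is a toy model of this three-node project, not dbt's selector code: the `children` map and the helpers `descendants`, `ancestors` and `expand` are hypothetical names introduced only for illustration.

```python
# Toy model of node selection for this project; hypothetical helpers, not dbt's real code.

# Edges point from parent to child: stg selects from foo; bar stands alone.
children = {"foo": {"stg"}, "bar": set(), "stg": set()}

def descendants(node):
    """All downstream nodes reachable from `node`."""
    found = set()
    stack = [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def ancestors(node):
    """All upstream nodes that `node` depends on, directly or indirectly."""
    return {n for n in children if node in descendants(n)}

def expand(nodes):
    """Mimic the +selector+ form: the nodes plus everything up- and downstream."""
    expanded = set(nodes)
    for node in nodes:
        expanded |= ancestors(node) | descendants(node)
    return expanded

modified = {"bar", "stg"}  # bar's SQL changed; stg's config changed
excluded = {"stg"}         # the only model under models/staging

# What the CI run did: expand the selection, then drop only the excluded node.
selected = expand(modified) - excluded

# What it did NOT do: drop the excluded node together with its parents/children.
not_what_happens = expand(modified) - expand(excluded)

print(sorted(selected))          # ['bar', 'foo']
print(sorted(not_what_happens))  # ['bar']
```

In this model foo survives because it entered the selection as a parent of the modified stg, and the exclusion only removes stg itself.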