Snapshots are not being built in a CI schema during SlimCI -- PRODUCTION affected during CI run!

I have snapshots that are built in a separate Snowflake database, which is defined by an environment variable.

{% snapshot snap_qb_etahub__assemblies %}
    {{ config(
          target_database=env_var('DBT_DB_SNAPSHOTS', 'UNDEFINED__DBT_DB_SNAPSHOTS'),
          ...
    ) }}
    SELECT
      *,
      CURRENT_TIMESTAMP()::TIMESTAMP AS dbt_snapshot_at
    FROM {{ source('qb_etahub', 'assemblies') }}
{% endsnapshot %}

I recently added a new snapshot, pushed a PR, then decided that I needed to change the unique_key. After adding a new commit, I started getting errors. That’s when I realized that dbt was not building this new snapshot in a separate CI-only schema.

This seems really bad because I might make snapshot-breaking changes during CI, which would end up affecting production snapshot data. Am I doing something wrong here? Are there configuration changes I can make that would tell SlimCI to build my snapshots in a separate schema?


Yikes! I’m pretty sure that this is correct, expected behaviour out of the box, even if it’s not friendly behaviour.

Some options:

  1. Check out Using dynamic schemas for snapshots, especially the first option where you override the schema generation macro
  2. Could you pass the schema name as an env var too?
  3. Might be worth opening an issue about this! I think it’s old behaviour, that probably hasn’t been reviewed in a more CI-heavy world
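
For what it’s worth, option 2 might look roughly like this in dbt_project.yml (the project name and variable name here are illustrative, not from your project):

    snapshots:
      my_project:
        # Fall back to the production schema when the env var isn't set
        +target_schema: "{{ env_var('DBT_SNAPSHOT_SCHEMA', 'snapshots') }}"

Your CI job would then set DBT_SNAPSHOT_SCHEMA to a CI-only schema.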

I’d argue that it’s incorrect for a CI process to alter production, even if the snapshots themselves are behaving as expected according to the design.

1 - I saw that, but I also saw this comment, which indicated that SlimCI treats all snapshots as modified during every CI job, which would mean my entire DAG would need to be built on every CI run. This would also snapshot every table in production on every CI run, which could be fraught with side effects.
2 - I thought about that, but it would start out empty and thus all CI jobs using snapshot models would fail, unless I manually maintained/synced the CI schema. :thinking: … I suppose I could set up a dbt operation to zero-copy-clone my prod snapshots and have that run as part of my prod build job. All CI jobs would use the same schema (to avoid the dynamic problem), so in principle there could be conflicts if multiple CI jobs were running simultaneously; but I don’t expect that would be an issue for me. I might try this approach…
3 - Honestly, snapshots feel like they were left behind in the run up to 1.0. I still get annoyed when I have to write them and re-learn that many .yml config options are ignored and I have to put them in a jinja config block. It almost feels like they should just be normal models that use a dbt-supported snapshot macro with some guidance around when it’s appropriate to use them (e.g., save them in a permanent location that can be archived and not in transient tables).
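
The zero-copy-clone idea in point 2 could be sketched as a run-operation macro, something like this (this assumes Snowflake, and the database and schema names are made up):

    -- macros/clone_snapshots_for_ci.sql
    {% macro clone_snapshots_for_ci() %}
        {# Zero-copy clone the prod snapshot schema into the CI snapshot database #}
        {% do run_query(
            "create or replace schema "
            ~ env_var('DBT_DB_SNAPSHOTS') ~ ".snapshots clone prod_db.snapshots"
        ) %}
    {% endmacro %}

invoked with dbt run-operation clone_snapshots_for_ci at the start of the CI job.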

Yes that’s much better - edited for clarity :pray:

Would love to hear how this goes!

I feel you :grimacing: gonna write something up about this when I get back to my computer properly

Oh no! @joellabes , #2 doesn’t really work. My CI job is now running the entire dag. I guess it sees the difference in the snapshot database as a difference with the production run.

Not sure what to do at this point without abandoning SlimCI (BIG driver for using dbt-cloud) or putting my snapshots at risk (can’t do this for long).

Hmmm OK I’m going to ask around and see if I can find any other suggestions for you. To double check, this would all work fine (as far as you know) if you could define the schema in a yaml file outside of the config block, right?

In honesty, I never put my snapshots through CI at my last job - I would just test them manually by making a copy of the table (something like create table snapshots.my_snapshot_backup as (select * from snapshots.my_snapshot)) and then try running the snapshot normally safe in the knowledge that I could drop it if something went wrong.

That was an objectively bad strategy :grimacing:

Nevermind, this strategy does mostly work! I realized the problem here. I was defining the snapshot database in the jinja config block. After moving it to dbt_project.yml, it works as expected. According to the state comparison caveats, dbt ignores environment changes if they’re defined in the dbt_project.yml. Fortunately, snapshots honor this yaml variable.
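
For reference, the working configuration looks roughly like this (the project name is illustrative):

    # dbt_project.yml
    snapshots:
      my_project:
        # env_var jinja here is ignored by state:modified comparison,
        # so unchanged snapshots aren't flagged as modified on every CI run
        +target_database: "{{ env_var('DBT_DB_SNAPSHOTS', 'UNDEFINED__DBT_DB_SNAPSHOTS') }}"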

I develop snapshots the same way you do. The problem I was trying to solve is that if I submitted a commit that had a change to a snapshot, then SlimCI would run that snapshot (because we use dbt build) in my production snapshot location. This is the (dangerous) default behavior. … I suppose you could also exclude snapshots from the SlimCI job, but you’d have to explicitly do so.
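
If anyone does want the exclusion route, one option (assuming your snapshots live under a snapshots/ directory) is a path-based exclusion rather than --resource-type:

    dbt build --select state:modified+ --exclude path:snapshots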


Hurrah! So let me repeat this back and check I’ve got it (I’ve turned this into a Wiki post, so you can edit it directly if I’ve got anything wrong, then I’ll mark it as the solution and use it as the bones of a GH issue)

  • By default, dbt snapshots build into a hardcoded schema. Making the schema dynamic means that every snapshot is considered state:modified on every CI run because in-file environmentally-aware configs trigger false positives and target_schema can’t be defined in dbt_project.yml.
  • You can make a zero-copy clone of the snapshot to a different database at the start of a CI run. target_database can be defined using env_var jinja in the snapshots section of dbt_project.yml which means other unchanged snapshots don’t get picked up as modified unnecessarily.
  • Cloning to a different database as described in the second bullet and then setting the snapshot database environment variable in your SlimCI job enables snapshots to be Slim CI’d. If you can’t do that, you’d need to exclude them from the CI job definition. That is tricky in a dbt build context, because the dbt build --resource-type flag seems to only take a single resource type, and it can’t be used for exclusion.

@gnilrets I just started writing this up as a GH issue, and then I found a couple of related things (this issue comment and this issue specifically) which made me realise I had misunderstood a couple of fundamental things in each of my bullets.

This isn’t true - both target_schema and target_database can be defined in dbt_project.yml along with most snapshot behaviour. It just doesn’t work in whatever.yml property files (you said this above and I just blanked on it).

:point_up: true, but maybe unnecessary, because this:

is resolved by this:

And if you’re not using an env var to define the location of the model, shouldn’t standard deferral prevent the failures described here:

? I haven’t tested how deferral works with snapshots, but I’d be mad to discover it doesn’t work the same as any other ref.
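
For anyone following along, deferral in a CI job is typically invoked something like this (the state path is illustrative; dbt Cloud sets this up automatically for Slim CI jobs):

    dbt build --select state:modified+ --defer --state ./prod-run-artifacts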

So I think the main thing to do here is to go and +1 Snapshot config doesn't work in schema.yml · Issue #4000 · dbt-labs/dbt-core · GitHub, and I’ll open a different issue saying that dbt’s default schema logic for snapshots is suboptimal, but purely because it creates a bad default experience, not because it makes CI impossible as first believed.

Does that sound right to you? I thought I was so clever getting a communal wiki post up :grimacing:

Issue #4000 is a good one to solve, but I’m not clear on whether having all of my config in a .yml file would have helped.

The root issue is that when I set up SlimCI in dbt-cloud, it was not creating CI-specific schemas when a snapshot model was changed and instead ran changed snapshot code in the production schemas.

I put all of my snapshots in a separate database (defined by an environment variable), and I specify the schemas in config blocks (only because I’ve previously discovered that some options are not honored in a snapshot .yml file). If my schema was defined in a .yml file, would that allow SlimCI to create CI-specific snapshots in the same way that it works for regular model schemas? If so, then solving #4000 would address this, but it’s not clear from the issues whether that is the case.