Models as source for snapshots

Hey there, I have a Redshift Spectrum table that contains a bunch of JSON data. One column has JSON like this:

   "user_identities":[  
      {  
         "identity_type":"email",
         "identity":"bob@bob.com",
         "timestamp_unixtime_ms":1540483995196
      },
      {  
         "identity_type":"customer_id",
         "identity":"12345",
         "timestamp_unixtime_ms":1540483995196
      }
   ]
}

So far as I can tell, in Redshift Spectrum, I can only access the data in this object by declaring the full path, which means I can’t just snapshot the Spectrum table. With this in mind, I have a model that creates a table like this:

| email       | customer_id | updated_at          |
|-------------|-------------|---------------------|
| bob@bob.com | 12345       | 2019-07-06 21:41:10 |

What I’m wondering is if there are any downsides to using this model as a snapshot source. The potential issue I can see is if the dbt runs overlap, but I think I can mostly mitigate that. Any other potential issues?

Hey @jesse - one of the cool things about Snapshots is that you can now specify a query to snapshot. I think it was a pretty common workaround to “archive” a model, but that definitely leads to all sorts of quirks in dbt runs. You’d need to run some of your models, then your archive, then the rest of your models!

Instead, I think the move is definitely to implement the select logic that you’re outlining here in the body of your snapshot block. Does it sound like that would fit your needs?

1 Like

Sounds like it should work perfectly! Thanks again for your help.