Hey there, I have a Redshift Spectrum table that contains a bunch of JSON data. One column has JSON like this:
"user_identities":[
{
"identity_type":"email",
"identity":"bob@bob.com",
"timestamp_unixtime_ms":1540483995196
},
{
"identity_type":"customer_id",
"identity":"12345",
"timestamp_unixtime_ms":1540483995196
}
]
}
So far as I can tell, in Redshift Spectrum, I can only access the data in this object by declaring the full path, which means I can’t just snapshot the Spectrum table. With this in mind, I have a model that creates a table like this:
| email | customer_id | updated_at |
|-------------|-------------|---------------------|
| bob@bob.com | 12345 | 2019-07-06 21:41:10 |
What I’m wondering is if there are any downsides to using this model as a snapshot source. The potential issue I can see is if the dbt runs overlap, but I think I can mostly mitigate that. Any other potential issues?