I have some raw JSON files that get stored in a table in my data warehouse, and I use dbt to normalize the JSON into a proper table. That normalized table is where all my data lives, and I don’t want to run any further transformations on it, to avoid losing or altering the raw content.
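For context, a minimal sketch of the kind of normalization model I mean (table, source, and column names are made up, and I’m assuming Snowflake-style JSON accessors):

```sql
-- models/normalized_events.sql (hypothetical names)
-- Flattens the raw JSON payloads into typed columns, no other transformations.
select
    raw.id                          as event_id,
    raw.payload:customer_id::string as customer_id,
    raw.payload:amount::number      as amount,
    raw.loaded_at                   as loaded_at
from {{ source('landing', 'raw_json_events') }} as raw
```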
Since my raw JSON files can be sent to me multiple times (with slightly different values), I want to be able to invalidate some rows if they got reprocessed. I have an `_is_ignore` column that I want to set to `true` if I receive a new value for the same row. This should allow me to avoid having duplicates while keeping a history of what I received.
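Concretely, the update I’d like to run after each load is something like this (key and column names are illustrative):

```sql
-- Mark every older copy of a row as ignored whenever a newer
-- copy with the same business key exists.
update normalized_events t
set _is_ignore = true
where exists (
    select 1
    from normalized_events newer
    where newer.event_id = t.event_id
      and newer.loaded_at > t.loaded_at
);
```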
The part I can’t figure out is whether it’s possible to run two queries against the same table in dbt.
I have a model that takes the new raw rows with an incremental strategy and loads them into my normalized table. I would like another model/job that does an update after the new rows are loaded, but since both queries target the same table, I can’t find a way to achieve this.
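The incremental model itself looks roughly like this (simplified, with made-up names):

```sql
-- models/normalized_events.sql
{{ config(materialized='incremental') }}

select
    raw.id                          as event_id,
    raw.payload:customer_id::string as customer_id,
    raw.payload:amount::number      as amount,
    raw.loaded_at                   as loaded_at
from {{ source('landing', 'raw_json_events') }} as raw

{% if is_incremental() %}
  -- only pick up raw rows we haven't loaded yet
  where raw.loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}
```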
I know I could easily create another model that does a `select *` from the normalized table and performs the update there, but since the table has a lot of rows, I don’t want to duplicate every row just to update one boolean column.
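In other words, I know I could do something like the following, but it rewrites the whole table (again, names are illustrative):

```sql
-- models/normalized_events_deduped.sql (the approach I want to avoid)
select
    n.*,
    -- flag every row that has a newer copy of the same key
    row_number() over (
        partition by n.event_id
        order by n.loaded_at desc
    ) > 1 as _is_ignore
from {{ ref('normalized_events') }} as n
```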
I feel like there must be a way to update a table that already has a model doing inserts into it, without copying all the values, but maybe this goes against dbt’s paradigms.
I appreciate any help.
Jacob