Occasionally a pipeline will fail on a particular date that will go unnoticed until an analyst finds it. This may result in some massive drop in records for that date, requiring a backfill of the incremental table. May of our Snowflake tables are too massive or the logic too complex to justify a --full-refresh for a single missing date.
Is there some macro or another way to just backfill a specific date or date range?
The context of why I’m trying to do this
Running a full backfill for a single or a couple of missing dates is computationally and financially inefficient.
What I’ve already tried
I looked at the insert_by_period macro, but there doesn’t seem to be any way to pass dynamically to it (and it’s for Redshift).
Hi! We do have a similar use case. What I came up with is a conditional logic in the materialisation config that change the model materialization based on the flag passed in the command line variable. I then wrote a custom materialization with start date and end date variables also passed in the command line. Model has also where clause that is applied when this materialization is used. Running it will select the period specified in the start and end dates variables, delete and reinsert the records into the incremental table. Happy to provide more info later as at the moment I am afk till Tuesday.
@smartinez
in your incremental model if you are filtering source records based on the target table max date then you will not loose the data even though the model fails.
This would not have thrown an error because no table actually failed, just that data was missing from upstream ( for whatever reason). In this case i’d want to just backfill 2023-01-02 and not have to rebuild the entire table.