Is there a way to specify backfill dates for Snowflake?

smartinez · May 26, 2023, 4:29pm

The problem I’m having

Occasionally a pipeline fails on a particular date and the failure goes unnoticed until an analyst finds it. This can cause a massive drop in records for that date, requiring a backfill of the incremental table. Many of our Snowflake tables are too massive, or their logic too complex, to justify a --full-refresh for a single missing date.

Is there some macro or another way to just backfill a specific date or date range?

The context of why I’m trying to do this

Running a full backfill for one or two missing dates is computationally and financially inefficient.

What I’ve already tried

I looked at the insert_by_period macro, but there doesn’t seem to be any way to pass dates to it dynamically (and it’s for Redshift).

Dom · May 27, 2023, 7:29pm

Hi! We have a similar use case. What I came up with is conditional logic in the model’s materialization config that switches the materialization based on a flag passed as a command-line variable. I then wrote a custom materialization that takes start date and end date variables, also passed on the command line. The model also has a where clause that is applied only when this materialization is used. Running it selects the period between the start and end dates, deletes those records from the incremental table, and reinserts them. Happy to provide more info later; I’m AFK until Tuesday.
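A minimal sketch of the delete-and-reinsert idea described above. It is not the custom materialization mentioned in this post (that code isn’t shown in the thread); instead it assumes dbt’s built-in incremental materialization with the delete+insert strategy on Snowflake. The upstream model name upstream_events, the columns ds and recs, and the variable names start_date and end_date are all hypothetical.

```sql
{#
  Hypothetical backfill invocation (the model name is an assumption):
  dbt run --select my_incremental_model --vars '{"start_date": "2023-01-02", "end_date": "2023-01-02"}'
#}
{{
    config(
        materialized='incremental',
        incremental_strategy='delete+insert',
        unique_key='ds'
    )
}}

{# Optional backfill window; both default to none on a normal run #}
{% set start_date = var('start_date', none) %}
{% set end_date = var('end_date', none) %}

select
    ds,
    recs
from {{ ref('upstream_events') }}

{% if is_incremental() %}
  {% if start_date and end_date %}
    -- Backfill run: reselect only the requested dates (inclusive)
    where ds between '{{ start_date }}'::date and '{{ end_date }}'::date
  {% else %}
    -- Normal run: only rows newer than what is already loaded
    where ds > (select max(ds) from {{ this }})
  {% endif %}
{% endif %}
```

With unique_key set to ds, the delete+insert strategy should remove the target rows for every date present in the reselected data before inserting the new rows, so only the requested dates are rewritten. One caveat: a date that returns no rows from upstream would not be deleted from the target, so stale rows for that date could remain.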

Surya · May 30, 2023, 4:59am

@smartinez
In your incremental model, if you filter source records based on the max date in the target table, you will not lose data even if the model fails, because the next successful run picks up everything after the latest date already loaded.

Please refer to Incremental models | dbt Developer Hub
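As a rough sketch of the max-date filter described above, following the general pattern for dbt incremental models (the upstream model name upstream_events and the columns ds and recs are hypothetical):

```sql
{{ config(materialized='incremental') }}

select
    ds,
    recs
from {{ ref('upstream_events') }}

{% if is_incremental() %}
-- Only pull rows newer than the latest date already in the target table
where ds > (select max(ds) from {{ this }})
{% endif %}
```

This lets a run catch up after a failed run, since the filter is relative to what the target actually contains. It would not reload a date that has already been loaded, even if that date was loaded with only part of its records.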

smartinez · May 30, 2023, 3:11pm

Yes! That would be very helpful, thank you.

smartinez · May 30, 2023, 3:14pm

I don’t quite understand your response.

We have a situation where an upstream table may miss a date or have a reduced number of records. Something like

ds         | recs
2023-01-01 | 10000
2023-01-02 | 30
2023-01-03 | 110000

This would not have thrown an error because no table actually failed; data was just missing upstream (for whatever reason). In this case I’d want to backfill only 2023-01-02 and not have to rebuild the entire table.
