Thanks for the detailed question! As I understand it, the reason you wouldn't want to use the `insert_overwrite` strategy is that sometimes records from several days ago get inserted alongside yesterday's data. If you used the dynamic `insert_overwrite` strategy, you'd risk replacing a full partition of preexisting data with only a handful of new records.
I have two thoughts:

- It sounds like you may want something in between simple `merge` and dynamic `insert_overwrite`: a strategy that dynamically calculates which partitions to merge into, and then performs the merge with a `unique_key`, rather than replacing the entire partition. The code for that is definitely possible (there's a rough sketch below), though slower than `insert_overwrite`. If this is a common problem that folks run into, we may want to add support for it as a new incremental strategy in a future release.
- How many late-arriving records do you have? Depending on the data source, we often find that the proportion of late-arriving facts (e.g. web/mobile events) is low enough that we can ignore any records which arrive more than 3 days late during incremental runs, and fold them in during weekly or monthly full refreshes. If you've got 10+ TB of source data, that $50+ full-refresh build feels expensive, but paying for it once a week or once a month is far cheaper than trying to manage more complex code to handle 0.001% of your edge cases. Plus, if you're always running static `insert_overwrite` jobs for the prior 3 days (see the config sketch at the end), you save enough in both time and money to more than make up for the occasional full rebuild.
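
For the first idea, here's a rough sketch of what that in-between approach could look like: a `merge` with a `unique_key`, where the partitions being merged into are limited by a lookback window rather than truly calculated from the incoming data. The model name, columns, source, and 7-day window are all placeholders, and the `incremental_predicates` config assumes a dbt-bigquery version that supports it.

```sql
-- hypothetical events model; names and the 7-day window are placeholders
{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'merge',
        unique_key = 'event_id',
        partition_by = {'field': 'event_date', 'data_type': 'date'},
        -- prune the target side of the MERGE so only recent partitions are scanned;
        -- assumes a dbt-bigquery version that supports incremental_predicates
        incremental_predicates = [
            "DBT_INTERNAL_DEST.event_date >= date_sub(current_date, interval 7 day)"
        ]
    )
}}

select
    event_id,
    event_date,
    user_id,
    payload
from {{ source('events', 'raw_events') }}

{% if is_incremental() %}
  -- only reprocess source rows from the lookback window, which covers late arrivals
  where event_date >= date_sub(current_date, interval 7 day)
{% endif %}
```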
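
And for the second idea, this is roughly what a static `insert_overwrite` job covering the prior 3 days could look like (again, the model name, columns, and source are placeholders):

```sql
{% set partitions_to_replace = [
    'current_date',
    'date_sub(current_date, interval 1 day)',
    'date_sub(current_date, interval 2 day)',
    'date_sub(current_date, interval 3 day)'
] %}

{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'insert_overwrite',
        partition_by = {'field': 'event_date', 'data_type': 'date'},
        -- a static list of partitions means dbt skips the extra query that
        -- figures out which partitions to replace, so the cost is predictable
        partitions = partitions_to_replace
    )
}}

select
    event_id,
    event_date,
    user_id,
    payload
from {{ source('events', 'raw_events') }}

{% if is_incremental() %}
  -- only select source rows that fall in the partitions being replaced
  where event_date in ({{ partitions_to_replace | join(', ') }})
{% endif %}
```

Each incremental run replaces exactly those three partitions, and anything that shows up more than 3 days late gets folded in by the periodic full refresh.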