Best Practices for Robust, Idempotent Incremental Models (Beyond Timestamps)

Hello dbt Community!

I’m working on designing a near real-time pipeline and want to ensure our dbt incremental models are as robust as possible.

My background is in stream processing with tools like Spark Streaming, where you rely on stateful checkpoints to guarantee exactly-once processing. Looking at dbt’s incremental models, the most common pattern I see is filtering the source for rows with updated_at greater than the current max(updated_at) in the target table.
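
For concreteness, this is the shape of the pattern I mean (a minimal sketch; the model, source, and column names are just placeholders):

```sql
-- A typical timestamp-filtered incremental model (hypothetical names)
{{
    config(
        materialized='incremental'
    )
}}

select
    event_id,
    payload,
    updated_at
from {{ source('app', 'events') }}

{% if is_incremental() %}
  -- only pull rows newer than the newest row already in the target table
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```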

My concern with this approach is its potential fragility:

  1. Failure Recovery & Missed Windows: If my orchestrator (e.g., Airflow) runs a dbt job every 5 minutes and the 10:05 AM run fails, the next successful run at 10:10 AM will filter for timestamp > max(timestamp) as it stood before the failed run. That catches up on the missed window only if every row that arrived in the meantime carries a timestamp above that max; anything that lands with an earlier timestamp (late-arriving or out-of-order data) is silently skipped. How does the community architect its jobs to be resilient to these orchestrator failures and prevent data gaps?

  2. Duplicate Data with an append Strategy: My second concern is about re-runs. If I need to manually re-run a job for a specific historical period (for example by overriding the incremental filter with a variable), a simple timestamp filter will re-select all the data from that period. If the model uses incremental_strategy = 'append' (which is common for immutable event logs or tables without a reliable primary key), the re-run will create duplicates. What are the best practices for safely re-running append-only incremental models?
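
To make concern 2 concrete, here is a minimal sketch of the kind of append-only model I have in mind (again, all names are placeholders, and the backfill_start var is just one way I could imagine triggering a historical re-run):

```sql
-- Append-only incremental model (hypothetical names); backfill_start is an
-- assumed var used to manually re-run a historical window
{{
    config(
        materialized='incremental',
        incremental_strategy='append'
    )
}}

select
    event_id,
    event_type,
    occurred_at
from {{ source('app', 'raw_events') }}

{% if is_incremental() %}
  {% if var('backfill_start', none) is not none %}
  -- manual backfill, e.g. dbt run --vars 'backfill_start: 2024-01-01':
  -- these rows are selected again and, with append, inserted a second time
  where occurred_at >= '{{ var("backfill_start") }}'
  {% else %}
  where occurred_at > (select max(occurred_at) from {{ this }})
  {% endif %}
{% endif %}
```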

Beyond using a timestamp, what are the community’s recommended patterns for building truly idempotent and reliable incremental models?

I’m particularly interested in:

  1. The pros and cons of relying solely on a unique_key with a MERGE strategy (first sketch below). How do you handle performance on very large source tables?

  2. Patterns for using CDC data with a Log Sequence Number (LSN) or version number instead of a timestamp (second sketch below).

  3. Are there any established best practices for a “manual checkpoint” system, for example, using a control table to manage state (third sketch below)?
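
For item 1, this is the shape of the merge-based config I am asking about (a sketch with placeholder names):

```sql
-- Merge-based incremental model keyed on order_id (hypothetical names)
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='order_id'
    )
}}

select
    order_id,
    customer_id,
    status,
    updated_at
from {{ source('app', 'orders') }}

{% if is_incremental() %}
  -- re-selecting already-loaded rows is safe here: the merge updates them
  -- in place instead of duplicating them, at the cost of a potentially
  -- expensive scan of the target table
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```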
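
For item 2, I am picturing something along these lines, where the CDC tool exposes a monotonically increasing log sequence number (the _lsn column name is an assumption on my part):

```sql
-- CDC-fed incremental model filtered on an LSN instead of a timestamp
-- (hypothetical names; assumes the CDC tool emits an _lsn column)
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='order_id'
    )
}}

select
    order_id,
    status,
    _lsn
from {{ source('cdc', 'orders') }}

{% if is_incremental() %}
  -- an LSN is monotonic per source, so it avoids wall-clock issues like
  -- late-arriving rows with older timestamps
  where _lsn > (select max(_lsn) from {{ this }})
{% endif %}
```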
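
And for item 3, the rough idea I have in mind is a small control table holding a high-water mark per model: the incremental filter reads it, and a post-hook advances it only after a successful build. This is only a concept sketch; the analytics.etl_checkpoints table, its columns, and the model name are all made up, and the table would need to be created and seeded outside of the model:

```sql
-- Incremental model driven by a manual checkpoint / control table
{{
    config(
        materialized='incremental',
        post_hook="update analytics.etl_checkpoints set high_water_mark = (select max(updated_at) from {{ this }}) where model_name = 'events_checkpointed'"
    )
}}

select
    event_id,
    payload,
    updated_at
from {{ source('app', 'events') }}

{% if is_incremental() %}
  -- read the last successfully committed high-water mark rather than
  -- computing max() over the target table
  where updated_at > (
      select high_water_mark
      from analytics.etl_checkpoints
      where model_name = 'events_checkpointed'
  )
{% endif %}
```

The appeal to me is that the watermark only advances when the whole run succeeds, which feels closer to the streaming checkpoints I am used to.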

Thanks for sharing your insights!