The problem I’m having
I am trying to set up a microbatch incremental strategy with a simple model, but I am unsure why duplicate rows are being generated despite configuring unique_keys.
Here is the sample seed data:
_id,event_time,value
1,2025-08-04 09:00:00,A
2,2025-08-04 10:00:00,B
3,2025-08-05 08:00:00,C
4,2025-08-05 09:00:00,D
5,2025-08-06 07:30:00,E
6,2025-08-06 08:00:00,F
Here is the model logic.
{{
config(
materialized='incremental',
incremental_strategy='microbatch',
event_time='event_time',
begin='2025-08-04',
batch_size='day',
unique_keys='_id',
concurrent_batches=false,
lookback=2
)
}}
WITH source_data AS (
SELECT
_id,
event_time,
value
FROM
{{ ref('sample_microbatch_seed') }}
),
deduplicated_source AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY _id ORDER BY event_time DESC) as row_num
FROM source_data
)
SELECT
_id,
event_time,
value
FROM deduplicated_source
WHERE row_num = 1
What happens is that I run the dbt run –select my_model_name for the first time, it will basically run a full refresh of the entire data for how many number of batches (in this case, 3 times), resulting in 6 * 3 = 18 rows.
So it would look something like this
_id,event_time,value
1,2025-08-04 09:00:00,A
1,2025-08-04 09:00:00,A
1,2025-08-04 09:00:00,A
2,2025-08-04 10:00:00,B
2,2025-08-04 10:00:00,B
2,2025-08-04 10:00:00,B
…
6,2025-08-06 08:00:00,F
6,2025-08-06 08:00:00,F
6,2025-08-06 08:00:00,F
What I have tried
I have tried setting full_refresh=false to no avail. I also tried to use something like dbt run --select sample_microbatch_model --event-time-start ‘2025-08-04’ --event-time-end ‘2025-08-05’ but while that will result in only one batch being run, dbt will still add the rows with event-time outside of the range I specified anyways.
Clearly there is something wrong with how I am setting up this model but I am unsure what else do I need to do.