dbt microbatch strategy generating duplicates

The problem I’m having

I am trying to set up a microbatch incremental strategy with a simple model, but I am unsure why duplicate rows are being generated despite configuring unique_keys.

Here is the sample seed data:

_id,event_time,value
1,2025-08-04 09:00:00,A
2,2025-08-04 10:00:00,B
3,2025-08-05 08:00:00,C
4,2025-08-05 09:00:00,D
5,2025-08-06 07:30:00,E
6,2025-08-06 08:00:00,F

Here is the model logic.

{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='event_time',
        begin='2025-08-04',
        batch_size='day',
        unique_keys='_id',
        concurrent_batches=false,
        lookback=2
    )
}}

WITH source_data AS (
    SELECT
        _id,
        event_time,
        value
    FROM
        {{ ref('sample_microbatch_seed') }}
),

deduplicated_source AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY _id ORDER BY event_time DESC) as row_num
    FROM source_data
)

SELECT
    _id,
    event_time,
    value
FROM deduplicated_source
WHERE row_num = 1

What happens is that I run the dbt run –select my_model_name for the first time, it will basically run a full refresh of the entire data for how many number of batches (in this case, 3 times), resulting in 6 * 3 = 18 rows.

So it would look something like this

_id,event_time,value
1,2025-08-04 09:00:00,A
1,2025-08-04 09:00:00,A
1,2025-08-04 09:00:00,A
2,2025-08-04 10:00:00,B
2,2025-08-04 10:00:00,B
2,2025-08-04 10:00:00,B

6,2025-08-06 08:00:00,F
6,2025-08-06 08:00:00,F
6,2025-08-06 08:00:00,F

What I have tried

I have tried setting full_refresh=false to no avail. I also tried to use something like dbt run --select sample_microbatch_model --event-time-start ‘2025-08-04’ --event-time-end ‘2025-08-05’ but while that will result in only one batch being run, dbt will still add the rows with event-time outside of the range I specified anyways.

Clearly there is something wrong with how I am setting up this model but I am unsure what else do I need to do.

Hey, have you also defined ‘event_time’ in the source definition of sample_microbatch_seed? If you skip it, dbt will perform a full refresh every time on this source.

sources:

- name: x
  ...
  tables:
  - name: raw_x
    config:
        event_time: 'snapshot_date'   # needed for incremental microbatch models!

Hello, yes this solution worked! I know the documentation mentioned that all upstream models should have event_time configured but I didn’t realize it also applied to the source itself as well. Thanks very much!

1 Like