Challenges with Incremental Loading in dbt-ClickHouse: Ensuring the max_event_time High-Water Mark Is Captured Properly

The problem I’m having

It seems that my dbt-ClickHouse model is not truly incremental. The high-water mark `max_event_time` comes from the source table xxxxxx, but before loading I filter the source logs on certain conditions, such as whether `sip` is a terminal, via an inner join. Because of that filtering, the target table does not always end up holding the true maximum `event_time`.
In the extreme case, the inner join filters out every row, so the incremental predicate never advances and each dbt run reads all of the logs again instead of just the new ones.
One workaround is to connect to the data source with a left join and skip the filtering entirely, leaving the other fields null or with a default value; that way the latest `event_time` is captured on every run. Does anyone have any other good methods?

```
{{
    config(
        materialized='incremental',
        order_by='event_date',
        incremental_strategy='append'
    )
}}

with

source as (
    select
        s.event_time as event_time,
        s.event_time as visit_time,
        toDate(s.event_time) as event_date,
        s.sip as sip,
        s.dip as dip,
        s.dport as dport
    from
        {{ ref('xxxxxxxx') }} s
    inner join xxxx on xxxx
    {% if is_incremental() %}
        where s.event_time > (select max(max_event_time) from {{ this }})
    {% endif %}
)

select * from source
```

Your left join solution is probably best: https://stackoverflow.com/questions/75340890/filtering-for-new-rows-using-dbt-incremental-model

```
with source_with_all_timestamps as (
    select
        s.event_time,
        s.sip,
        s.dip,
        s.dport,
        -- Conditionally include business logic
        case when f.some_field is not null
             then s.some_business_field
             else null end as business_field
    from {{ ref('xxxxxxxx') }} s
    left join filter_table f on filter_condition
    {% if is_incremental() %}
        where s.event_time > (select max(event_time) from {{ this }})
    {% endif %}
)

select * from source_with_all_timestamps
where business_field is not null  -- Apply filtering after timestamp logic
```

One caveat: as written, that final filter also keeps the null-`business_field` rows out of the target table, so `max(event_time)` in `{{ this }}` can still lag; for the watermark to advance reliably, those rows need to land in this model and be filtered in a downstream model instead.
But you might also try storing your watermarks in a control table or something fancy. You could then just query that table for the max timestamp for each model, and update the rows with a post-hook after each model runs.
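A minimal sketch of that idea, assuming a hypothetical control table named `etl_watermarks` (the table and its columns are made up for illustration; you would create and manage it yourself outside of dbt):

```
-- Sketch only: assumes a control table you create yourself, e.g.
--   create table etl_watermarks (
--       model_name String,
--       max_event_time DateTime,
--       updated_at DateTime
--   ) engine = MergeTree order by model_name;

{{
    config(
        materialized='incremental',
        order_by='event_date',
        incremental_strategy='append',
        post_hook="
            insert into etl_watermarks (model_name, max_event_time, updated_at)
            select '{{ this.name }}', max(event_time), now()
            from {{ ref('xxxxxxxx') }}
        "
    )
}}

select
    s.event_time as event_time,
    toDate(s.event_time) as event_date,
    s.sip as sip,
    s.dip as dip,
    s.dport as dport
from {{ ref('xxxxxxxx') }} s
inner join xxxx on xxxx
{% if is_incremental() %}
    -- Read the watermark from the control table rather than from the target,
    -- so it advances even on runs where the inner join filters out every row.
    where s.event_time > (
        select max(max_event_time)
        from etl_watermarks
        where model_name = '{{ this.name }}'
    )
{% endif %}
```

The key design point is that the post-hook records `max(event_time)` from the *source* table, not from the target, so the watermark covers every row that was scanned, including the ones the inner join dropped.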

I think you might also be able to use the insert_overwrite incremental strategy in ClickHouse, but according to the Git repo there may be some considerations, because the current implementation is "experimental": <https://github.com/ClickHouse/dbt-clickhouse?tab=readme-ov-file#the-insert_overwrite-strategy-experimental>
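If you do experiment with it, the config change is small. A sketch, assuming the model is partitioned by the same `event_date` column it loads by (verify the current caveats in the README, e.g. around distributed tables):

```
{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by='event_date',
        order_by='event_date'
    )
}}

select
    s.event_time as event_time,
    toDate(s.event_time) as event_date,
    s.sip as sip,
    s.dip as dip,
    s.dport as dport
from {{ ref('xxxxxxxx') }} s
inner join xxxx on xxxx
{% if is_incremental() %}
    -- >= rather than >: the newest partition is re-selected in full and
    -- replaced wholesale, instead of appending individual rows to it.
    where toDate(s.event_time) >= (select max(event_date) from {{ this }})
{% endif %}
```

Because whole partitions are replaced rather than appended to, a day whose newest rows were filtered out simply gets rebuilt on the next run, which sidesteps the lagging-watermark problem at partition granularity.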

<sub>Note: `@Abigail Green (CHG Healthcare)` originally [posted this reply in Slack](https://getdbt.slack.com/archives/CBSQTAPLG/p1752855292180299?thread_ts=1752716671.423659&cid=CBSQTAPLG). It might not have transferred perfectly.</sub>