The problem I’m having
It seems that my dbt-clickhouse is not truly fully incremental, for instance。Among them, “max_event_time” is from the data source table xxxxxx. First, I will filter out some data source logs based on certain conditions, such as determining whether sip is a terminal. However, this will result in event_time not fetching the maximum value each time.
In extreme cases, the inner join will filter out all data, resulting in all logs being read each time the dbt is run instead of incremental reads.
One solution is to use left join to connect the data source without performing any filtering operations, leaving other fields empty or with a default value. In this way, we can ensure that the latest event_time is obtained each time. Does anyone have any other good methods?
{{
config(
materialized='incremental',
order_by='event_date',
incremental_strategy='append'
)
}}
with
source as (
select
s.event_time as event_time,
s.event_time as visit_time,
toDate(s.event_time) as event_date,
s.sip as sip,
s.dip as dip,
s.dport as dport
from
{{ ref('xxxxxxxx') }} s
inner join xxxx on xxxx
{% if is_incremental() %}
where s.event_time > (select max(max_event_time) from {{ this }})
{% endif %}
),