I have a users model that currently run with a full-refresh and i want to move it to be incremental with incremental_strategy=merge and run every hour.
The model has columns calculated with first_value and last_value functions, for example: first_time_login and last_time_login.
i know last_time_login column will still work properly but i have problem with first_time_login- i don’t want to override it in each run. to solve it, I found out i can use merge_exclude_columns.
But then i encountered another problem - the first_time_login might be NULL at initially which means it will never be updated again.
example: a user was created yesterday but made his first login only today. if i run hourly - the first_time_login will be set to NULL and will never be updated because of merge_exclude_columns.
My question: Can i use merge_exclude_columns but excluding NULLS? meaning i want to: 1. get the first value which is not null 2. once i have updated value - never update it again
Why not? It should stay static, shouldn’t it? It’ll be null until they first log in, and then on every subsequent invocation it will stay as the first time they logged in. So it seems like it shouldn’t really matter if it gets overwritten to the same value or if it’s ignored.
As i understand, if i include a column (say first_time_login) in merge_exclude_columns - the value there will be the value that i get from a full-refresh and will never change again.
So if i run a full-refresh and a user never logged in yet - i will get NULL. this NULL will never change.
I don’t want it - i to have a value of the first time the user logged in not necessarily from a full-refresh run
I found anohter solution rather using merge_exclude_columns for my use case.
I bring current table values with
self_columns AS
(
SELECT user_id,
first_time_login,
last_time_login
....
FROM {{ this }}
),
the final select for those columns looks like:
SELECT ...,
COALESCE(self_columns.first_time_login, main.first_time_login) AS first_time_login,
IFF(main.last_time_login IS NOT NULL, main.last_time_login, self_columns.last_time_login) AS last_time_login,
FROM main m
LEFT JOIN self_columns s ON m.user_id = s.user_id