Hello! I’m reaching out because I’m unsure if I am understanding how to use incremental models correctly. Here is what I am trying to accomplish:
We have an employees table in Google BQ. This table is set up to have a new row added whenever an there is a change to an employee’s record. Essentially, the table is a historical record of employee data. The uniqueness of the rows in the table are a combination of an employee’s id and the timestamp of when the change was captured. The table is also partitioned by day based on the timestamp field.
We’re planning to create an incremental model using this employees table as the source to get the current state of our HR data. For the model, we’re using the using the “merge” strategy and the employee’s id would be unique.
Currently, I’m running into two issues:
- When the “build” or “run” commands worked, I would get duplicate employee ids in our model table
- I’m not certain what I did, but now every time I use the “run” or “build” commands, I get an error that states:
19:11:34 On model.norcal_analytics.stg__employees: /* {"app": "dbt", "dbt_version": "1.7.12", "profile_name": "user", "target_name": "default", "node_id": "model.norcal_analytics.stg__employees"} */
-- back compat for old kwarg name
My questions are:
- Am I completely off base on how to use an incremental model? Like, is this just not a good use case/what incremental models are intended for?
- If this is a good use case for an incremental table, then does anyone have ideas on why I’m getting duplicate ids and why I’m getting this error? I’m thinking the error might be getting raised by the is_incremental macro, but I’m not 100% sure on this.
I’ve read a lot of docs on incremental tables, but I’ve mostly been referencing to this one. Here’s my model file:
{{ config(
materialized='incremental',
unique_key='account_id',
partition_by = {
"field": "last_updated_utc",
"data_type": "timestamp",
"granularity": "day"
}
) }}
with source as (
select * from {{ source('hris','base__employees') }}
)
, final as (
select
id as account_id
, employee_id as local_staff_id
, first_name as staff_first_name
, last_name as staff_last_name
, middle_name as staff_mid_initial
, nickname as nickname
, primary_email as staff_email
, secondary_email as secondary_email
, status as staff_status
, address_line_1 as street_address
, address_line_2 as apartment
, city as city
, state as state
, zip as zip_code
, country as country
, primary_phone as primary_phone
, hired_date as hired_date
, re_hired_date as re_hired_date
, started_date as started_date
, terminated_date as terminated_date
, birth_date as birth_date
, location_cost_center_id as location_cost_center_id
, department_cost_center_id as department_cost_center_id
, supervisor_account_id as supervisor_account_id
, last_updated_utc as last_updated_utc
from source
)
select * from final
{% if is_incremental() %}
where last_updated_utc >= (select max(last_updated_utc) from {{ this }})
{% endif %}
As I’m writing this, I’m wondering about the placement of the is_incremental macro. Should I place this somewhere else like within the “final” cte?
Thanks!!