Incremental Best Practices

The problem I’m having

I recently began learning dbt and am eager to test its capabilities by publishing my first incremental model. It would be greatly beneficial if someone could review my approach to ensure it’s correct. My goal is to avoid developing any bad practices at the start of my learning journey.

I’ve created the following staging model [models/staging/finance/stg_charges.sql]:

WITH source AS (
    SELECT _airbyte_data->>'id' AS id,
           _airbyte_data->'captured' AS captured,
           _airbyte_data->'paid' AS paid,
           _airbyte_data->>'invoice' AS invoice,
           _airbyte_data->>'payment_intent' AS payment_intent,
           _airbyte_data->'amount_refunded' AS amount_refunded,
           _airbyte_data->>'balance_transaction' AS balance_transaction,
           _airbyte_data->'created' AS created,
           _airbyte_data->>'customer' AS customer,
           _airbyte_data->>'description' AS description,
           _airbyte_data->>'receipt_email' AS receipt_email,
           _airbyte_data->>'amount' AS amount,
           _airbyte_data->>'failure_code' AS failure_code,
           _airbyte_data->>'receipt_number' AS receipt_number,
           _airbyte_data->'refunded' AS refunded,
           _airbyte_data->>'status' AS status,
           _airbyte_data->>'statement_descriptor' AS statement_descriptor,
           _airbyte_data->>'currency' AS currency,
           _airbyte_data->>'failure_message' AS failure_message,
           _airbyte_data->>'payment_method' AS payment_method,
           _airbyte_data->'metadata' AS metadata,
           _airbyte_data->'updated' AS updated

    FROM {{ source('finance', '_airbyte_raw_original_charges') }}
),

staged AS (
    SELECT id,
        invoice AS invoice_id,
        payment_intent AS payment_intent_id,
        balance_transaction AS balance_transaction_id,
        to_timestamp(created::double precision) AS created,
        customer AS customer_id,
        amount::numeric / 100 AS amount,
        payment_method AS payment_method_id,
        metadata ->> 'channel' AS metadata_channel,
        COALESCE((metadata ->> 'adjust-balance')::numeric, 0) AS metadata_adjust_balance,
        to_timestamp(updated::double precision) AS updated_at,
        row_number() OVER (PARTITION BY id ORDER BY updated DESC) AS duplicates
    FROM source
)

SELECT *
FROM staged
WHERE duplicates = 1

Then, under marts, I’ve created the final model [models/marts/finance/dim_charges_original.sql]:

{{
    config(
        materialized = 'incremental',
        unique_key = 'id',
        incremental_strategy = 'merge'
    )
}}
SELECT * FROM {{ ref("stg_charges")}}
{% if is_incremental() %}
  WHERE updated_at > current_date - interval '7 day'
{% endif %}
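For reference, the pattern I keep seeing in the dbt docs filters against the existing target table via {{ this }} instead of a fixed seven-day window; I believe it would look like this here (untested sketch):

```sql
SELECT * FROM {{ ref('stg_charges') }}
{% if is_incremental() %}
  -- {{ this }} resolves to the already-built dim_charges_original table,
  -- so only rows newer than anything merged so far are pulled in
  WHERE updated_at > (SELECT max(updated_at) FROM {{ this }})
{% endif %}
```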

Context about the data

This is transactional data from Stripe (~100k rows) pushed in through Airbyte; the raw source already contains multiple duplicate rows.


  • Am I correctly implementing the ‘incremental strategy’ in dbt?
  • In my ‘stg_charges.sql’ file, I’ve formatted the data types and renamed columns to their appropriate attribute names. To eliminate duplicates, I’m using the row_number() function. Is there a more efficient method for achieving ‘deduplication’ in dbt?
  • Is it necessary to always rebuild the entire query in ‘stg_charges.sql’, or is there an approach to only filter for the latest data not yet included in the final ‘dim_charges_original.sql’ model, or perhaps limit the data to a specific timeframe? This could improve the performance of the staging queries.
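To make the deduplication question concrete: since this warehouse is Postgres (hence the ->> operators above), the main alternative to row_number() I’m aware of is DISTINCT ON, something like this (untested):

```sql
-- would replace the row_number() + WHERE duplicates = 1 pattern:
-- DISTINCT ON keeps the first row per id according to the ORDER BY,
-- i.e. the most recently updated version of each charge
SELECT DISTINCT ON (id) *
FROM staged
ORDER BY id, updated_at DESC
```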

Code looks good.

One of the best practices is to add spaces inside the curly braces: {{ ref('stg_charges') }}. You have implemented the model correctly.

You could also wrap the date filter in parentheses in the WHERE clause: (current_date - interval '7 days').

Using analytic window functions (row_number or rank) is a good way to eliminate duplicates. Otherwise you would have to use a correlated subquery, which would increase processing time.
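For example, the subquery version would look roughly like this (a sketch, reusing the column names from your staging model):

```sql
-- keep only the row whose updated timestamp is the max for its id;
-- note that ties on updated would keep multiple rows here,
-- unlike the row_number() approach, which always keeps exactly one
SELECT s.*
FROM staged s
WHERE s.updated = (
    SELECT max(s2.updated)
    FROM staged s2
    WHERE s2.id = s.id
)
```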


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.