How to get incremental number by using surrogate key ?

Hi dbt gurus,

I am using {{ dbt_utils.generate_surrogate_key(['field_a', 'field_b', ...]) }} to generate a primary key. Instead of a hash, can I generate a surrogate key with sequential/incremental numbers?

In my opinion never use sequential numbers. nowadays always use a hash

Note: @david.dumas originally posted this reply in Slack. It might not have transferred perfectly.

If your db supports AI, then use that, the one that I am using doesn’t, so what I am doing is:

  1. Set a variable of the last SK in the target table at the beginning of my model

    {%- set last_sk = get_last_sk(this, 'my_sk') -%}
    
  2. Increment that value in the select

     coalesce(case
                 when {{this}}.my_sk is not null then {{this}}.my_sk -- this line will retain your SK for already existing records
                 else {{last_sk}} + (sum(case when {{this}}.my_sk is null then 1 else 0 end) over (order by your_business_keyS))::Int32
               end
     ,-1)::Int32 as my_sk
    

    The reason for using sum() window function instead of row_number() is that row_number() cannot use case statement for business keys existing in the dim, thus it will create gaps in the newly generated keys as big as the table count. sum(case when {{this}}.my_sk is null then 1 else 0 end)will keep the accumulated SK value until a non existing business key appears for which it will add +1.

  3. Last you need to have a left join to the target table

    left join {{ this }} tgt on (tgt.business_key = src.business_key)
    

The big drawback with this approach (along with being ugly) is that it does not support paralel runs as for two models that reached the get_last_sk() step at the same time, will get the same last_sk value from the target table leading to colliding SKs for the new records.

The code for the get_last_sk macro is:

{% macro get_last_sk(table_name,sk_column_name) %}

{{ log("Retrieving last surrogate key from: " ~ table_name ~ "." ~ sk_column_name) }}	

{% set last_sk_query %}
	select coalesce(max({{sk_column_name}}),0) as sk from {{table_name}}
{% endset %}

{% set results = run_query(last_sk_query) %}

{% if execute %}
	{# return the first value of the first column #}
	{% set result_value = results.columns[0][0] %}
{% else %}
	{% set result_value = 0 %}
{% endif %}

{{ log("Last SK for " ~ table_name ~ " : " ~ result_value) }}

{{ return(result_value) }}

{% endmacro %}

Hope that helps.

Another option:

select (seq8()*1000000000000000 + to_number(to_char(current_timestamp,‘yyyymmddhh24miss’))) as table_pk,
col1, col2, etc…

That uses the timestamp as the base of the key and then increments in front of the timestamp, and therefore will produce all unique values that are different every time the select statement is run. It’s a large number, but it is dead simple and works great in lieu of using a sequence generator.