BigQuery JS UDF (nanoid) returns same value many times only when run via dbt incremental MERGE

jirick · January 29, 2026, 7:23pm

The problem I’m having

I’m using a BigQuery JavaScript UDF (NanoID) to generate unique IDs in dbt models. For ~2 years it worked fine, but recently (starting around Oct 2024) some dbt runs produce the same generated ID repeated across many rows (hundreds/thousands of duplicates). When I run the compiled SQL manually in the BigQuery UI, the IDs look unique the issue seems to happen specifically when dbt executes the model

The context of why I’m trying to do this

I maintain mapping tables (e.g., study_id_mapping, variable_id_mapping) where new entities (DOIs, variable names, etc.) need a stable unique ID. dbt incremental models identify “unmapped” values and insert new (id, value) pairs. Because these tables are large (tens of millions of rows), we generate IDs on insert using a UDF rather than doing it externally.

What I’ve already tried

Updated the UDF to be explicitly NOT DETERMINISTIC.
Updated all dbt models to pass an argument into the UDF (no more zero-arg calls).
Increased NanoID length from 10 → 12 for new IDs.
Confirmed that if I run the compiled SELECT by itself in BigQuery, it appears to generate unique IDs - but dbt’s execution sometimes produces duplicates.

Some example code or error messages

// UDF
CREATE OR REPLACE FUNCTION `<project-id>.udf.generate_nanoid`(value STRING)
RETURNS STRING
NOT DETERMINISTIC
LANGUAGE js
OPTIONS (library=["gs://bigquery-js-ids/id.js"])
AS r"""
  var result = nano.generateId();
  return result;
""";


//study_id_mapping
with
  studies as (select distinct doi from `...p_abstracts`),
  oa_studies as (select distinct doi from `...all_associations`),
  all_normalized_dois as (
    select udf.normalize_doi(doi) as doi from studies
    union distinct
    select udf.normalize_doi(doi) as doi from oa_studies
  ),
  unmapped as (
    select distinct doi
    from all_normalized_dois
    where doi not in (select doi from `common.study_id_mapping`)
  )
select udf.generate_nanoid(doi) as study_id, doi
from unmapped;



// What dbt actually runs (BigQuery job query)
merge into `common.study_id_mapping` as DBT_INTERNAL_DEST
using (
  ...same CTEs...
  select udf.generate_nanoid(doi) as study_id, doi
  from unmapped
) as DBT_INTERNAL_SOURCE
on (FALSE)
when not matched then insert (`study_id`, `doi`) values (`study_id`, `doi`);

What I’m trying to understand
Has anyone seen BigQuery JS UDFs (or NanoID in JS UDFs) behave differently under dbt incremental MERGE execution vs running the compiled SELECT manually? Any known dbt-bigquery adapter behavior (parallelism, connection/job settings, MERGE shape) that could cause the UDF result to be reused unexpectedly?

There are no error messages (except for my dbt test, that checks for uniqueness)

Topic		Replies	Views
Duplicate Records on First run with Incremental Model on BigQuery Archive	3	7710	November 29, 2021
dbt microbatch strategy generating duplicates Help incremental , redshift , dbt-core	2	261	August 7, 2025
incremental merge based on ID Help incremental , snowflake	1	461	August 10, 2024
Using dbt to manage user defined functions Show and Tell udf	25	85546	February 9, 2025
BigQuery + dbt: Incremental Changes Archive	8	41338	June 19, 2020

BigQuery JS UDF (nanoid) returns same value many times only when run via dbt incremental MERGE

The problem I’m having

The context of why I’m trying to do this

What I’ve already tried

Some example code or error messages

Related topics