Hey, I’m trying to implement some of the more advanced regexp commands outlined in these snowflake UDFs:
However, while these functions work fine locally + in production (without dbt), I am unable to transfer them successfully to be used as macros in my dbt project.
The methodology I’m using to port some of this code over to dbt is this one:
Where the author recommends to implement a post hook where you create the UDF:
+post-hook: "{{ create_regexp_subtr2_udf() }}"
… which allows dbt to pick up the function reference correctly.
However, when I come to test the model in which the macro is being used, I encounter the following error in JavaScript which I don’t encounter when running as a normal UDF either locally or in Airflow:
100132 (P0000): JavaScript execution error: Uncaught TypeError: Cannot read properties of null (reading 'search') in REGEXP_SUBSTR2 at
'instr = str.search(regex);' position 24
18:58:00 stackstrace:
18:58:00 REGEXP_SUBSTR2 line: 16
If anyone has any ideas as to why the same code in snowflake is not making the transition to dbt would be a massive help!
Full code
regexp_substr2_js.sql
{% macro regexp_substr2_js() %}
$$
if (OCCURRENCE < 1) OCCURRENCE = 1;
var pos = POSITION - 1;
if (pos < 0) pos = 0;
var str = SUBJECT;
if (pos != 0) {
str = str.substring(pos);
}
var params = "g" + PARAMETERS;
var regex = new RegExp(PATTERN, params)
var instr = 0;
var cursor = pos;
for (i = 1; i <= OCCURRENCE; i++) {
/* Bit that's failing... */
instr = str.search(regex);
/* End. */
if (instr === -1) break;
str = str.substring(instr);
cursor = cursor + instr + 1;
}
if (instr != -1) {
return str.match(regex);
} else {
return "";
}
$$
{% endmacro %}
create_regexp_substr2_udf.sql
{% macro create_regexp_substr2_udf() %}
CREATE OR REPLACE FUNCTION REGEXP_SUBSTR2(SUBJECT string, PATTERN string, POSITION float, OCCURRENCE float, PARAMETERS string, GROUP_NUM float)
RETURNS string
LANGUAGE JAVASCRIPT STRICT IMMUTABLE
AS '{{ regexp_substr2_js() }}';
{% endmacro %}
query.sql
- This is querying a table which contains a column of URL subfolders joined together with
LISTAGG
to create a regex to extract URL subfolders of interest.
WITH
subfolders AS (
select listagg(pattern, '|') as folder
from {{ ref('lookup_urls') }}
)
SELECT
URL,
{{ target.schema }}.regexp_substr2(URL, (select folder from subfolders), 1, 1, 'i')
FROM {{ ref('all_urls' )}}