Our dbt project uses some user defined functions (UDFs).
In the past, I’ve created these UDFs in my SQL console as the superuser, however this isn’t ideal as:
- The code was not surfaced as part of our dbt project, so it’s unclear what the UDF does
- There was no version control on the UDFs if we needed to update them
- And we weren’t maintaining separate development/production versions of the UDFs .
So I decided to put them into our dbt project, using this process:
- Created a subdirectory, just to keep things tidy:
macros/udfs/
- Created a file for each udf, e.g.
macros/udfs/f_future_date.sql
(below). Note that each of my UDFs is prefixed withf_
, as recommended in the Redshift documentation.
{% macro create_f_future_date() %}
CREATE OR REPLACE FUNCTION {{target.schema}}.f_future_date()
RETURNS TIMESTAMP
IMMUTABLE AS $$
SELECT '2100-01-01'::TIMESTAMP;
$$ LANGUAGE sql
{% endmacro %}
- Created a macro called
macros/create_udfs.sql
which calls each UDF macro. Note separation with;s
. Also, note the inclusion of acreate schema
statement – since the target schema is created afteron-run-start
hooks, and I’ve used the target schema in my UDFs, I have to create that schema if it doesn’t exist.
{% macro create_udfs() %}
create schema if not exists {{target.schema}};
{{create_f_list_custom_keys()}};
{{create_f_count_custom_keys()}};
{{create_f_future_date()}}
{% endmacro %}
- Added an
on-run-start
hook to my project
on-run-start:
- '{{create_udfs()}}'
- Updated the references to UDFs in my models to use the schema-versions, e.g.
SELECT
{{target.schema}}.F_FUTURE_DATE()
Other quirks I ran into when doing this were:
- As the superuser, I had to grant permission to users to create python udfs:
-- as your superuser
GRANT USAGE ON LANGUAGE PLPYTHONU TO userA;
- I had to use schemas because users cannot edit each others’ UDFs (i.e. if
userA
creates the UDF, anduserB
runs theCREATE OR REPLACE FUNCTION
statement, a “must be owner of function” error will occur). Using schemas means you are creating distinct UDFs* so won’t hit this issue. I would recommend using schemas anyway to maintain separate dev/production UDFs
* Assuming each Redshift user profile used by dbt has a distinct target schema.