Using dbt to manage user defined functions

Hey! Hope you’re doing well.

I just wanted to add something that I think might be helpful to this excellent original idea.

The thing is, the on-run-start hook can add some seconds to each model/test run to apply all the UDFs. In my case, the overall sum of those seconds was around 10-15% of the entire dbt pipeline runtime, and I considered it was too much time for something that is a static thing.

To tackle this, I decided to do the same idea described here, but instead of executing it with the hook, it’s executed in a step of the CI pipeline, in case some file that matches the glob pattern macros/udfs/**/*.sql is modified. You may need to make some adjustments to your infrastructure (your runners), but it worthed a while for my case.

Thanks!

1 Like

Hey, I’ve written an article on udfs as a custom dbt materialization, which allows you to incorporate them into the DAG and so solve most (all?) of the problems I’ve seen mentioned here elegantly.

Writeup here Materialized UDFs in a dbt World - Tempered Works

Example here pypi_vulnerabilities/models/published/udfs/matches_multi_spec.sql at main · brabster/pypi_vulnerabilities · GitHub

2 Likes

Note discussion #10395 on support for UDFs as a materialization

Thanks @claire this was really helpful - I just finished getting it working on databricks.

The only hiccup I came across was that it didn’t want to run multiple create udf statements in one go - So I had to remove the create_udfs macro and instead put each of the individual create udf macros into the on-run-start command in the dbt_project.yml file…

on-run-start:

  • ‘{{create_udf_mask_email_address()}}’
  • ‘{{create_udf_mask_phone_number()}}’

Thank you - this is super useful! I adapted a version from this for Snowflake. One question, I’m curious as why you opted to return an empty list at the end of the custom materialization versus returning the relation of the UDF?

Specifically {{ return({'relations': []}) }} versus {{ return({'relations': [target}) }}. dbt docs are a little opaque, so I’m not sure how this impacts how dbt tracks state changes, dependencies, etc