Is there a rule-of-thumb for what to put in a macro vs a user defined function?

From a best-practices standpoint, what should go in a macro vs a UDF?

First of all, it’s worth asking: can the function be written as a UDF? UDFs are scalar functions – you can only use them where you’d normally write a select statement. For example you could write a UDF to get the UTM parameters out of a web address, but you can’t write a UDF to run a query, hold the results in memory, and then loop through them (you’d have to use a macro for that).

Assuming the thing you want to do can be written as a UDF, the next question is: can the function be written as a macro? There are some functions that need to be UDFs. A good example is a Redshift function that counts the items in a json list – you actually need to write this function in python, and import the json package, and it’s just not possible to do it in Jinja

OK so let’s say you can write something as both a UDF and a Jinja function. Let’s take a f_date_to_string function as an example. Here it is as a (Redshift) UDF:

create function f_date_to_string (varchar, datetime)
returns varchar
 stable
as $$
  select
  case
    when lower($1) in ('year') then to_char($2, 'yyyy')
    when lower($1) in ('quarter', 'month') then to_char($2, 'yyyy-mm')
    when lower($1) in ('week', 'day') then to_char($2, 'yyyy-mm-dd')
    when lower($1) in ('hour', 'minute') then to_char($2, 'yyyy-mm-dd hh24:mi')
  end
$$ language sql

In your model, you’d call it like so:

select
  f_date_to_string('month', created_at)
from ...

(Pro tip that I haven’t included: you should consider putting your UDF in a schema!)

And here it is as a macro:

{% macro date_to_string(date_part, timestamp) %}
  case
    when lower({{ date_part }}) in ('year') then to_char({{ timestamp }}, 'yyyy')
    when lower({{ date_part }}) in ('quarter', 'month') then to_char({{ timestamp }}, 'yyyy-mm')
    when lower({{ date_part }}) in ('week', 'day') then to_char({{ timestamp }}, 'yyyy-mm-dd')
    when lower({{ date_part }}) in ('hour', 'minute') then to_char({{ timestamp }}, 'yyyy-mm-dd hh24:mi')
  end
{% endmacro %}

And how you’d call it:

select
  {{ date_to_string("'month'", 'created_at') }}

Generally, the folks I work with are pro-macros: the macro syntax is a little easier to learn – they look more like python and feel a bit more intuitive than the at-times weird syntax of UDF.

Personally, I lean a little more on the side of being pro-UDFs. Yes, the syntax is a little harder, but I think they are easier when it comes to actually writing your SQL – with macros, you have to make sure you get all ‘your’ “argument” “‘quotes’” right (which I rarely do first time).

Try both and see what you like! If you go down the UDF approach, here is an article I wrote a while back about integrating them into your dbt project.

1 Like

In many databases, UDFs are only scalar. However, Snowflake does have the capability to do UDTFs (User-Defined Table Functions) which can return a table per se.

https://docs.snowflake.net/manuals/sql-reference/user-defined-functions.html

Also, from a database performance perspective UDFs and UDTFs may out-perform macros, depending on how they are written, what they are doing, and what the database query optimizer can do with them.

I prefer to make my functions, well, “functional”, that is, idempotent with no side effects. By the time functions are actually indepdendently reading from and writing to database tables, they are pretty difficult to test and get pretty ugly.

2 Likes