From a best-practices standpoint, what should go in a macro vs a UDF?
First of all, itâs worth asking: can the function be written as a UDF? UDFs are scalar functions â you can only use them where youâd normally write a select
statement. For example you could write a UDF to get the UTM parameters out of a web address, but you canât write a UDF to run a query, hold the results in memory, and then loop through them (youâd have to use a macro for that).
Assuming the thing you want to do can be written as a UDF, the next question is: can the function be written as a macro? There are some functions that need to be UDFs. A good example is a Redshift function that counts the items in a json list â you actually need to write this function in python, and import the json package, and itâs just not possible to do it in Jinja
OK so letâs say you can write something as both a UDF and a Jinja function. Letâs take a f_date_to_string
function as an example. Here it is as a (Redshift) UDF:
create function f_date_to_string (varchar, datetime)
returns varchar
stable
as $$
select
case
when lower($1) in ('year') then to_char($2, 'yyyy')
when lower($1) in ('quarter', 'month') then to_char($2, 'yyyy-mm')
when lower($1) in ('week', 'day') then to_char($2, 'yyyy-mm-dd')
when lower($1) in ('hour', 'minute') then to_char($2, 'yyyy-mm-dd hh24:mi')
end
$$ language sql
In your model, youâd call it like so:
select
f_date_to_string('month', created_at)
from ...
(Pro tip that I havenât included: you should consider putting your UDF in a schema!)
And here it is as a macro:
{% macro date_to_string(date_part, timestamp) %}
case
when lower({{ date_part }}) in ('year') then to_char({{ timestamp }}, 'yyyy')
when lower({{ date_part }}) in ('quarter', 'month') then to_char({{ timestamp }}, 'yyyy-mm')
when lower({{ date_part }}) in ('week', 'day') then to_char({{ timestamp }}, 'yyyy-mm-dd')
when lower({{ date_part }}) in ('hour', 'minute') then to_char({{ timestamp }}, 'yyyy-mm-dd hh24:mi')
end
{% endmacro %}
And how youâd call it:
select
{{ date_to_string("'month'", 'created_at') }}
Generally, the folks I work with are pro-macros: the macro syntax is a little easier to learn â they look more like python and feel a bit more intuitive than the at-times weird syntax of UDF.
Personally, I lean a little more on the side of being pro-UDFs. Yes, the syntax is a little harder, but I think they are easier when it comes to actually writing your SQL â with macros, you have to make sure you get all âyourâ âargumentâ ââquotesââ right (which I rarely do first time).
Try both and see what you like! If you go down the UDF approach, here is an article I wrote a while back about integrating them into your dbt project.
In many databases, UDFs are only scalar. However, Snowflake does have the capability to do UDTFs (User-Defined Table Functions) which can return a table per se.
https://docs.snowflake.net/manuals/sql-reference/user-defined-functions.html
Also, from a database performance perspective UDFs and UDTFs may out-perform macros, depending on how they are written, what they are doing, and what the database query optimizer can do with them.
I prefer to make my functions, well, âfunctionalâ, that is, idempotent with no side effects. By the time functions are actually indepdendently reading from and writing to database tables, they are pretty difficult to test and get pretty ugly.