Use a UDF to do a dynamic rollup by date

joellabes · April 20, 2023, 3:29am

Thanks for sharing that extra context! Super helpful.

From looking at your query, I would consider skipping the UDF altogether and instead use a date spine pattern.

This would look something like this:

--models/date_spine.sql

/*
Use whatever range makes sense for your business.
I did 100 years when I set this up at my last job and 
filtered appropriately at query time
*/

{{ dbt_utils.date_spine(
    datepart="day",
    start_date="cast('2000-01-01' as date)",
    end_date="cast('2099-12-31' as date)"
   )
}}

Then in your model, you fan this out day by day with an inequality join. Note that this also means you can use dbt’s datediff abstraction instead of having to do the Postgres month math yourself (I’ve left the original code commented out):

--models/stocks_and_valuations_as_of_date.sql

WITH
date_spine as (
    SELECT * FROM {{ ref('date_spine') }}
),

stock_certs AS (
    SELECT * FROM {{ ref('stocks') }}
),

cap_tables AS (
    SELECT * FROM {{ ref('cap_tables') }}
),

companies AS (
    SELECT * FROM {{ ref('companies') }}
),

stock_calcs AS (
    SELECT
        date_spine.date_day, 
        stocks.*,
        CASE
            WHEN vesting_period = 0
                THEN num_shares
            WHEN
                {{ dbt.datediff('date_spine.date_day', 'stocks.issued_on', 'month') }} < 12
                --EXTRACT(YEAR FROM AGE(date_spine.date_day, issued_on)) * 12 + EXTRACT(MONTH FROM AGE(date_spine.date_day, issued_on)) < 12
                THEN 0
            ELSE
                (num_shares / (12 * vesting_period)) *
                ( {{ dbt.datediff('date_spine.date_day', 'stocks.issued_on', 'month') }} < 12 )
                --(EXTRACT(YEAR FROM AGE(date_spine.date_day, issued_on)) * 12 + EXTRACT(MONTH FROM AGE(date_spine.date_day, issued_on)))
        END AS vested_shares

    FROM date_spine
    LEFT OUTER JOIN stocks ON date_spine.date_day >= stocks.issued_on
    WHERE date_spine.date_day >= '2021-12-31'
      AND date_spine.date_day < '2030-01-01' -- instead of hardcoding it, it'd be nice to get the furthermost date where vesting actually completes
),

final AS (
    SELECT
        stock_calcs.date_day, 
        companies.company_id,
        SUM(CASE
            WHEN stock_calcs.share_class = 'common'
                THEN stock_calcs.vested_shares
        END) AS vested_shares,
        SUM(CASE
            WHEN stock_calcs.share_class = 'common'
                THEN stock_calcs.num_shares
        END) AS total_common_shares


    FROM stock_calcs
    INNER JOIN cap_tables ON cap_tables.cap_table_id = stock_calcs.cap_table_id
    INNER JOIN companies ON cap_tables.company_id = companies.company_id
    GROUP BY stock_calcs.date_day, companies.company_id
)

SELECT * FROM final

Assuming your input table looks like this:

company_id	employee_id	issued_on	vesting_period	num_shares
123	abc	2022-01-01	4	480
123	def	2022-01-22	4	960
123	ghi	2023-04-12	0	480
456	jkl	2023-02-01	4	480

Then the results of the intermediate stock_calcs CTE would look like:

date_day	company_id	employee_id	issued_on	vesting_period	num_shares	vested_shares
2021-12-31	null	null	null	null	null	null
2022-01-01	123	abc	2022-01-01	4	480	0
…
2022-01-22	123	abc	2022-01-01	4	480	0
2022-01-22	123	def	2022-01-22	4	960	0
…
2023-01-01	123	abc	2022-01-01	4	480	120
2023-01-01	123	def	2022-01-22	4	960	0
…
2023-02-01	123	abc	2022-01-01	4	480	130
2023-02-01	123	def	2022-01-22	4	960	120
2023-02-01	456	jkl	2023-02-01	4	480	0
…
2023-04-12	123	abc	2022-01-01	4	480	150
2023-04-12	123	def	2022-01-22	4	960	140
2023-04-12	123	ghi	2023-04-12	0	480	480
2023-04-12	456	jkl	2023-02-01	4	480	0

Which is pretty overwhelming and gross! But after aggregration in the final CTE, you wind up with a nice wee table with a row per company and day:

date_day	company_id	vested_shares	total_common_shares
2021-12-31	null	null	null
2022-01-01	123	0	480
…
2022-01-22	123	0	1440
…
2023-01-01	123	120	1440
…
2023-02-01	123	250	1440
2023-02-01	456	0	480
…
2023-04-12	123	770	1920
2023-04-12	456	0	480

From there, you can also make your second model which is just the state of play on the first of the month. Something like:

--models/stocks_and_valuations_as_of_month
select *
from {{ ref('stocks_and_valuations_as_of_date') }}
where extract(day from date_day) = 1 --I hope this is right! However you get the first of the month in Postgres

And then your production code will just query the resulting table:

select * 
from analytics.stocks_and_valuations_as_of_date
where company_id = 123
and date_day = '2023-04-01'

This was very long, and still doesn’t answer your UDF question! But it is a more dbtonic way of thinking about it; hopefully that’s OK and the UDF was a means to the end instead of a mandatory implementation detail for some reason.

lmk what you think!

Topic		Replies	Views
Using dbt to manage user defined functions Show and Tell udf	25	82873	February 9, 2025
Using User defined function from snowflake in dbt model Help	3	6266	June 19, 2023
Using variables inside the dbt_utils.date_spine macro Help dbt-utils , date-and-time , variables	10	11146	January 23, 2025
Problem with UDF macro during 'run dbt' Help	1	1117	May 25, 2023
New Package: dbt-date Show and Tell jinja , date-and-time , packages	2	5332	October 11, 2019

Use a UDF to do a dynamic rollup by date

Related topics