Use a UDF to do a dynamic rollup by date

Thanks for sharing that extra context! Super helpful.

From looking at your query, I would consider skipping the UDF altogether and instead use a date spine pattern.

This would look something like this:

--models/date_spine.sql

/*
Use whatever range makes sense for your business.
I did 100 years when I set this up at my last job and 
filtered appropriately at query time
*/

{{ dbt_utils.date_spine(
    datepart="day",
    start_date="cast('2000-01-01' as date)",
    end_date="cast('2099-12-31' as date)"
   )
}}

Then in your model, you fan this out day by day with an inequality join. Note that this also means you can use dbt’s datediff abstraction instead of having to do the Postgres month math yourself (I’ve left the original code commented out):

--models/stocks_and_valuations_as_of_date.sql

WITH
date_spine as (
    SELECT * FROM {{ ref('date_spine') }}
),

stock_certs AS (
    SELECT * FROM {{ ref('stocks') }}
),

cap_tables AS (
    SELECT * FROM {{ ref('cap_tables') }}
),

companies AS (
    SELECT * FROM {{ ref('companies') }}
),

stock_calcs AS (
    SELECT
        date_spine.date_day, 
        stocks.*,
        CASE
            WHEN vesting_period = 0
                THEN num_shares
            WHEN
                {{ dbt.datediff('date_spine.date_day', 'stocks.issued_on', 'month') }} < 12
                --EXTRACT(YEAR FROM AGE(date_spine.date_day, issued_on)) * 12 + EXTRACT(MONTH FROM AGE(date_spine.date_day, issued_on)) < 12
                THEN 0
            ELSE
                (num_shares / (12 * vesting_period)) *
                ( {{ dbt.datediff('date_spine.date_day', 'stocks.issued_on', 'month') }} < 12 )
                --(EXTRACT(YEAR FROM AGE(date_spine.date_day, issued_on)) * 12 + EXTRACT(MONTH FROM AGE(date_spine.date_day, issued_on)))
        END AS vested_shares

    FROM date_spine
    LEFT OUTER JOIN stocks ON date_spine.date_day >= stocks.issued_on
    WHERE date_spine.date_day >= '2021-12-31'
      AND date_spine.date_day < '2030-01-01' -- instead of hardcoding it, it'd be nice to get the furthermost date where vesting actually completes
),

final AS (
    SELECT
        stock_calcs.date_day, 
        companies.company_id,
        SUM(CASE
            WHEN stock_calcs.share_class = 'common'
                THEN stock_calcs.vested_shares
        END) AS vested_shares,
        SUM(CASE
            WHEN stock_calcs.share_class = 'common'
                THEN stock_calcs.num_shares
        END) AS total_common_shares


    FROM stock_calcs
    INNER JOIN cap_tables ON cap_tables.cap_table_id = stock_calcs.cap_table_id
    INNER JOIN companies ON cap_tables.company_id = companies.company_id
    GROUP BY stock_calcs.date_day, companies.company_id
)

SELECT * FROM final

Assuming your input table looks like this:

company_id employee_id issued_on vesting_period num_shares
123 abc 2022-01-01 4 480
123 def 2022-01-22 4 960
123 ghi 2023-04-12 0 480
456 jkl 2023-02-01 4 480

Then the results of the intermediate stock_calcs CTE would look like:

date_day company_id employee_id issued_on vesting_period num_shares vested_shares
2021-12-31 null null null null null null
2022-01-01 123 abc 2022-01-01 4 480 0
2022-01-22 123 abc 2022-01-01 4 480 0
2022-01-22 123 def 2022-01-22 4 960 0
2023-01-01 123 abc 2022-01-01 4 480 120
2023-01-01 123 def 2022-01-22 4 960 0
2023-02-01 123 abc 2022-01-01 4 480 130
2023-02-01 123 def 2022-01-22 4 960 120
2023-02-01 456 jkl 2023-02-01 4 480 0
2023-04-12 123 abc 2022-01-01 4 480 150
2023-04-12 123 def 2022-01-22 4 960 140
2023-04-12 123 ghi 2023-04-12 0 480 480
2023-04-12 456 jkl 2023-02-01 4 480 0

Which is pretty overwhelming and gross! But after aggregration in the final CTE, you wind up with a nice wee table with a row per company and day:

date_day company_id vested_shares total_common_shares
2021-12-31 null null null
2022-01-01 123 0 480
2022-01-22 123 0 1440
2023-01-01 123 120 1440
2023-02-01 123 250 1440
2023-02-01 456 0 480
2023-04-12 123 770 1920
2023-04-12 456 0 480

From there, you can also make your second model which is just the state of play on the first of the month. Something like:

--models/stocks_and_valuations_as_of_month
select *
from {{ ref('stocks_and_valuations_as_of_date') }}
where extract(day from date_day) = 1 --I hope this is right! However you get the first of the month in Postgres

And then your production code will just query the resulting table:

select * 
from analytics.stocks_and_valuations_as_of_date
where company_id = 123
and date_day = '2023-04-01'

This was very long, and still doesn’t answer your UDF question! But it is a more dbtonic way of thinking about it; hopefully that’s OK and the UDF was a means to the end instead of a mandatory implementation detail for some reason.

lmk what you think!

1 Like