Thanks for sharing that extra context! Super helpful.
From looking at your query, I would consider skipping the UDF altogether and instead use a date spine pattern.
This would look something like this:
--models/date_spine.sql
/*
Use whatever range makes sense for your business.
I did 100 years when I set this up at my last job and
filtered appropriately at query time
*/
{{ dbt_utils.date_spine(
datepart="day",
start_date="cast('2000-01-01' as date)",
end_date="cast('2099-12-31' as date)"
)
}}
Then in your model, you fan this out day by day with an inequality join. Note that this also means you can use dbt’s datediff abstraction instead of having to do the Postgres month math yourself (I’ve left the original code commented out):
--models/stocks_and_valuations_as_of_date.sql
WITH
date_spine as (
SELECT * FROM {{ ref('date_spine') }}
),
stock_certs AS (
SELECT * FROM {{ ref('stocks') }}
),
cap_tables AS (
SELECT * FROM {{ ref('cap_tables') }}
),
companies AS (
SELECT * FROM {{ ref('companies') }}
),
stock_calcs AS (
SELECT
date_spine.date_day,
stocks.*,
CASE
WHEN vesting_period = 0
THEN num_shares
WHEN
{{ dbt.datediff('date_spine.date_day', 'stocks.issued_on', 'month') }} < 12
--EXTRACT(YEAR FROM AGE(date_spine.date_day, issued_on)) * 12 + EXTRACT(MONTH FROM AGE(date_spine.date_day, issued_on)) < 12
THEN 0
ELSE
(num_shares / (12 * vesting_period)) *
( {{ dbt.datediff('date_spine.date_day', 'stocks.issued_on', 'month') }} < 12 )
--(EXTRACT(YEAR FROM AGE(date_spine.date_day, issued_on)) * 12 + EXTRACT(MONTH FROM AGE(date_spine.date_day, issued_on)))
END AS vested_shares
FROM date_spine
LEFT OUTER JOIN stocks ON date_spine.date_day >= stocks.issued_on
WHERE date_spine.date_day >= '2021-12-31'
AND date_spine.date_day < '2030-01-01' -- instead of hardcoding it, it'd be nice to get the furthermost date where vesting actually completes
),
final AS (
SELECT
stock_calcs.date_day,
companies.company_id,
SUM(CASE
WHEN stock_calcs.share_class = 'common'
THEN stock_calcs.vested_shares
END) AS vested_shares,
SUM(CASE
WHEN stock_calcs.share_class = 'common'
THEN stock_calcs.num_shares
END) AS total_common_shares
FROM stock_calcs
INNER JOIN cap_tables ON cap_tables.cap_table_id = stock_calcs.cap_table_id
INNER JOIN companies ON cap_tables.company_id = companies.company_id
GROUP BY stock_calcs.date_day, companies.company_id
)
SELECT * FROM final
Assuming your input table looks like this:
company_id |
employee_id |
issued_on |
vesting_period |
num_shares |
123 |
abc |
2022-01-01 |
4 |
480 |
123 |
def |
2022-01-22 |
4 |
960 |
123 |
ghi |
2023-04-12 |
0 |
480 |
456 |
jkl |
2023-02-01 |
4 |
480 |
Then the results of the intermediate stock_calcs
CTE would look like:
date_day |
company_id |
employee_id |
issued_on |
vesting_period |
num_shares |
vested_shares |
2021-12-31 |
null |
null |
null |
null |
null |
null |
2022-01-01 |
123 |
abc |
2022-01-01 |
4 |
480 |
0 |
… |
|
|
|
|
|
|
2022-01-22 |
123 |
abc |
2022-01-01 |
4 |
480 |
0 |
2022-01-22 |
123 |
def |
2022-01-22 |
4 |
960 |
0 |
… |
|
|
|
|
|
|
2023-01-01 |
123 |
abc |
2022-01-01 |
4 |
480 |
120 |
2023-01-01 |
123 |
def |
2022-01-22 |
4 |
960 |
0 |
… |
|
|
|
|
|
|
2023-02-01 |
123 |
abc |
2022-01-01 |
4 |
480 |
130 |
2023-02-01 |
123 |
def |
2022-01-22 |
4 |
960 |
120 |
2023-02-01 |
456 |
jkl |
2023-02-01 |
4 |
480 |
0 |
… |
|
|
|
|
|
|
2023-04-12 |
123 |
abc |
2022-01-01 |
4 |
480 |
150 |
2023-04-12 |
123 |
def |
2022-01-22 |
4 |
960 |
140 |
2023-04-12 |
123 |
ghi |
2023-04-12 |
0 |
480 |
480 |
2023-04-12 |
456 |
jkl |
2023-02-01 |
4 |
480 |
0 |
Which is pretty overwhelming and gross! But after aggregration in the final
CTE, you wind up with a nice wee table with a row per company and day:
date_day |
company_id |
vested_shares |
total_common_shares |
2021-12-31 |
null |
null |
null |
2022-01-01 |
123 |
0 |
480 |
… |
|
|
|
2022-01-22 |
123 |
0 |
1440 |
… |
|
|
|
2023-01-01 |
123 |
120 |
1440 |
… |
|
|
|
2023-02-01 |
123 |
250 |
1440 |
2023-02-01 |
456 |
0 |
480 |
… |
|
|
|
2023-04-12 |
123 |
770 |
1920 |
2023-04-12 |
456 |
0 |
480 |
From there, you can also make your second model which is just the state of play on the first of the month. Something like:
--models/stocks_and_valuations_as_of_month
select *
from {{ ref('stocks_and_valuations_as_of_date') }}
where extract(day from date_day) = 1 --I hope this is right! However you get the first of the month in Postgres
And then your production code will just query the resulting table:
select *
from analytics.stocks_and_valuations_as_of_date
where company_id = 123
and date_day = '2023-04-01'
This was very long, and still doesn’t answer your UDF question! But it is a more dbtonic way of thinking about it; hopefully that’s OK and the UDF was a means to the end instead of a mandatory implementation detail for some reason.
lmk what you think!