Use a UDF to do a dynamic rollup by date

Apologies for reposting, the original contained too much information about our schema.

Here are our two macros, heavily simplified but including the date logic we need. One UDF just overloads the other.

{% macro udf_stocks_and_valuations_as_of_date() %}

CREATE OR REPLACE FUNCTION {{ target.schema }}.udf_stocks_and_valuations_as_of(as_of DATE) RETURNS TABLE (
    company_id BIGINT,
    vested_shares BIGINT,
    total_common_shares BIGINT,
) AS
$$ WITH


stock_certs AS (
    SELECT * FROM {{ ref('stocks') }}
),

cap_tables AS (
    SELECT * FROM {{ ref('cap_tables') }}
),

companies AS (
    SELECT * FROM {{ ref('companies') }}
),

stock_calcs AS (
    SELECT
        *,
        CASE
            WHEN vesting_period = 0
                THEN num_shares
            WHEN
                EXTRACT(YEAR FROM AGE(as_of, issued_on)) * 12 + EXTRACT(MONTH FROM AGE(as_of, issued_on)) < 12
                THEN 0
            ELSE
                (num_shares / (12 * vesting_period)) *
                (EXTRACT(YEAR FROM AGE(as_of, issued_on)) * 12 + EXTRACT(MONTH FROM AGE(as_of, issued_on)))
        END AS vested_shares


    FROM stock_certs
),

final AS (
    SELECT
        companies.company_id,
        SUM(CASE
            WHEN stock_calcs.share_class = 'common'
                THEN stock_calcs.vested_shares
        END) AS vested_shares,
        SUM(CASE
            WHEN stock_calcs.share_class = 'common'
                THEN stock_calcs.num_shares
        END) AS total_common_shares


    FROM stock_calcs
    INNER JOIN cap_tables ON cap_tables.cap_table_id = stock_calcs.cap_table_id
    INNER JOIN companies ON cap_tables.company_id = companies.company_id
    GROUP BY companies.company_id
),

SELECT * FROM final
$$ LANGUAGE SQL;

{% endmacro %}

{% macro udf_stocks_and_valuations_as_of_month() %}

CREATE OR REPLACE FUNCTION {{ target.schema }}.udf_stocks_and_valuations_as_of(yyyy INT, mm INT) RETURNS TABLE (
    company_id BIGINT,
    vested_shares BIGINT,
    total_common_shares BIGINT,
) AS $$
    SELECT *
    FROM {{ target.schema }}.udf_stocks_and_valuations_as_of((make_date(yyyy, mm, 1) + INTERVAL '1 month' - INTERVAL '1 day')::DATE)
$$ LANGUAGE SQL;
{% endmacro %}

I have an additional macro with a conditional based on one of the comments in the original post. I can’t just call the udf creation macros outright, because they they’ll run on a dbt seed, when the referenced tables don’t exist.

-- idempotently create all Postgres UDFs
{% macro create_udfs() %}
  -- Only run this if the outermost DBT command is "run"
  -- (allows us to use on-run-end rather than manual run-operations on container spinup)
  {% if flags.WHICH == "run" %}
    {{ udf_stocks_and_valuations_as_of_date() }}
    {{ udf_stocks_and_valuations_as_of_month() }}
  {% else %}
    select 1 as id
  {% endif %}
{% endmacro %}

Then we have an on-run-end hook in dbt_project.yml.

on-run-end:
  - '{{ create_udfs() }}'

I think this bug can be recreated just by:

  1. Creating a macro that calls {{ref()}}
  2. Calling that macro from another macro within a conditional
  3. Calling the outer macro as on-run-end (or, likely, on-run-start)