Finding active days for a subscription/user/account (date spining)

Every often, someone asks us about finding the active days for a subscription/user/account. We call this pattern “date spining”

For example, this question was asked on dbt Slack last week:

Any idea how to calculate the active products historically (we have products expiring after 1 year from the purchase, they can also be cancelled). It would be good to see how active products change over time.

So let’s assume you can get your data into this shape, in a model called product_subscription_periods:

subscription_id user product start_date end_date
1 emily widgets 2018-01-01 2019-01-01
2 jack gizmos 2018-05-02 2018-06-01

We want to create one record per day that the subscription was active, like this:

subscription_id user product date_day
1 emily widgets 2018-01-01
1 emily widgets 2018-01-02
1 emily widgets 2018-12-31
2 jack gizmos 2018-05-02
2 jack gizmos 2018-05-03
2 jack gizmos 2018-05-31

To do this, you need a list of all dates in your warehouse. We often use the dbt_utils.date_spine macro for this, to materialize a table of dates out of thin air (no matter which data warehouse you use!).

Our util_days model will look something like:

{{ dbt_utils.date_spine(
    datepart="day",
    start_date="'2018-01-01'",
    end_date="dateadd(month, 1, date_trunc('month', getdate()))"
   )
}}

To build our product_subscriptions_by_day model, the SQL looks something like:

with product_subscription_periods as (

    select * from {{ ref('product_subscription_periods') }}

),

util_days as (

    select * from {{ ref('util_days') }}

),

spined as (

    select 
        product_subscription_periods.subscription_id,
        product_subscription_periods.user,
        product_subscription_periods.product,
        util_days.date_day

    from product_subscription_periods

    left join util_days
        -- start date is before the date we want in our table
        on product_subscription_periods.start_date <= util_days.date_day
        -- end date is after the date we want in out model, or there is no end date
        and (product_subscription_periods.end_date > util_days.date_day
               or product_subscription_periods.end_date is null)

)

select * from spined

Now that we have this model, it makes it really easy to calculate metrics in our BI tool, e.g. how many active subscriptions we have on each day:

select
    date_day,
    count(subscription_id) as n_active_subscriptions,
    count(case when product = 'widgets' then subscription_id end) as n_active_widget_subscriptions,
    count(case when product = 'gizmos' then subscription_id end) as n_active_gizmo_subscriptions,
    count(distinct user) as n_distinct_users_with_subscriptions

from analytics.product_subscriptions_by_day

group by 1

We use this pattern to model monthly recurring revenue based on subscriptions, to calculate daily active users, to model inventory levels, and a lot more!

Want to see it in action? We use this pattern in the Monthly Recurring Revenue (MRR) playbook.

For a more in-depth example, the team at GitLab have open sourced their dbt project – check out their date spine model and an amortized subscription model that joins to it. Thanks @tmurphy and @emilie!

15 Likes

This post was a great reference item - thanks @claire! As a breadcrumb for any BigQuery folks coming across this, heads up - there is currently an open PR to correct the date_spine macro. Additionally, BigQuery will not compute a left join with these conditions, though we can use a cross join w/ a corresponding where clause to achieve the same result. As it relates to the PR for date_spine- it largely appears to work (yay!), so I’ll get in that convo and see if we can get it prepped for an upcoming release. :rocket:

Great post!

I believe the end_date condition should allow for NULLs to support records that are the most recent:

    left join util_days
        on product_subscription_periods.start_date <= util_days.date_day
        and (
                product_subscription_periods.end_date > util_days.date_day 
                or product_subscription_periods.end_date is null
        )
2 Likes

Ooh good catch, let me update it in the article :slight_smile:

1 Like

Any reason not to use an inner join here with that join condition if you’re using BigQuery?

I always had trouble getting my inequality conditions consistent (start_date <= date_day and end_date > date_day? The other way around? Both have an equals sign? Neither?)

So I made a macro so I never have to think about it again!

{% macro calendar_fanout(other_model_name, max_day=today_nz(), calendar_model_name = 'calendar', date_column_name='date_day', from='dbt_valid_from', to='dbt_valid_to') -%}
    inner join {{ calendar_model_name }} on {{ calendar_model_name }}.{{ date_column_name }} >= {{ other_model_name }}.{{ from }} 
        and {{ calendar_model_name }}.{{ date_column_name }} < coalesce( {{ other_model_name }}.{{ to }}, {{ var('the_distant_future') }} )
        and {{ calendar_model_name }}.{{ date_column_name }} <= {{ max_day }}
{%- endmacro %}

Almost everything has default values (which assume you’re fanning out a dbt snapshot). In the default case, the only thing that needs configured is other_model_name, the thing that the calendar table will join against.

Usage:

with calendar as (select * from {{ ref('constants__calendar_day') }}),
status_date_ranges as (select * from {{ ref('all_b2c_subscriber_status_date_ranges') }}),

final as (
    select 
        calendar.date_day, 
        status_date_ranges.* 
    from status_date_ranges
    {{ calendar_fanout('status_date_ranges', from='status_valid_from', to='status_valid_to') }}
)

select *
from final
order by date_day

Renders to:

with calendar as (select * from "reporting"."dev_jlabes"."constants__calendar_day"),
status_date_ranges as (select * from __dbt__cte__all_b2c_subscriber_status_date_ranges),

final as (
    select 
        calendar.date_day, 
        status_date_ranges.* 
    from status_date_ranges
    inner join calendar on calendar.date_day >= status_date_ranges.status_valid_from 
        and calendar.date_day < coalesce( status_date_ranges.status_valid_to, '2999-12-31' )
        and calendar.date_day <= convert_timezone('Pacific/Auckland', getdate())::date
)

select *
from final
order by date_day

Bonus: note the sneaky macro-inside-macro to get the current day in NZ, a nice UX sweetener for data consumers who don’t care that technically the data is in UTC time, they just want the freshest stuff.

2 Likes