Finding active days for a subscription/user/account (date spining)

Every often, someone asks us about finding the active days for a subscription/user/account. We call this pattern “date spining”

For example, this question was asked on dbt Slack last week:

Any idea how to calculate the active products historically (we have products expiring after 1 year from the purchase, they can also be cancelled). It would be good to see how active products change over time.

So let’s assume you can get your data into this shape, in a model called product_subscription_periods:

subscription_id user product start_date end_date
1 emily widgets 2018-01-01 2019-01-01
2 jack gizmos 2018-05-02 2018-06-01

We want to create one record per day that the subscription was active, like this:

subscription_id user product date_day
1 emily widgets 2018-01-01
1 emily widgets 2018-01-02
1 emily widgets 2018-12-31
2 jack gizmos 2018-05-02
2 jack gizmos 2018-05-03
2 jack gizmos 2018-05-31

To do this, you need a list of all dates in your warehouse. We often use the dbt_utils.date_spine macro for this, to materialize a table of dates out of thin air (no matter which data warehouse you use!).

Our util_days model will look something like:

{{ dbt_utils.date_spine(
    datepart="day",
    start_date="'2018-01-01'",
    end_date="dateadd(month, 1, date_trunc('month', getdate()))"
   )
}}

To build our product_subscriptions_by_day model, the SQL looks something like:

with product_subscription_periods as (

    select * from {{ ref('product_subscription_periods') }}

),

util_days as (

    select * from {{ ref('util_days') }}

),

spined as (

    select 
        product_subscription_periods.subscription_id,
        product_subscription_periods.user,
        product_subscription_periods.product,
        util_days.date_day

    from product_subscription_periods

    left join util_days
        -- start date is before the date we want in our table
        on product_subscription_periods.start_date <= util_days.date_day
        -- end date is after the date we want in out model
        and product_subscription_periods.end_date > util_days.date_day

)

select * from spined

Now that we have this model, it makes it really easy to calculate metrics in our BI tool, e.g. how many active subscriptions we have on each day:

select
    date_day,
    count(subscription_id) as n_active_subscriptions,
    count(case when product = 'widgets' then subscription_id end) as n_active_widget_subscriptions,
    count(case when product = 'gizmos' then subscription_id end) as n_active_gizmo_subscriptions,
    count(distinct user) as n_distinct_users_with_subscriptions

from analytics.product_subscriptions_by_day

group by 1

We use this pattern to model monthly recurring revenue based on subscriptions, to calculate daily active users, to model inventory levels, and a lot more!

Want to see it in action? The team at GitLab have open sourced their dbt project – check out their date spine model and an amortized subscription model that joins to it. Thanks @tmurphy and @emilie!

We often push this pattern further, to account for things like:

  • Finding the range of all days between the first time a user had a subscription, and the last time they had a subscription (useful for calculating churn etc.)
  • Building this model incrementally

^ I’ll leave those things for a more in-depth post though :wink:

5 Likes