Hey @nehiljain , such a great question! I don’t think there’s one right way to do this, but I’ll give you my two cents.
There are two main things to consider here:
- If you calculate your metrics in dbt, for example, calculating a `refund_rate` per day, you lose the ability to drill into that metric in your BI tool (i.e. understand which products had the highest refund rate for a day). That’s a pretty good reason to say “OK, I’m going to just do my metrics in my BI tool, and leave them out of dbt”.
- However, if you don’t have your metrics defined in dbt, that can create two sources of truth! The docs for your dbt project will hold all the column-level definitions, but then your BI tool will hold the metric definitions. That’s a pretty good reason to then say, “OK, I’m going to put my metrics into dbt instead.”
So: why not both?
Here’s what I recommend:
1. Model your data in dbt to make metric definitions more intuitive
For our example of `refund_rate`, here’s some SQL for a pretty intuitive metric definition. This calculation is on top of modeled data, where you’ve already implemented your business definitions of a refund, and excluded any orders that aren’t actually orders:
```sql
select
    order_date_aest, -- i.e. Sydney time
    sum(is_refunded::integer) as count_refunds,
    count(*) as count_orders,
    1.0 * count_refunds / nullif(count_orders, 0) as refund_rate
from analytics.orders
group by 1
```
Compare this to calculating it from the raw data, which often looks something more like this:
```sql
select
    date_trunc('day', convert_timezone('Australia/Sydney', ordered_at)) as order_date_aest,
    count(case when refunded_at is not null then id end) as count_refunds,
    count(case when status not in ('pending', 'draft', 'abandoned') then id end) as count_orders,
    1.0 * count_refunds / nullif(count_orders, 0) as refund_rate
from raw_jaffle_shop.orders
group by 1
```
Trust me, I’ve seen much worse.
By doing the legwork in your dbt model, it’s easier to then communicate the metric definition to others.
2. Document this metric in your BI tool
As mentioned, you generally still want to let your BI tool do this calculation. That way, users can slice and dice by whichever dimensions they choose, and drill further into reported numbers. Good BI tools let you add descriptions to your measures – since most business users will be interacting with the data through a BI tool, it’s definitely worth jotting down a description there.
3. Add a `metrics` model to your dbt project
As a further step, we like to add a `metrics` model to the dbt projects we work on. This model is either at the grain of one record per day, with each metric as a column, or one record per metric per day.
If you only have a few metrics to track, I’d recommend having one column per metric – it’s easier to do things like calculate rates when your metrics are in columns.
| order_date | count_orders | count_refunds | refund_rate |
| --- | --- | --- | --- |
| 2019-01-01 | 100 | 7 | 7.00% |
| 2019-01-02 | 200 | 7 | 3.50% |
If you have lots of metrics to calculate, I’d recommend adding the metric to the grain of the model to prevent it from getting too wide. Personally, my brain hurts when I do this, as there are two different data types in the `value` column, but it’s the preferred approach of most of the folks I work with:
| order_date | metric | type | value |
| --- | --- | --- | --- |
| 2019-01-01 | count_orders | count | 100 |
| 2019-01-01 | count_refunds | count | 7 |
| 2019-01-01 | refund_rate | rate | 0.07 |
| 2019-01-02 | count_orders | count | 200 |
| 2019-01-02 | count_refunds | count | 7 |
| 2019-01-02 | refund_rate | rate | 0.035 |
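If you maintain the wide version, the tall version can be derived from it with a union. Here’s a minimal sketch, assuming a hypothetical upstream model named `metrics_wide` with the columns from the first table:

```sql
-- sketch: unpivot a wide metrics model into one record per metric per day
-- (metrics_wide is a hypothetical upstream model)
select order_date, 'count_orders' as metric, 'count' as type, count_orders * 1.0 as value
from metrics_wide

union all

select order_date, 'count_refunds', 'count', count_refunds * 1.0
from metrics_wide

union all

select order_date, 'refund_rate', 'rate', refund_rate
from metrics_wide
```

Multiplying the counts by `1.0` keeps the `value` column a single numeric type across all the unioned queries.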
Metrics models come in really handy because they:
- Allow you to document your metrics alongside the rest of your dbt project
- Provide an absolute “source of truth” for what the company-wide value for the metric actually is (which helps a lot when two people come to the same meeting with different values for the same metric)
- Are in a structure where you can compare the metric to a particular target – super useful if your org is in the habit of setting OKRs.
- Also give you the option to pre-calculate some fields, for example `month_to_date_value` to track cumulative progress towards a target, and `previous_month_value` to compare how you’re trending this month versus a month ago. The SQL for this can get pretty gnarly, but it’s cool that it becomes an option!
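As a rough illustration of that last point, a `month_to_date_value` can be pre-calculated with a window function. A sketch, assuming the tall one-record-per-metric-per-day shape, and only meaningful for additive metrics like counts (not rates):

```sql
-- sketch: cumulative month-to-date value per metric
-- (only sensible for additive metrics, e.g. counts; a summed rate is meaningless)
select
    order_date,
    metric,
    value,
    sum(value) over (
        partition by metric, date_trunc('month', order_date)
        order by order_date
        rows between unbounded preceding and current row
    ) as month_to_date_value
from metrics
```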
If you’ve read our guide on how we structure our dbt projects, you’ll notice we make a mention of this. Normally, metrics models go in their own subdirectory: `models/metrics`. They tend to select from `fct_` and `dim_` models, so sit downstream of them in the DAG.
Hope that helps!