What framework do people use to define KPIs for your company in DBT? I imagine using macros to define all the SQL definitions. Docs to maintain the glossary. I can imagine a world where we have the definitions of metrics and potential dimensions centralized in one place and analysts throughout the company can use them for reporting.
Hey @nehiljain , such a great question! I don’t think there’s one right way to do this, but I’ll give you my two cents.
There’s two main things to consider here:
- If you calculate your metrics in dbt, for example, calculating a
refund_rateper day, you lose the ability to drill into that metric in your BI tool (i.e. understand which products had the highest refund rate for a day). That’s a pretty good reason to say “OK I’m going to just do my metrics in my BI tool, and leave them out of dbt”.
- However, if you don’t have your metrics defined in dbt, that can create two sources of truth! The docs for your dbt project will hold all the column level definitions, but then your BI tool will have the metric definitions. That’s a pretty good reason to then say, “OK, I’m going to put my metrics into dbt instead.”
So: porque no los dos?
Here’s what I recommend:
1. Model your data in dbt to make metric definitions more intuitive
For our example of
refund_rate, here’s some SQL for a pretty intuitive metric definition. This calculation is on top of modeled data, where you’ve already implemented your business definitions of a refund, and excluded any orders that aren’t actually orders:
select order_date_aest, -- i.e. Sydney time sum(is_refunded::integer) as count_refunds, count(*) as count_orders, 1.0 * count_refunds / nullif(count_orders, 0) as refund_rate from analytics.orders group by 1
Compare this to calculating it from the raw data, which often looks something more like this:
select date_trunc('day', convert_timezone('Australia/Sydney', ordered_at)) as order_date_aest, count(case when refunded_at is not null then id end) as count_refunds, count(case when status not in ('pending', 'draft', 'abandoned') then id end) as count_orders, 1.0 * count_refunds / nullif(count_orders, 0) as refund_rate from raw_jaffle_shop.orders group by 1
Trust me, I’ve seen much worse.
By doing the legwork in your dbt model, it’s easier to then communicate the metric definition to others.
2. Document this metric in your BI tool
As mentioned, you generally want to still let your BI tool do this calculation. That way, users can slice and dice by whichever dimensions they choose to, and drill into reported numbers further. Good BI tools let you add descriptions to your measures – since most business users will be interacting with the data through a BI tool, it’s definitely worth jotting down a description there.
3. Add a
metrics model to your dbt project
As a further step, we like to add a
metrics model to the dbt projects we work on. This model is either at the grain of one record per day, with each metric as a column, or one record per metric per day.
If you only have a few metrics to track, I’d recommend having one column per metric – it’s easier to do things like calculate rates when your metrics are in columns.
If you have lots of metrics to calculate, I’d recommend adding the metric to the grain of the model to prevent it from getting too wide. Personally, my brain hurts when I do this, as there are two different data types in the
value column, but it’s the preferred approach of most of the folks I work with:
Metrics models come in really handy because they:
- Allow you to document your metrics alongside the rest of your dbt project
- Provide an absolute “source of truth” for what the company-wide value for the metric actually is (which helps a lot when two people come to the same meeting with different values for the same metrics),
- Are in a structure where you can compare the metric to a particular target – super useful if your org is in the habit of setting OKRs.
- Also give you the option to pre-calculate some fields, for example
month_to_date_valueto track cumulative progress for a target, and
previous_month_valueto compare how you’re trending for this month, versus a year ago. The SQL for this can get pretty gnarly, but it’s cool that it becomes an option!
If you’ve read our guide on how we structure our dbt projects, you’ll notice we make a mention of this. Normally metrics models go in their own subdirectory:
models/metrics. They tend to select from
dim_ models, so sit downstream of them in the DAG.
Hope that helps!
First of all, Thanks a lot for the answer. This makes a lot of sense.
When you model your metrics definitions in dbt, do you also have to take into account all the dimensions that can be used to analyze the metric??? So for example, you want to compare a few metrics by customer_segment. What would be the proposed workflow here?
I’d keep it as simple as possible in your metrics model, with just one record per day. Any ability to slice and dice your data other ways should be done in your BI tool. Otherwise you’ll be adding new ways to slice the data ad infinitum!