Hey all - I’m looking for advice about the best way to model something:
I have a users
model and a user_stats
model with facts about those users. One of the facts is total_revenue
representing how much they’ve spent all time and another is upgraded_at
with the date/time they first paid.
To display a list of the most recent customers with their total revenue, I can write a query like this:
select
email,
total_revenue
from analytics.users
join analytics.user_stats on user_stats.user_id = users.id
where total_revenue > 0
order by upgraded_at desc
limit 20
I’m wondering whether there are ways to improve this. A few ideas…
Option 1
I’ll be filtering on total_revenue > 0
often. I could create a new column in user_stats
like is_customer
that way I can just write where is_customer
rather than repeatedly write total_revenue > 0
in analyses:
select
email,
total_revenue
from analytics.users
join analytics.user_stats on user_stats.user_id = users.id
where is_customer
order by upgraded_at desc
limit 20
Option 2
Create a model that just represents customers:
select users.*
from {{ ref('users') }} users
join {{ ref('user_stats') }} on user_stats.user_id = users.id
where total_revenue > 0
Then in the analysis I can write:
select
email,
total_revenue
from analytics.customers customers
join analytics.user_stats on user_stats.user_id = customers.id
order by upgraded_at desc
limit 20
Option 3
Take this a step further and create a recent_customers
model:
select customers.*
from {{ ref('customers') }} customers
join {{ ref('user_stats') }} on user_stats.user_id = customers.id
order by upgraded_at desc
limit 20
Then in the analysis just write:
select
email,
total_revenue
from analytics.recent_customers
join analytics.user_stats on user_stats.user_id = recent_customers.id
Do any of these approaches stand out as much better or worse than the others? How in general do you all think about how much to model vs writing in the actual analysis query?