I recently asked a question in Slack about best practices around modeling, and I'd like to continue that discussion here because there are some things I'm still unclear about.
To take a real-world example: I'm using Mode as a BI tool, and I want to create a stacked bar chart showing how many recent upgrades there were by day, for a certain segment of users, broken down by plan.
Here’s what the query looks like if I write it directly in Mode:
select
    upgrade_date,
    plans.name as plan_name,
    count(*) as upgrades
from analytics.users
join analytics.plans using (plan_id)
where
    upgrade_date >= current_date - interval '30 days' and
    vertical = 'work'
group by 1, 2
Alternatively, I can model this using dbt, something like this:
with recent_work_users as (

    select *
    from {{ ref('users') }}
    where
        vertical = 'work' and
        upgrade_date >= current_date - interval '30 days'

),

final as (

    select
        upgrade_date,
        plans.name as plan_name,
        count(*) as upgrades
    from recent_work_users
    join {{ ref('plans') }} as plans using (plan_id)
    group by 1, 2

)

select * from final
And then in Mode simply run:
select * from analytics.recent_work_upgrades_by_plan
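For completeness, the dbt model above would also need to be materialized so that `analytics.recent_work_upgrades_by_plan` actually exists in the warehouse. A minimal sketch of a config header, assuming the model file is named `recent_work_upgrades_by_plan.sql` (the filename and materialization choice are mine, not settled):

```sql
-- assumed filename: models/recent_work_upgrades_by_plan.sql
-- materializing as a view keeps it in sync with the underlying tables;
-- a table would avoid recomputing the aggregation on every Mode run
{{ config(materialized='view') }}
```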
Which approach makes more sense: writing the query directly in Mode, or modeling it in dbt? I could be wrong, but I feel like this isn't what I should be modeling in dbt; on the other hand, writing the full query in Mode doesn't seem ideal either, because it won't be version controlled, etc.
It may be that analyses are the way to go, but I'm unclear on how to fit them into my workflow. For example, I'll typically draft a query, run it, then repeat a few times until things are just right. Would I do that in Mode, then paste the final query into a new analysis file, replace the view names with dbt references, convert it to use CTEs, compile it, and paste the compiled query back into Mode? I'm guessing there's a better way :).
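For reference, the compile step in that workflow happens on the command line: `dbt compile` resolves `{{ ref(...) }}` calls into fully qualified relation names and writes the result under `target/compiled/`. A minimal sketch, where the project name and analysis filename are hypothetical placeholders:

```shell
# compile the project; refs are resolved to schema-qualified names
dbt compile

# the compiled SQL (ready to paste into Mode) mirrors the project layout
# (project and file names below are assumptions for illustration)
cat target/compiled/my_project/analyses/recent_work_upgrades.sql
```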
It seems like there are a lot of ways this can be done - which approach would you all recommend?