I have a table, let’s call it “customers_monthly” that shows the amount of customers per month.
With this table and a “pricing_table” I calculate the billing revenue of a company in a table called “billing_revenue_table”. Simply price * amount of customers for a particular month.
The billing revenue is not reversible. Once created it will stay forever like this. If for May the billing revenue was 5k$, it will always stay like that. Independently if the data is correct or not.
The problem is: the amount of customers could change even for previous months. For diverse reasons: numbers were incorrect, accounting team found out other data sources etc.
My solution was: use a snapshot of the first execution time of the “billing_revenue_table” table. And then query filtering by the oldest insert_date.
But I was thinking if there are smarter solutions out there.
Any ideas ?
Thank in advice!
A snapshot is a pretty good approach if you want to keep a static set of facts regardless of changes after the fact.
You could also create an incremental model which only processes data for the current month:
{{ config(
materialized = 'incremental',
unique_key = ['date_month', 'customer_id'],
full_refresh = false
) }}
with customers_monthly as (select * from {{ ref('customers_monthly') }} ),
pricing_table as (select * from {{ ref('pricing_table') }},
final as (
select
date_month,
customer_id,
-- your transformations here
from customers_monthly
inner join pricing_table on ...
{% if is_incremental() %}
where date_month = date_trunc(month, getdate())
{% endif %}
)
select * from final
By setting full refresh to false, your model will never be rebuilt. Like a snapshot, this data won’t be able to be regenerated from first principles if the table got dropped, so you’ll want to take precautions to protect it.
You could also check out this video from Coalesce 2020 - it’s not how we manage our metrics anymore, but it’s got a lot of overlap with your use case: Ask the experts: Version controlling your metrics
1 Like