I am wondering on how to set up incremental materialization of metrics that have different time grains.
Let’s say I have a metric that has day, month and year grains. Every day I want to materialize its values into two different tables: one with daily grain, one with monthly grain.
Daily table materialization can probably be set up in a way that I always calculate the data for the previous full day and I never really have to merge anything, just insert new rows one day at a time.
But what about the monthly table materialization? I expect the table to return the “last known value” for the current month (e.g. on 16th of March 2023 I want to show the value calculated based on 2023 March 1-15).
I have variables “data_interval_start” and “data_interval_end” that I get from Airflow, they correspond to the run start timestamp and run start timestamp + interval respectively. I guess I can work with these to recalculate a suitable amount of data every time, but I’m wondering what is the best approach to do it.
Detailed example to work with
Metric definition:
- name: orders_finished
label: Orders Finished
description: "Number of orders finished"
model: ref('schema_name')
calculation_method: count_distinct
expression: order_id
timestamp: created
time_grains: [day, month, year]
filters:
- field: state
operator: '='
value: "'finished'"
dimensions:
- city_id
Model that materializes the metric:
{{
config(
materialized ="incremental",
incremental_strategy ="merge",
unique_key =["city_id", "date_day"],
partition_by ="date_day",
schema = "schema_name"
)
}}
select *
from {{ metrics.calculate(
[
metric('orders_finished')
],
grain='day',
dimensions=['city_id']
) }}