I’m working on an implementation where we’re looking into using date-partitioned tables, i.e. partitioning based on a date column.
Why are we doing this?
These are event tables with a bunch of data coming in on a daily basis.
We are already using an incremental materialization, which keeps the dbt runs straightforward. But the incremental model still consumes a lot of GBs per query.
If we switched to partitioned tables we could save both time and GBs in our queries.
I looked into your docs describing the approach for ingestion time based partitions: https://docs.getdbt.com/docs/creating-date-partitioned-tables
But I didn’t find a proper way to apply that approach to column-based date partitioning.
The problem is how to use [DBT__PARTITION_DATE].
With a partition on a date column you simply limit the query with
WHERE DATE(date_column) = "2020-02-02"
I’m trying to use [DBT__PARTITION_DATE] for this. Unfortunately it comes in the wrong format (20200202).
I already tried some Jinja magic (datetime parsing and formatting), but it seems the [DBT__PARTITION_DATE] replacement happens later in the process, so I run into compile errors.
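For reference, the conversion I’m after is just reformatting the compact date string. A minimal Python sketch of the same logic I tried to express in Jinja (the function name is mine, just for illustration):

```python
from datetime import datetime

def to_iso_date(compact: str) -> str:
    """Convert a compact date string like '20200202' to ISO format '2020-02-02'."""
    return datetime.strptime(compact, "%Y%m%d").strftime("%Y-%m-%d")

print(to_iso_date("20200202"))  # -> 2020-02-02
```

The trouble is that [DBT__PARTITION_DATE] isn’t available at Jinja compile time, so this kind of transformation can’t run against it.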
And when I parse it in SQL instead, with PARSE_DATE("%Y%m%d", [DBT__PARTITION_DATE]), I don’t save any GBs when running the query.
Any thoughts on what would work in this case?