Advice needed around BigQuery partitioned tables


I’m attempting to create a partitioned table within BigQuery. My model has the below parameters set:

    materialized = "table",
    partition_by = {
      "field": "partition_date",
      "data_type": "date",
      "granularity": "day"
    require_partition_filter = true

‘partition_date’ is a date field in the format %Y-%m-%d. See below query excerpt (min_start_date is a string in yyyyMMdd format E.G 20230724):

CAST(CONCAT(SUBSTR(min_start_date,0,4),'-',SUBSTR(min_start_date,5,2),'-',SUBSTR(min_start_date,7,2)) AS DATE) AS partition_date

When I run the model containing a different partition_date, the existing partition is removed. What I would like is for there to be a second partition with the new date, E.G:

Partition_1 = 2023_07_24
Partition_2 = 2023_07_31

I understand there are incremental_models, however each new partition will contain a new set of data, so I thought a normal table materialization would be better.

Any help is greatly appreciated.

The substr code is unnecessary - there’s a built-in BigQuery function parse_date that does what you want like parse_date("%Y%m%d", "20230115")

When you set the model to use materialized = "table", this will use a CREATE OR REPLACE TABLE <name> AS ... statement. The instruction materialized = "table" is synonymous with “completely destroy and recreate my table on each dbt command”. If that isn’t the behaviour you want, you should use a different materialization, probably “incremental” with the “insert_overwrite” strategy. You can refer to the docs for that|here and|here.

Note: @Mike Stanley originally posted this reply in Slack. It might not have transferred perfectly.

Thank you, I will explore the incremental model with the insert overwrite strategy. I also appreciate the note on date formatting.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.