Date Dimensions

Hello All –

We have built a date dimension. Just wanted to share our code and results here and see if others have built their date dimensions differently, anything that you would add/remove, etc.?

{{
config(
pre_hook=“ALTER SESSION SET WEEK_OF_YEAR_POLICY=1, WEEK_START=7”
)
}}

WITH DATE_SPINE AS
(
{{ dbt_utils.date_spine(
datepart=“day”,
start_date=“to_date(‘01/01/2018’, ‘mm/dd/yyyy’)”,
end_date=“TO_DATE(CONCAT(YEAR(CURRENT_DATE) + 6,‘-12-31’))”
)
}}
)
SELECT
CAST(DATE_DAY AS DATE) AS DATE_KEY
,CAST(DAYOFYEAR(DATE_KEY) AS INT) AS DAY_OF_YEAR
,CAST(DECODE (RIGHT(EXTRACT(DAYOFMONTH FROM DATE_KEY), 1),
1 , ‘st’,
2 , ‘nd’,
3 , ‘rd’,
‘th’ – Default option, all other days ending aside from 1st, 2nd, and 3rd end in ‘th’
) AS VARCHAR(2)) AS DAY_SUFFIX
,CAST(YEAR(DATE_KEY) || RIGHT(‘0’ || WEEK(DATE_KEY), 2) AS INT) AS WEEK_KEY
,CAST(WEEKOFYEAR(DATE_KEY) AS INT) AS WEEK_OF_YEAR
,CAST(DAYOFWEEK(DATE_KEY) AS INT) AS DAY_OF_WEEK
,CAST(DAYNAME(DATE_KEY) AS VARCHAR(5)) AS WEEK_DAY_SHORT_DESC
,CAST(DECODE (EXTRACT(DAYOFWEEK FROM DATE_KEY),
1 , ‘Sunday’,
2 , ‘Monday’,
3 , ‘Tuesday’,
4 , ‘Wednesday’,
5 , ‘Thursday’,
6 , ‘Friday’,
7 , ‘Saturday’
) AS VARCHAR(9)) AS WEEK_DAY_DESC
,CAST(TRUNC(DATE_KEY, ‘Week’) AS DATE) AS FIRST_DAY_OF_WEEK
,CAST(LAST_DAY(DATE_KEY, ‘Week’) AS DATE) AS LAST_DAY_OF_WEEK
,CAST(YEAR(DATE_KEY) || RIGHT(‘0’ || MONTH(DATE_KEY), 2) AS INT) AS MONTH_KEY
,CAST(MONTH(DATE_KEY) AS INT) AS MONTH_OF_YEAR
,CAST(DAYOFMONTH(DATE_KEY) AS INT) AS DAY_OF_MONTH
,CAST(MONTHNAME(DATE_KEY) AS VARCHAR(5)) AS MONTH_SHORT_DESC
,CAST(TO_CHAR(DATE_KEY, ‘MMMM’) AS VARCHAR(50)) AS MONTH_DESC
,CAST(TRUNC(DATE_KEY, ‘Month’) AS DATE) AS FIRST_DAY_OF_MONTH
,CAST(LAST_DAY(DATE_KEY, ‘Month’) AS DATE) AS LAST_DAY_OF_MONTH
,CAST(YEAR(DATE_KEY) || QUARTER(DATE_KEY) AS INT) AS QUARTER_KEY
,CAST(QUARTER(DATE_KEY) AS INT) AS QUARTER_OF_YEAR
,CAST(DATE_KEY - TRUNC(DATE_KEY, ‘Quarter’) + 1 AS INT) AS DAY_OF_QUARTER
,CAST(‘Q’ || QUARTER_OF_YEAR AS VARCHAR(5)) AS QUARTER_SHORT_DESC
,CAST('Quarter ’ || QUARTER_OF_YEAR AS VARCHAR(50)) AS QUARTER_DESC
,CAST(TRUNC(DATE_KEY, ‘Quarter’) AS DATE) AS FIRST_DAY_OF_QUARTER
,CAST(LAST_DAY(DATE_KEY, ‘Quarter’) AS DATE) AS LAST_DAY_OF_QUARTER
,CAST(YEAR(DATE_KEY) AS INT) AS YEAR_KEY
,CAST(TRUNC(DATE_KEY, ‘Year’) AS DATE) AS FIRST_DAY_OF_YEAR
,CAST(LAST_DAY(DATE_KEY, ‘Year’) AS DATE) AS LAST_DAY_OF_YEAR
,CAST(ROW_NUMBER() OVER (PARTITION BY YEAR(DATE_KEY), MONTH(DATE_KEY), DAYOFWEEK(DATE_KEY) ORDER BY DATE_KEY) AS INT) AS ORDINAL_WEEKDAY_OF_MONTH
,CAST(CASE
WHEN MONTH(DATE_KEY) = 1 AND DAYOFMONTH(DATE_KEY) = 1
THEN ‘New Year’‘s Day’
WHEN MONTH(DATE_KEY) = 1 AND DAYOFMONTH(DATE_KEY) = 20 AND ((YEAR(DATE_KEY) - 1) % 4) = 0
THEN ‘Inauguration Day’
WHEN MONTH(DATE_KEY) = 1 AND DAYOFWEEK(DATE_KEY) = 2 AND ORDINAL_WEEKDAY_OF_MONTH = 3
THEN ‘Martin Luther King Jr Day’
WHEN MONTH(DATE_KEY) = 2 AND DAYOFMONTH(DATE_KEY) = 14
THEN ‘Valentine’‘s Day’
WHEN MONTH(DATE_KEY) = 2 AND DAYOFWEEK(DATE_KEY) = 2 AND ORDINAL_WEEKDAY_OF_MONTH = 3
THEN ‘President’‘s Day’
WHEN MONTH(DATE_KEY) = 3 AND DAYOFMONTH(DATE_KEY) = 17
THEN ‘Saint Patrick’‘s Day’
WHEN MONTH(DATE_KEY) = 5 AND DAYOFWEEK(DATE_KEY) = 1 AND ORDINAL_WEEKDAY_OF_MONTH = 2
THEN ‘Mother’‘s Day’
WHEN MONTH(DATE_KEY) = 5 AND DAYOFWEEK(DATE_KEY) = 2
AND LAST_VALUE(DAYOFMONTH(DATE_KEY)) OVER (PARTITION BY MONTH_KEY
ORDER BY TO_DATE(DATE_KEY)) - 7 <= DAYOFMONTH(DATE_KEY)
THEN ‘Memorial Day’
WHEN MONTH(DATE_KEY) = 6 AND DAYOFWEEK(DATE_KEY) = 1 AND ORDINAL_WEEKDAY_OF_MONTH = 3
THEN ‘Father’‘s Day’
WHEN MONTH(DATE_KEY) = 7 AND DAYOFMONTH(DATE_KEY) = 4
THEN ‘Independence Day’
WHEN MONTH(DATE_KEY) = 9 AND DAYOFWEEK(DATE_KEY) = 2 AND ORDINAL_WEEKDAY_OF_MONTH = 1
THEN ‘Labor Day’
WHEN MONTH(DATE_KEY) = 10 AND DAYOFWEEK(DATE_KEY) = 2 AND ORDINAL_WEEKDAY_OF_MONTH = 2
THEN ‘Columbus Day’
WHEN MONTH(DATE_KEY) = 10 AND DAYOFWEEK(DATE_KEY) = 31
THEN ‘Halloween’
WHEN MONTH(DATE_KEY) = 11 AND DAYOFWEEK(DATE_KEY) = 5 AND ORDINAL_WEEKDAY_OF_MONTH = 4
THEN ‘Thanksgiving Day’
WHEN MONTH(DATE_KEY) = 12 AND DAYOFMONTH(DATE_KEY) = 25
THEN ‘Christmas Day’
WHEN MONTH(DATE_KEY) = 12 AND DAYOFMONTH(DATE_KEY) = 26
THEN ‘Boxing Day’
ELSE NULL
END AS VARCHAR(50)) AS HOLIDAY_DESC
,CAST(CASE WHEN HOLIDAY_DESC IS NULL THEN 0 ELSE 1 END AS BOOLEAN) AS IS_HOLIDAY
FROM DATE_SPINE

{{
config(
post_hook=‘{{ add_not_null_constraints([“DATE_KEY”
,“DAY_OF_YEAR”
,“DAY_SUFFIX”
,“WEEK_KEY”
,“WEEK_OF_YEAR”
,“DAY_OF_WEEK”
,“WEEK_DAY_SHORT_DESC”
,“WEEK_DAY_DESC”
,“FIRST_DAY_OF_WEEK”
,“LAST_DAY_OF_WEEK”
,“MONTH_KEY”
,“MONTH_OF_YEAR”
,“DAY_OF_MONTH”
,“MONTH_SHORT_DESC”
,“MONTH_DESC”
,“FIRST_DAY_OF_MONTH”
,“LAST_DAY_OF_MONTH”
,“QUARTER_KEY”
,“QUARTER_OF_YEAR”
,“DAY_OF_QUARTER”
,“QUARTER_SHORT_DESC”
,“QUARTER_DESC”
,“FIRST_DAY_OF_QUARTER”
,“LAST_DAY_OF_QUARTER”
,“YEAR_KEY”
,“FIRST_DAY_OF_YEAR”
,“LAST_DAY_OF_YEAR”
,“IS_HOLIDAY”]) }}’
)
}}

2 Likes

Hey @josh!

I wrote about the approach we took at GitLab around this a bit ago. Take a look and let me know what you think: Building a Calendar Table using dbt

Thanks @emilie I took a look at the date dimension code that you referenced. Looks like we have many similar columns. You also have some fiscal year columns that we don’t. On the other hand, we have some additional columns like DAY_SUFFIX (e.g. “1st”, “2nd”), WEEK_DAY_DESC (not just “Wed” but “Wednesday”, "Thursday), some keys like MONTH_KEY, and, most importantly, a holiday indicator and actual names for specific holidays.

Also I think some of the calculations in your date dimension could be simplified as we have done, e.g. instead of:

FIRST_VALUE(date_day) OVER (PARTITION BY year_actual, month_actual ORDER BY date_day) AS first_day_of_month,

you could do:

TRUNC(date_day, 'Month')

for this:

decode(extract('month',date_day),

           1, 'January',

           2, 'February',

           3, 'March',

           4, 'April',

           5, 'May',

           6, 'June',

           7, 'July',

           8, 'August',

           9, 'September',

           10, 'October',

           11, 'November',

           12, 'December') AS month_name,

you could do:

TO_CHAR(date_day, 'MMMM')

and for this:

(year_actual || '-' || DECODE(extract(quarter,date_day),

           1, 'Q1',

           2, 'Q2',

           3, 'Q3',

           4, 'Q4')) AS quarter_name,

you could do:

'Q' || EXTRACT(QUARTER FROM date_day)

Just simplifying and making the code cleaner. You could also simplify many of the other window functions in the same way using the TRUNC() and LAST_DAY() functions.

Do you really not have a need for holiday calculations? We often need to do workday calculations or similar things and need to know the holidays for that.

2 Likes

Very cool! Just wanted to mention this macro I added to the dbt-date package a few weeks ago that does some of that as well.

1 Like

@josh Thanks for the feedback. I created https://gitlab.com/gitlab-data/analytics/merge_requests/2121 to make those improvements and add some global holidays. We haven’t had a need for them yet, but that’s no reason not to add it!

2 Likes

Hi there, how do you guys do with the function concat and year, mine is not working.

I noticed the USA-s holiday are not working.

You’ve got an off-by-one weekday mapping in the holiday logic.

What’s going wrong (in short)

  • You use DAYOFWEEK(date_key) (Snowflake) for holiday rules, but you treat it like ISO numbering.

  • In Snowflake, DAYOFWEEK returns 0=Sun, 1=Mon, 2=Tue, …, 6=Sat.
    Your CASE assumes 1=Sun, 2=Mon, so Monday holidays shift to Tuesday and Sunday holidays shift to Monday.

  • There’s also a small bug where Halloween checks dayofweek(date_key) = 31 instead of dayofmonth(date_key) = 31.

    Untitled

Minimal, safe fix

  1. Use ISO weekday everywhere (for both comparisons and ordinal grouping), i.e., EXTRACT('dayofweek_iso', date_key) where 1=Mon … 7=Sun.

  2. Fix the Halloween predicate.

Here’s a targeted patch you can apply to your model:

-- 1) Make ordinal weekday-of-month use ISO weekday to match the predicates
CAST(
  ROW_NUMBER() OVER (
    PARTITION BY YEAR(date_key), MONTH(date_key), EXTRACT('dayofweek_iso', date_key)
    ORDER BY date_key
  ) AS INT
) AS ordinal_weekday_of_month,

-- 2) Replace every `dayofweek(date_key)` comparison in the holiday CASE with ISO equivalents:
--   Monday  -> EXTRACT('dayofweek_iso', date_key) = 1
--   Thursday-> EXTRACT('dayofweek_iso', date_key) = 4
--   Sunday  -> EXTRACT('dayofweek_iso', date_key) = 7

CASE
  WHEN MONTH(date_key) = 1 AND DAYOFMONTH(date_key) = 1 THEN 'New Year''s Day'
  WHEN MONTH(date_key) = 1 AND DAYOFMONTH(date_key) = 20 AND ((YEAR(date_key) - 1) % 4) = 0 THEN 'Inauguration Day'
  WHEN MONTH(date_key) = 1 AND EXTRACT('dayofweek_iso', date_key) = 1 AND ordinal_weekday_of_month = 3 THEN 'Martin Luther King Jr Day'
  WHEN MONTH(date_key) = 2 AND DAYOFMONTH(date_key) = 14 THEN 'Valentine''s Day'
  WHEN MONTH(date_key) = 2 AND EXTRACT('dayofweek_iso', date_key) = 1 AND ordinal_weekday_of_month = 3 THEN 'President''s Day'
  WHEN MONTH(date_key) = 3 AND DAYOFMONTH(date_key) = 17 THEN 'Saint Patrick''s Day'
  WHEN MONTH(date_key) = 5 AND EXTRACT('dayofweek_iso', date_key) = 7 AND ordinal_weekday_of_month = 2 THEN 'Mother''s Day'
  -- Memorial Day: last Monday in May
  WHEN MONTH(date_key) = 5
       AND EXTRACT('dayofweek_iso', date_key) = 1
       AND DATEADD(day, 7, date_key) > LAST_DAY(date_key, 'Month') THEN 'Memorial Day'
  WHEN MONTH(date_key) = 6 AND EXTRACT('dayofweek_iso', date_key) = 7 AND ordinal_weekday_of_month = 3 THEN 'Father''s Day'
  WHEN MONTH(date_key) = 7 AND DAYOFMONTH(date_key) = 4 THEN 'Independence Day'
  WHEN MONTH(date_key) = 9 AND EXTRACT('dayofweek_iso', date_key) = 1 AND ordinal_weekday_of_month = 1 THEN 'Labor Day'
  WHEN MONTH(date_key) = 10 AND EXTRACT('dayofweek_iso', date_key) = 1 AND ordinal_weekday_of_month = 2 THEN 'Columbus Day'
  WHEN MONTH(date_key) = 10 AND DAYOFMONTH(date_key) = 31 THEN 'Halloween'  -- fixed
  WHEN MONTH(date_key) = 11 AND EXTRACT('dayofweek_iso', date_key) = 4 AND ordinal_weekday_of_month = 4 THEN 'Thanksgiving Day'
  WHEN MONTH(date_key) = 12 AND DAYOFMONTH(date_key) = 25 THEN 'Christmas Day'
  WHEN MONTH(date_key) = 12 AND DAYOFMONTH(date_key) = 26 THEN 'Boxing Day'
END AS holiday_desc