At GitLab, our data team loves dbt, not just for the power of the tool, but also for the power of the community. All too often, I find myself coming up with a hacky macro only to find that there’s already a solution for that in dbt-utils, a dbt package with utility functions that can be used across dbt projects. As @tmurphy wrote in 4 Examples of the power of open source analytics, “the actual code that you use for analytics isn’t your company’s competitive advantage”. In fact, most of us are doing very similar analyses- the retention framework is the same, whether you’re doing it in customer counts, dollar-weighted, or users in your SaaS product.
A couple of months ago, GitLab rolled out changes to our fiscal year, shifting from the traditional calendar year (January to December) to a new fiscal calendar which starts on February 1 and runs through January 31. As our writing guidelines outline, all references to quarters and years now mean fiscal quarter and fiscal year. All new metrics, not only going forward, but also historically, would need to be updated to fit our new fiscal model. This meant our existing date table, used for easily mapping dates to a myriad of values for analysis, was no longer going to be enough.
Gathering Requirements
Before just replacing our table, I started by understanding what values the table actually needed to have. We used to have ISO dates that were never used. Rather than just try to replicate what existed, I started by understanding what columns we actually needed to have in the table. Which fiscal values and calendar values did we need to support? Where was the last iteration of this table being used in our existing dbt models that I needed to worry about breaking or, at least, updating? Empowered with my list of requirements, I was ready to get starting.
Not reinventing the wheel
The most important part of a date table is the full series of dates that becomes the core of everything else. Rather than generate it myself, I relied on dbt util’s date spine macro, which I’ve used with great success before. I chose the starting date to be something well before our analysis began and ended it 40 years after today. We do use this table to do future analysis, but I thought 40 years from today would be further than our Finance team is doing any planning.
Customizing for our use-case
Once the date spine was in place as the core of the analysis, customize the date table to our needs was easy, as I was comfortable with our business logic.
Our fiscal year definition, for example, is:
CASE WHEN month_actual < 2 THEN year_actual ELSE (year_actual+1) END AS fiscal_year,
March 2019 is FY2020, and January 2020 is FY2020. January keeps its calendar year as its fiscal year, but for every other month, the fiscal year is one more than the calendar year.
I tested this by picking out a number of dates and writing out what the appropriate value for each of those dates should be before comparing the results. They weren’t perfect and it took a couple of tries to get it all right, but eventually we were there. We considered adding testing to the date table to be sure that we weren’t missing anything, but after concluding that any custom data tests would just spot-check what I manually checked we decided not to.
Documenting anyway
While this date model may appear to be self-documenting, we wanted to make it easier for our less technical users to understand what they were looking at and what column would hold which pieces of information. In our schema.yml
file, we define what the column holds and an example of what the value would be for a given date.
If you’ve got anything other than a traditional calendar as your fiscal year, I hope this can inspire your approach to a date model.
Our dbt docs where you can see more about our date model: https://gitlab-data.gitlab.io/analytics/dbt/snowflake/#!/model/model.gitlab_snowflake.date_details
All of our dbt models (and the git history for each of them!) is available online in our analytics project.