Creating a "model version" column in your models using md5

dbtscott · June 10, 2021, 4:21pm

I was faced with an interesting question recently, which I will paraphrase as:

“Is there a way for me to have a column in my views/tables that changes every time the model is modified without me having to remember to update a version number in the code?”

This could be potentially useful if you wanted to be able to understand if the version of a table being queried by a dashboard is the same as previous versions AND could be used in something like a datagroup sql_trigger in Looker to trigger a refresh when a dependent table is updated as a result of a model change.

Using the model Jinja function in dbt the answer is YES but with some caveats:

raw_sql is pre-Jinja rendering/compilation – so even if the Jinja renders something different, the raw_sql won’t change. That also means a change to the definition of a macro your calls won’t be reflected even though different SQL might be generated.
Relatively minor changes that result in the SAME sql will result in a model version change. So if you adjust the whitespace in your model file

{% set model_sql = model %}
with

dbt_model_version as (
  SELECT MD5('{{ model_sql.raw_sql | e }}') as _dbt_model_version
),

customers as (
  select * from {{ source('jaffle_shop','customers') }}
),

my_model as (
    select 
        id as customer_id,
        first_name,
        last_name,
        first_name || ' ' || last_name as full_name
    from customers
)

select * from my_model
CROSS JOIN dbt_model_version

A few additional notes:

I wrote and tested this code on Snowflake. It should work on other dialects, but I would encourage you to test and report back.
Wait…CROSS JOIN…really? My original version inserted the MD5('{{ my_sql.raw_sql | e }}') as _dbt_model_version expression into the my_model part of the query. After some questions from colleagues, I elected not to do that as:
- the performance characteristics were effectively identical and,
- it makes this much more copy/paste friendly as you can add code to the very beginning and very end of your models without really modifying your model code itself vs. having to get the MD5 expression inserted into exactly the right place.

Questions? Feedback? Ideas for derivative works? Join in the discussion by commenting below!

gnilrets · June 10, 2021, 5:59pm

Neat! I could see applications of this for auditing dbt snapshots of models so you could track how the data might have changed in relation to the code that generated it in addition to just the source data stream. This would be extra useful in this case if you could tie those hashes to a git commit sha, or maybe just another table that stored a version of the sql text along with the md5 hash.

Topic		Replies	Views
DBT Model Versioning - Versioning metrics calculations In-Depth Discussions dbt-core	0	792	January 23, 2024
dbt support for schema-versioned database objects? Help snapshots , best-practice , custom-schema	1	1654	January 17, 2023
Regarding dbt model versioning Help	0	985	October 13, 2023
Automatically add new column to incremental (or other type) Archive	0	2224	December 18, 2021
dbt-core dbt-snowflake upgrade issues v0.21 to v1.8 Help jinja , snowflake , dbt-core	0	11	July 10, 2025

Creating a "model version" column in your models using md5

Related topics