Hello - dbt newbie here asking if there is a feature in the platform to help us manage multiple (simultaneous) versions of database objects whose schemas change over time.
Initially (v1.0 of data product) we have “modelX” with 6 columns. Let’s call it modelX_1.0.
Later, the team adds two more columns, and we want to release that as modelX_2.0.
We would want a user in the database, a client accessing from the BI layer, or code to automatically get the latest version if they access the name “modelX” directly. But if they access a specific version (e.g., “select * from modelX_1.0”), it should resolve to that specific version.
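Concretely, the access pattern we have in mind looks something like this (illustrative SQL only; the exact naming convention is up for grabs):

-- querying the bare name should always resolve to the latest version (currently 2.0)
select * from modelX;

-- querying a versioned name should pin to exactly that version
-- (in practice probably modelX_1_0, since most warehouses won't accept a "." in an object name)
select * from modelX_1_0;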
Is there a way to have dbt facilitate this behavior (custom schemas?), or should I be looking for a supplemental tool or methodology? Any help greatly appreciated.
I love this question and line of thinking! I’ve been noodling on a similar thing in the back of my head but haven’t written anything down yet.
Today, dbt doesn’t have a model versioning system, but you can use exposures to document which models power which dashboards so you know where you’ll need to change things.
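For reference, an exposure is defined in a YAML file in your project, roughly like this (the names, URL, and owner here are placeholders):

version: 2

exposures:
  - name: sales_dashboard
    type: dashboard
    maturity: high
    url: https://bi.example.com/dashboards/sales
    description: Weekly sales dashboard built on modelX
    depends_on:
      - ref('modelX')
    owner:
      name: Data Team
      email: data@example.com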
In the future, I think the dbt Semantic Layer might make this possible. It enables you to use full dbt code (including macros like ref) inside your supported BI tool, notebook, etc. What that would mean is that you could override the built-in ref macro, or build a different macro, like this:
-- pseudocode: assumes versioned models are named with a suffix like modelX_v1_0
{% macro version_aware_ref(model_name, version=none) %}
    {% if version is none %}
        {# no version requested: resolve to the unversioned (latest) model #}
        {% do return(ref(model_name)) %}
    {% else %}
        {# pin to the specific versioned model #}
        {% do return(ref(model_name ~ '_v' ~ version)) %}
    {% endif %}
{% endmacro %}
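For illustration, calling it from a model (or, eventually, from your BI tool) might look like this (the _v suffix convention is just my assumption, not a dbt standard):

-- resolves to the latest version of modelX
select * from {{ version_aware_ref('modelX') }}

-- pins the query to the model named modelX_v1_0
select * from {{ version_aware_ref('modelX', version='1_0') }}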
You could already do this today, but it wouldn’t be accessible from your BI layer, only from inside the dbt project. The Semantic Layer is what would unlock the full potential.
What do you think?