dbt v0.12.0
dbt v0.12.0 adds caching for some introspective queries on all adapters. Additionally, custom tags can be supplied for models, along with many other minor improvements and bugfixes. You can check out the release notes, or jump straight to the installation instructions. Read on for more info about the new features in this release.
New stuff
Tags
The biggest new user-facing feature in this release is model tagging. Tags are a great way to logically group related models. Some good use cases for tagging are:
- tagging models by how frequently they should run
- tagging models by their data source
- tagging models that contain PII
Tags can be easily applied to a whole directory of models in dbt_project.yml, or they can be set in the model SQL file itself. The following examples show how tags can be used to make the scheduling of models easier.
# dbt_project.yml
# The following dbt_project.yml configures a project that looks like this:
# .
# └── models
#     ├── crm
#     │   ├── accounts.sql
#     │   └── users.sql
#     ├── ads
#     │   ├── campaigns.sql
#     │   └── clicks.sql
#     └── events
#         └── events.sql

name: my_project

models:
  tags: nightly
  my_project:
    ads:
      tags: hourly
    crm:
      tags: hourly
Here, all models in the project are placed into the nightly tag group, while only the models in crm/ and ads/ are given the hourly tag. With these tags in place, you can run whole groups of models with a single model selector. Given this configuration, you could deploy your dbt models with:
# hourly job
dbt run --models +tag:hourly+

# nightly job
dbt run --models +tag:nightly+
Tagging is really convenient for the deployment of dbt models, but it can also be a great way to understand the flow of data through your dbt DAG. The dbt documentation website now provides a list of available tags for easy filtering. Further, the --models selector input accepts the tag: selector syntax.
You can find the docs for model tagging here.
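Setting tags in the model file itself, as mentioned above, might look like the following minimal sketch. It assumes tags are passed through the model's config block; the model path, the source table, and the pii tag value are hypothetical and chosen only for illustration.

```sql
-- models/crm/users.sql (hypothetical model file)
-- Assumption: tags are supplied through the config block as a list of strings.
{{ config(tags=["hourly", "pii"]) }}

select
    id,
    email
from raw.crm_users  -- hypothetical source table
```

With a tag set this way, a selector such as tag:pii should pick up this model in the same way the directory-level tags are selected in the jobs above.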
Caching
The very short version of caching is: dbt will run faster now because dbt won't execute certain introspective queries anymore. This speedup will be most noticeable on Snowflake and BigQuery, but Redshift users may also notice a speedup. There's a great thread on this topic here if you're interested in learning more about the implementation / theory behind dbt's take on caching.
Granting usage with custom schemas
If you're using custom schemas with Redshift, Postgres, or Snowflake, chances are you have post-hooks that look something like:
models:
  my_project:
    post-hook:
      - grant usage on schema {{ this.schema }} to db_reader
      - grant select on {{ this }} to db_reader
While the grant select... statement is very reasonable, the grant usage... statement is both redundant and error-prone.
First, it doesn't make a ton of sense to grant usage on the same schema to the same user once for each model. The paradigm shown above will spend a nonzero amount of time uselessly re-running the same grant usage statement with no real benefit! More troublingly, Redshift can raise an error like Table dropped by concurrent transaction if a grant usage statement is running at the same time as a drop table statement within the same schema. This isn't an issue if your models are all built into a single schema (you can just use target.schema in the on-run-end hook), but custom schemas mean that you might need to grant usage to many schemas.
To fix this, dbt v0.12.0 adds a list of schemas to the on-run-end compilation context. As a result, you can still run grant select... statements on individual models in post-hooks, but you can defer granting usage on specific schemas until the end of the run. You can find docs on the schemas context variable here, and a quick example is shown below:
on-run-end:
  - "{% for schema in schemas %} grant usage on schema {{ schema }} to db_reader; {% endfor %}"

models:
  my_project:
    post-hook:
      - grant select on {{ this }} to db_reader;
I'd recommend wrapping that logic up into a macro; there's an example of that here.
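As a rough illustration of that recommendation, the loop from the on-run-end hook could be moved into a macro along these lines. This is a sketch, not the linked example: the macro name grant_usage_to_schemas, its file path, and the user argument are assumptions made for illustration.

```sql
-- macros/grant_usage_to_schemas.sql (hypothetical file and macro name)
{% macro grant_usage_to_schemas(schemas, user) %}
  {# Emit one grant statement per entry in the schemas list #}
  {% for schema in schemas %}
    grant usage on schema {{ schema }} to {{ user }};
  {% endfor %}
{% endmacro %}
```

The on-run-end entry in dbt_project.yml could then shrink to "{{ grant_usage_to_schemas(schemas, 'db_reader') }}", which keeps the hook readable and lets the same macro serve more than one user.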
Breaking Changes
Support for the repositories: block in dbt_project.yml (deprecated in 0.10.0) was removed. Use the packages.yml file instead (docs).
Thanks!
Finally, thanks to the folks who contributed to this release!
- @mikekaminsky (#1049, #1060)
- @joshtemple (#1079)
- @k4y3ff (#954)
- @elexisvenator (#1019)
- @clrcrl (#725)
If you're interested in getting involved, drop us a line on Slack or get in touch in the issues!