Release dbt v0.12.0



dbt v0.12.0

dbt v0.12.0 adds caching for some introspective queries on all adapters. Additionally, custom tags can be supplied for models, along with many other minor improvements and bugfixes. You can check out the release notes, or jump straight to the installation instructions. Read on for more info about the new features in this release :slight_smile:

New stuff

Tags

The biggest new user-facing feature in this release is model tagging. Tags are a great way to logically group related models. Some good use cases for tagging are:

  • tagging models by how frequently they should run
  • tagging models by their data source
  • tagging models that contain PII

Tags can be easily applied to a whole directory of models in dbt_project.yml, or they can be set in the model sql file itself. The following examples show how tags can be used to make the scheduling of models easier.

# dbt_project.yml

# The following dbt_project.yml configures a project that looks like this:
# .
# └── models
#     ├── crm
#     │   ├── accounts.sql
#     │   └── users.sql
#     ├── ads
#     │   ├── campaigns.sql
#     │   └── clicks.sql
#     └── events
#         └── events.sql

name: my_project

models:
  tags: nightly
  my_project:
    ads:
      tags: hourly
    crm:
      tags: hourly

Here, all models in the project are placed into the nightly tag group, while only the models in crm/ and ads/ are given the hourly tag. With these tags in place, you can run whole groups of models with a single model selector. Given this configuration, you could deploy your dbt models with:

hourly job

dbt run --models +tag:hourly+

nightly job

dbt run --models +tag:nightly+

Tagging is really convenient for deploying dbt models, but it can also be a great way to understand the flow of data through your dbt DAG. The dbt documentation website now provides a list of available tags for easy filtering. Further, the --models selector input accepts the tag: selector syntax.

You can find the docs for model tagging here.
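For a single model, the tag can instead be set in the model file itself, as mentioned earlier. Here is a minimal sketch, assuming the events.sql model from the project layout above and a placeholder source table named raw.events (that table name is made up for illustration):

```sql
{# models/events/events.sql #}
{# raw.events is a hypothetical source table, used only for illustration #}
{{ config(tags=["hourly"]) }}

select *
from raw.events
```

With a config like this, events.sql should be matched by the tag:hourly selector without any change to dbt_project.yml.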

Caching

The very short version of caching is: dbt will run faster now because dbt won’t execute certain introspective queries anymore. This speedup will be most noticeable on Snowflake and BigQuery, but Redshift users may also notice a speedup. There’s a great thread on this topic here if you’re interested in learning more about the implementation / theory behind dbt’s take on caching.

Granting usage with custom schemas

If you’re using custom schemas with Redshift, Postgres, or Snowflake, chances are you have post-hooks that look something like:

models:
  my_project:
    post-hook:
      - grant usage on schema {{ this.schema }} to db_reader
      - grant select on {{ this }} to db_reader

While the grant select... statement is very reasonable, the grant usage... statement is both redundant and error-prone.

First, it doesn’t make a ton of sense to grant usage on the same schema to the same user once for each model. The paradigm shown above will spend a nonzero amount of time uselessly re-running the same grant usage statement with no real benefit! More troublingly, Redshift can raise an error like Table dropped by concurrent transaction if a grant usage statement is running at the same time as a drop table statement within the same schema. This isn’t an issue if your models are all built into a single schema (you can just use target.schema in the on-run-end hook), but custom schemas mean that you might need to grant usage to many schemas.

To fix this, dbt v0.12.0 adds a list of schemas to the on-run-end compilation context. As a result, you can still run grant select... statements on individual models in post-hooks, but you can defer granting usage on each schema until the end of the run, so that it happens once per schema rather than once per model. You can find docs on the schemas context variable here, and a quick example is shown below:

on-run-end:
 - "{% for schema in schemas %} grant usage on schema {{ schema }} to db_reader; {% endfor %}"

models:
  my_project:
    post-hook:
      - grant select on {{ this }} to db_reader;

I’d recommend wrapping that logic up into a macro. There’s an example of that here :slight_smile:
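One way such a macro could look, as a minimal sketch: the macro name grant_usage_to_schemas, its user argument, and the file path are illustrative choices rather than anything defined by dbt. Only the schemas variable comes from the on-run-end context described above.

```sql
{# macros/grant_usage_to_schemas.sql (hypothetical file and macro name) #}
{% macro grant_usage_to_schemas(schemas, user) %}
  {# Emit one grant statement per schema that dbt built models into #}
  {% for schema in schemas %}
    grant usage on schema {{ schema }} to {{ user }};
  {% endfor %}
{% endmacro %}
```

The on-run-end entry in dbt_project.yml would then shrink to a single call such as "{{ grant_usage_to_schemas(schemas, 'db_reader') }}", which keeps the loop out of the project file and makes the grantee easy to change.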

Breaking Changes

Support for the repositories: block in dbt_project.yml (deprecated in 0.10.0) has been removed. Use the packages.yml file instead (docs).

Thanks!

Finally, thanks to the folks who contributed to this release!

If you’re interested in getting involved, drop us a line on Slack or get in touch in the issues!