dbt v0.16.0 - Barbara Gittings
Who is Barbara Gittings? Check out the release notes for a biography of this famous Philadelphian.
dbt v0.16.0 overhauls dbt’s compilation contexts to make compilation more consistent, improves performance, and provides a whole bunch of highly requested functionality and helpful bugfixes.
There are some breaking changes to be aware of in this release. Note: most projects will not be impacted by these changes, but please read them carefully in case any apply to your usage of dbt!
Breaking changes
- Quirks with type inference for seed CSV files have been fixed, but may change the data loaded by the dbt seed process for your project in subtle ways.
- BigQuery range bucket partitioning must now be configured with the new-style partitioning config
- Support for the one-argument variant of generate_schema_name has been dropped (see the sketch after this list)
- Files with a .yml extension found in the data/, macros/, analysis/, tests/, and snapshots/ directories will now be parsed as schema.yml specifications
- The accepted arguments of the get_catalog macro have changed
- The signature of the snowflake__list_schemas macro has changed
- dbt no longer supports building models in Snowflake databases with greater than 10,000 schemas
- Arguments to source schema tests were previously parsed in an inconsistent way, but they are now parsed in the same way as arguments to model schema tests
- The timestamp present in debug log lines is now rendered in a more standard format
- The docrefs key has been removed from the manifest.json file
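To illustrate the generate_schema_name change: custom implementations must now accept two arguments, the configured schema name and the node being compiled. Here is a minimal sketch of the two-argument form, modeled on the default behavior described in dbt’s documentation:
-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}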
For a full list of changes in this release, please consult the release notes.
Installation notes
# With Homebrew
brew install dbt@0.16.0
brew link --overwrite dbt@0.16.0
# Or with pip
pip install --upgrade dbt==0.16.0
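After installing, running dbt --version should report 0.16.0:
# Verify the installed version
dbt --version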
Some selected highlights from the changelog follow below:
A compilation context for dbt_project.yml
The models:, snapshots:, and seeds: configs in the dbt_project.yml file are now evaluated using a “base” compilation context. This means that you can reference vars, env vars, the selected target, and more when configuring resources in your project. Here’s a quick example to give you an idea of what’s possible:
name: my_project
version: 1.0.0
# Configure models in the `models/marts` directory to build
# as tables in prod, or views in dev/CI/etc
models:
my_project:
marts:
materialized: "{{ 'table' if target.name == 'prod' else 'view' }}"
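Environment variables and project vars are available in the same context. Here’s a minimal sketch, assuming a hypothetical DBT_SEED_SCHEMA environment variable:
# Build seeds into a schema taken from the environment,
# falling back to 'seeds' when the variable is unset
seeds:
  my_project:
    schema: "{{ env_var('DBT_SEED_SCHEMA', 'seeds') }}"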
For more information on the dbt_project.yml compilation context, check out the docs.
Document everything
Documentation can now be provided for:
- analyses
- custom data tests
- macros
- seeds
- snapshots
These resources can be configured in schema.yml files in all of the places you would expect: the macros/, data/, snapshots/, analysis/, and tests/ directories. Check out the docs on the schema.yml syntax for more information on documenting these resources, as well as usage information for some new documentation-oriented configs.
Some quick highlights:
- Resources can be hidden from the rendered documentation site using the docs config
- Metadata can be provided for models using the meta config
- Columns and column tests can be configured with tags using the tags config. These tags can be used to select specific tests to include or exclude using --models and --exclude selectors (see the sketch after this list)
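Putting a couple of these configs together, a schema.yml entry might look like the sketch below (the model, column, owner, and tag names are hypothetical):
# models/marts/schema.yml
version: 2

models:
  - name: dim_customers
    description: "One row per customer"
    docs:
      show: false
    meta:
      owner: "analytics-team"
    columns:
      - name: customer_id
        description: "Primary key for the customers table"
        tags: ["nightly"]
        tests:
          - unique
          - not_null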
BigQuery incremental model improvements
This one was a team effort – major shout outs are in order for everyone who contributed in the issue and the Pull Request (seriously - if you want to see open source development in action, check these out!).
dbt v0.16.0 ships with the ability to configure the incremental_strategy for BigQuery incremental models. Check out @jerco’s posts on using the incremental strategy config and benchmarking incremental performance for more information on how to use this powerful new feature.
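As a sketch of what this can look like in practice (the model, source, and column names here are hypothetical), an insert_overwrite incremental model on BigQuery might be configured like this:
-- models/events_incremental.sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'}
) }}

select *
from {{ source('app', 'events') }}

{% if is_incremental() %}
  -- only reprocess recent partitions on incremental runs
  where event_date >= date_sub(current_date(), interval 3 day)
{% endif %}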
Generating database names
With the addition of the generate_database_name macro, the triumvirate of generate_*_name macros is now complete. In addition to dynamically generating the name of model aliases and schemas, the database that models are rendered into can now be configured with a macro. Check out the example below, which renders models into a single database in dev and CI, but spreads models across different databases in prod:
{% macro generate_database_name(custom_database_name=none, node=none) -%}
    {%- set default_database = target.database -%}
    {%- if custom_database_name is none or target.name != 'prod' -%}
        {{ default_database }}
    {%- else -%}
        {{ custom_database_name | trim }}
    {%- endif -%}
{%- endmacro %}
Use it with:
-- models/my_model.sql
{{ config(database='marketing') }}
select *
from ....
Performance improvements
The following actions should feel noticeably faster, with performance lifts varying by database:
- Time to start running models (most noticeable on Snowflake and BigQuery)
- Time to generate docs with dbt docs generate
These speed improvements are a function of 1) using smarter queries to fetch data from the information schema and 2) parallelizing queries to the information schema. Future releases of dbt will expand on the approach implemented in this release.
Thanks to these contributors!
If you’re interested in working on a feature in the dbt backlog, check out the Contributing Guide and drop us a line on Slack! Thanks to the following contributors who submitted PRs for the 0.16.0 release: