Release: dbt v0.15.0

drew · November 25, 2019, 9:10pm

dbt v0.15.0 - Louisa May Alcott

Who is Louisa May Alcott, you ask? Check out the release notes for a biography of this famous Philadelphian.

dbt v0.15.0 (codenamed Louisa May Alcott) introduces exciting net-new functionality, bug fixes, and some helpful improvements to the dbt workflow. The most meaningful additions to dbt include a full-fledged dbt server, tools for monitoring production deployments, and improved support for database-specific features on Redshift, Snowflake, and BigQuery.

There are some breaking changes to be aware of in this release:

Python 2.x is no longer supported
Compilation issues generated in .yml files now result in errors instead of warnings
The table_name field has been removed from Relations
Custom materializations must now manage dbt’s Relation cache

For a full list of changes in this release, please consult the release notes.

Installation notes

dbt v0.15.0 uses the psycopg2-binary dependency (instead of psycopg2) to simplify installation on platforms that do not have a compiler toolchain installed. If you experience segmentation faults, crashes, or installation errors, you can set the DBT_PSYCOPG2_NAME environment variable to psycopg2 to change the dependency that dbt installs. Note that building psycopg2 this way may require a compiler toolchain and development libraries.

$ DBT_PSYCOPG2_NAME=psycopg2 pip install dbt
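The environment-variable switch works because the package chooses its dependency name at install time. The following is a minimal sketch of that kind of logic, assuming only what is stated above: the variable is DBT_PSYCOPG2_NAME and the default is psycopg2-binary. The helper name psycopg2_requirement is hypothetical and is not dbt's actual setup code.

```python
import os


def psycopg2_requirement(environ=None):
    """Return the psycopg2 distribution name to install (hypothetical helper).

    Defaults to the prebuilt psycopg2-binary wheel; setting
    DBT_PSYCOPG2_NAME=psycopg2 opts into the source distribution instead.
    """
    env = os.environ if environ is None else environ
    return env.get("DBT_PSYCOPG2_NAME", "psycopg2-binary")


if __name__ == "__main__":
    # In a setup.py this might feed install_requires=[psycopg2_requirement()].
    print(psycopg2_requirement())
```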

You may also install specific dbt plugins directly by name. This has the advantage of installing only the Python requirements needed for your particular database:

$ pip install dbt-postgres
$ pip install dbt-redshift
$ pip install dbt-snowflake
$ pip install dbt-bigquery

Selected highlights from the changelog follow:

A dbt server

The dbt server was introduced in dbt v0.14.0 and has been overhauled for v0.15.0. Using HTTP requests against the server, you can now compile and execute SQL in the context of your dbt project; run models, tests, snapshots, and seeds; and even generate documentation. You can find more information about using the dbt server in the docs.
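The following is a minimal client sketch for sending SQL to the server over HTTP, using only the Python standard library. Several details are our assumptions, to be checked against the dbt server docs for your version:

- the server speaks JSON-RPC 2.0 at http://localhost:8580/jsonrpc;
- it exposes a run_sql method;
- that method takes base64-encoded sql plus name and timeout params.

```python
import base64
import json
import urllib.request
import uuid

# Assumption: the dbt server (started with `dbt rpc`) listens here by default.
RPC_URL = "http://localhost:8580/jsonrpc"


def build_run_sql_request(sql, name="adhoc_query", timeout=60):
    """Build a JSON-RPC 2.0 payload for an (assumed) run_sql method.

    The method name and the params (base64-encoded `sql`, `name`,
    `timeout`) are assumptions; verify them against the docs.
    """
    return {
        "jsonrpc": "2.0",
        "method": "run_sql",
        "id": str(uuid.uuid4()),
        "params": {
            "timeout": timeout,
            "sql": base64.b64encode(sql.encode("utf-8")).decode("ascii"),
            "name": name,
        },
    }


def post(payload, url=RPC_URL):
    """POST a JSON-RPC payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_run_sql_request("select {{ 1 + 1 }} as two")
    print(json.dumps(payload, indent=2))
    # With a server running locally: print(post(payload))
```

As we understand it, the v0.15.0 server handles such requests asynchronously: it returns a token that you then poll for results. Confirm the exact request and polling flow in the docs before relying on this sketch.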

Monitor your dbt deployments

Structured logging

dbt v0.15.0 introduces structured JSON logging. You can send these structured logs to the monitoring and alerting tools your team already uses, including Datadog, CloudWatch, and more. To enable JSON logging, set the --log-format flag to json as shown in the documentation.

When used in conjunction with the --debug flag, dbt’s stdout is ready to pipe into the monitoring tool of your choice.
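The following is a minimal sketch of a filter you could put in that pipe, for example to forward only warnings and errors to an alerting tool. It assumes two things: each log line is a single JSON object, and each object carries a levelname field. Check the actual schema of dbt’s JSON logs for your version before relying on either.

```python
import json
import sys

# Standard logging severities; records with an unknown level rank lowest.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def filter_json_logs(lines, min_level="WARNING"):
    """Yield parsed log records at or above min_level.

    Assumption: each record is a JSON object with a "levelname" field.
    Blank lines, non-JSON lines, and non-object JSON values are skipped.
    """
    threshold = LEVELS[min_level]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output
        if not isinstance(record, dict):
            continue
        level = LEVELS.get(str(record.get("levelname", "")).upper(), 0)
        if level >= threshold:
            yield record


if __name__ == "__main__":
    # Usage: dbt --log-format json --debug run | python filter_logs.py
    for rec in filter_json_logs(sys.stdin):
        print(json.dumps(rec))
```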

Query comments

In addition to real-time monitoring and alerting, dbt v0.15.0 better supports rich analysis of dbt performance over time. dbt now injects configurable query comments containing metadata about the queries executed by dbt. You can consume these query comments by parsing them out of your database’s query history tables, or by using a dedicated tool like intermix.io. intermix.io automatically captures and stores dbt query comments and metadata so you can monitor dbt project performance characteristics down to the individual model.
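If you parse query history yourself, the work amounts to extracting the comment from each query’s text and aggregating by model. The following is a minimal sketch with two assumptions of ours:

- the comment is a JSON object wrapped in /* ... */ at the start of the query;
- it includes a node_id key.

We believe this resembles dbt’s default comment, but adjust the pattern if you configure your own. The rows of (query_text, seconds) stand in for whatever your warehouse’s query history table returns.

```python
import json
import re
from collections import defaultdict

# Assumption: a leading /* {...} */ JSON comment; DOTALL lets it span lines.
COMMENT_RE = re.compile(r"^\s*/\*\s*(\{.*?\})\s*\*/", re.DOTALL)


def parse_query_comment(query_text):
    """Return the metadata dict from a leading JSON query comment, or None."""
    match = COMMENT_RE.match(query_text)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def total_time_by_node(rows):
    """Sum elapsed seconds per dbt node from (query_text, seconds) rows."""
    totals = defaultdict(float)
    for query_text, seconds in rows:
        meta = parse_query_comment(query_text)
        if meta and "node_id" in meta:
            totals[meta["node_id"]] += seconds
    return dict(totals)


if __name__ == "__main__":
    q = '/* {"app": "dbt", "node_id": "model.my_project.orders"} */\nselect 1'
    print(total_time_by_node([(q, 1.5), (q, 2.0), ("select 2", 9.0)]))
```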

Faster dbt runs with partial parsing

Every time dbt runs, it reads all of the files in your project directory and parses them into a representation called a manifest. If your project contains many files, you may have noticed that this step can be time-consuming. dbt v0.15.0 introduces a feature called “partial parsing” which limits parsing to only the files that have changed since the last dbt invocation.

In typical development workflows, you might change one or two files at a time before running dbt run or dbt test. With partial parsing enabled, dbt parses only these changed files at the start of the run, deferring to the parsed representation from the previous run where possible. In a test project with 100 models and 400 tests, this reduces the project’s parse time from around 6 seconds to a couple hundred milliseconds.
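The idea can be sketched as change detection plus a cache. The following is a minimal sketch, not dbt’s implementation. Three details are our own assumptions: SHA-256 content hashes decide what changed, results live in an in-memory dict keyed by path, and parse stands in for whatever turns a file into its parsed form.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Hash a file's bytes so unchanged files can be recognized cheaply."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def partial_parse(paths, previous, parse):
    """Reparse only new or changed files; reuse cached results for the rest.

    `previous` maps path -> (digest, parsed_result) from the last run, and
    `parse` is any callable turning a path into a parsed result.
    Returns (new_cache, list_of_reparsed_paths).
    """
    cache, reparsed = {}, []
    for path in paths:
        digest = file_digest(path)
        old = previous.get(path)
        if old is not None and old[0] == digest:
            cache[path] = old  # unchanged: defer to the previous parse
        else:
            cache[path] = (digest, parse(path))
            reparsed.append(path)
    # Files deleted since the last run are simply absent from the new cache.
    return cache, reparsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a, b = Path(tmp, "a.sql"), Path(tmp, "b.sql")
        a.write_text("select 1")
        b.write_text("select 2")
        cache, changed = partial_parse([a, b], {}, lambda p: p.read_text())
        print([p.name for p in changed])  # first run parses both files
        b.write_text("select 3")
        cache, changed = partial_parse([a, b], cache, lambda p: p.read_text())
        print([p.name for p in changed])  # second run reparses only b.sql
```

Hashing every file still reads every file, but hashing is far cheaper than full parsing. That is why reusing cached results pays off when only one or two files change.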

In one example run, the --partial-parse flag knocks 5.5 seconds off the total dbt runtime. In practice, you’ll notice this change as dbt feeling “snappier” between pressing <enter> and seeing model results fly across your screen. You can find more information about enabling partial parsing for your project in the docs.

Atomic full refreshes

This one was a long time coming! Previously, incremental model builds were not atomic when the --full-refresh flag was supplied. This meant that queries from your BI tool (or similar) might not find the table they were looking for during a model rebuild. dbt’s incremental materializations are now 100% Grade A Atomic across Postgres, Redshift, Snowflake, and BigQuery.
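Atomic rebuilds generally follow a build-then-swap pattern: build the new table under a temporary name, then swap names inside one transaction, so readers see either the old table or the new one, never neither. dbt’s materializations are written in SQL and Jinja and differ per warehouse. The following Python and SQLite sketch only illustrates the general pattern. The __dbt_tmp and __dbt_backup suffixes are illustrative naming, and table names are assumed to be trusted identifiers.

```python
import sqlite3


def table_exists(conn, name):
    """Return True if a table with this name exists in the SQLite database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def atomic_full_refresh(conn, target, select_sql):
    """Rebuild `target` from `select_sql` without a window where it is missing.

    Expects a connection opened with isolation_level=None so the explicit
    BEGIN/COMMIT below fully control the transaction.
    """
    tmp, backup = f"{target}__dbt_tmp", f"{target}__dbt_backup"
    # Build step: if this fails, the existing target is untouched.
    conn.execute(f"DROP TABLE IF EXISTS {tmp}")
    conn.execute(f"CREATE TABLE {tmp} AS {select_sql}")
    # Swap step: all renames commit together or not at all.
    conn.execute("BEGIN")
    try:
        if table_exists(conn, target):
            conn.execute(f"ALTER TABLE {target} RENAME TO {backup}")
        conn.execute(f"ALTER TABLE {tmp} RENAME TO {target}")
        conn.execute(f"DROP TABLE IF EXISTS {backup}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE orders AS SELECT 1 AS id")
    atomic_full_refresh(conn, "orders", "SELECT 1 AS id UNION ALL SELECT 2")
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```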

Improvements to dbt Docs

Another one from the backlog: the autogenerated dbt Documentation website now includes information about seeds, snapshots, and custom schema tests. Seed and snapshot nodes show up in the DAG view with edges to the nodes that select from them. Further, sources, seeds, and snapshots all render in the “database” model view.

Look forward to support for documenting macros and custom data tests in a future dbt release.

Snowflake updates

Snowflake virtual warehouses can now be configured on a per-model basis. This configuration makes it possible to use a big warehouse to build time-consuming models, and a smaller warehouse to build quicker models. Check out the docs for detailed usage information.

Also in v0.15.0:

Support for copying grants
Support for secure views

BigQuery updates

In v0.15.0, dbt uses the BigQuery INFORMATION_SCHEMA to generate a catalog for the dbt documentation site. This change addresses the performance issues that some users saw when running dbt docs generate against a dataset containing many date-sharded tables.

To see information about date shards in the documentation website, configure your date-sharded sources using the * glob syntax when specifying a source table.

version: 2

sources:
  - name: clickstream
    tables:
      - name: events_*
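To picture what a wildcard source covers, here is a small Python sketch of matching table names against a pattern like events_* and summarizing the date shards behind it. It is only an illustration. dbt builds its catalog from INFORMATION_SCHEMA queries, not from code like this. The YYYYMMDD suffix convention and the summary shape are our own assumptions.

```python
import fnmatch
import re

# Assumption: date shards end in an 8-digit YYYYMMDD suffix (events_20191125).
DATE_SUFFIX = re.compile(r"(\d{8})$")


def summarize_shards(table_names, pattern):
    """Summarize the date-sharded tables that a wildcard pattern covers."""
    dates = []
    for name in table_names:
        if not fnmatch.fnmatchcase(name, pattern):
            continue
        match = DATE_SUFFIX.search(name)
        if match:  # non-date tables matching the pattern are ignored
            dates.append(match.group(1))
    dates.sort()
    return {
        "shards": len(dates),
        "first": dates[0] if dates else None,
        "last": dates[-1] if dates else None,
    }


if __name__ == "__main__":
    names = ["events_20191123", "events_20191125", "events_20191124", "users"]
    print(summarize_shards(names, "events_*"))
```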

Redshift updates

The dist config for Redshift tables can now be set to the value auto (see the docs).

You can now query geospatial data on Redshift. We didn’t do anything to support this; I just wanted to make sure you knew it is newly supported in Redshift. See the Redshift docs for details.

Thanks to our contributors!

If you’re interested in working on a feature in the dbt backlog, check out the Contributing Guide and drop us a line on Slack! Thanks to the following contributors who submitted PRs for the 0.15.0 release
