dbt v0.15.0 - Louisa May Alcott
Who is Louisa May Alcott, you ask? Check out the release notes for a biography of this famous Philadelphian.
dbt v0.15.0 (codenamed Louisa May Alcott) introduces exciting net-new functionality, bug fixes, and some helpful improvements to the dbt workflow. The most meaningful additions to dbt include a full-fledged dbt server, tools for monitoring production deployments, and improved support for database-specific features on Redshift, Snowflake, and BigQuery.
There are some breaking changes to be aware of in this release:
- Python 2.x is no longer supported
- Compilation issues generated in .yml files now result in errors instead of warnings
- The table_name field has been removed from Relations
- Custom materializations must now manage dbt’s Relation cache
For a full list of changes in this release, please consult the release notes.
dbt v0.15.0 uses the psycopg2-binary dependency (instead of psycopg2) to simplify installation on platforms that do not have a compiler toolchain installed. If you experience segmentation faults, crashes, or installation errors, you can set the DBT_PSYCOPG2_NAME environment variable to psycopg2 to change the dependency that dbt installs. This may require a compiler toolchain and development libraries.
$ DBT_PSYCOPG2_NAME=psycopg2 pip install dbt
You may also install specific dbt plugins directly by name. This has the advantage of only installing the Python requirements needed for your particular database:
$ pip install dbt-postgres
$ pip install dbt-redshift
$ pip install dbt-snowflake
$ pip install dbt-bigquery
Selected highlights from the changelog follow:
A dbt server
The dbt server was introduced in dbt v0.14.0 and it was overhauled for v0.15.0. You can now compile and execute SQL in the context of your dbt project, build models, tests, snapshots, and seeds, and even generate documentation via HTTP requests made against the server. You can find more information about using the dbt server in the docs.
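As a rough illustration of what such an HTTP request body could look like, here is a minimal sketch that only builds the payload and does not contact a server. It assumes the server speaks JSON-RPC 2.0, that the method is named compile_sql, and that the SQL travels base64-encoded in a sql parameter. Those names, along with the helper build_compile_request and the default query name, are assumptions for illustration and should be checked against the dbt server docs.

```python
import base64
import json
import uuid


def build_compile_request(sql, name="example_query", timeout=60):
    """Build a JSON-RPC 2.0 style request body asking a server to compile SQL.

    Assumptions (verify against the dbt docs): the method is called
    "compile_sql" and the SQL is sent base64-encoded in a "sql" parameter.
    """
    encoded_sql = base64.b64encode(sql.encode("utf-8")).decode("ascii")
    return {
        "jsonrpc": "2.0",
        "method": "compile_sql",  # assumed method name
        "id": str(uuid.uuid4()),  # any unique request id
        "params": {"timeout": timeout, "sql": encoded_sql, "name": name},
    }


payload = build_compile_request("select {{ 1 + 1 }} as id")
body = json.dumps(payload)  # this string is what would be POSTed to the server
```

The request id is generated per call so that responses can be matched to requests. The host, port, and URL path of the server are deliberately left out because they depend on how the server is started.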
Monitor your dbt deployments
dbt v0.15.0 introduces structured JSON logging. You can send these structured logs to the monitoring and alerting tools your team already uses, including DataDog, CloudWatch, and more. To enable JSON logging, set the --log-format flag to json as shown in the documentation.
When used in conjunction with the --debug flag, the stdout of dbt is ready to pipe into the monitoring tool of your choice.
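If you want to pre-filter the log stream before forwarding it, a small script can read one JSON object per line and keep only the records you care about. The sketch below assumes each log line is a standalone JSON object and that the severity lives in a field called levelname. That field name, and the helper names parse_json_logs and only_errors, are assumptions for illustration, so inspect real output before relying on them.

```python
import json


def parse_json_logs(lines):
    """Yield one dict per line that contains a JSON object; skip the rest."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON (for example, a stray plain-text line)
        if isinstance(record, dict):
            yield record


def only_errors(records, level_key="levelname"):
    # "levelname" is an assumed field name; adjust it to match real output.
    return [r for r in records if str(r.get(level_key, "")).upper() == "ERROR"]
```

To use it on a live stream, pass sys.stdin to parse_json_logs and forward the filtered records to the tool of your choice.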
In addition to realtime monitoring and alerting, dbt v0.15.0 better supports rich analysis of dbt performance over time. dbt now injects configurable query comments containing metadata about the queries executed by dbt. You can consume these query comments by parsing them out of your database’s query history tables, or by using a dedicated tool like intermix.io. intermix.io automatically captures and stores dbt query comments and metadata so you can monitor dbt project performance characteristics down to the individual model.
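For the do-it-yourself route, parsing a comment out of query history text might look like the sketch below. It assumes the comment is a block comment at the very start of the query and that its contents are a JSON object. The actual comment format is configurable, so treat the sample keys (app, node_id) and the helper name extract_query_comment as illustrative assumptions.

```python
import json
import re

# Matches a block comment at the very start of the query text.
LEADING_COMMENT = re.compile(r"^\s*/\*(.*?)\*/", re.DOTALL)


def extract_query_comment(query_text):
    """Return the leading comment parsed as a JSON object, or None."""
    match = LEADING_COMMENT.match(query_text)
    if match is None:
        return None  # no leading block comment
    try:
        parsed = json.loads(match.group(1).strip())
    except json.JSONDecodeError:
        return None  # a comment is present, but it is not JSON
    return parsed if isinstance(parsed, dict) else None


# Hypothetical query text as it might appear in a query history table.
query = '/* {"app": "dbt", "node_id": "model.my_project.orders"} */\nselect 1'
metadata = extract_query_comment(query)
```

Returning None for queries without a parseable comment makes it easy to separate dbt queries from everything else in the history table.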
Faster dbt runs with partial parsing
Every time dbt runs, it reads all of the files in your project directory and parses them into a representation called a manifest. If your project contains a large number of files, you may have noticed that this step can be time-consuming. dbt v0.15.0 introduces a feature called “partial parsing” which limits the files that dbt parses to only the ones that have changed since the last dbt invocation.
In typical development workflows, you might change one or two files at a time before doing a dbt run or test. As such, dbt will now only parse these changed files at the start of the run, deferring to the parsed representation from the previous run where possible. In a test project with 100 models and 400 tests, this reduces the parse time for the project from around 6 seconds to a couple hundred milliseconds.
In one example run, the --partial-parse flag knocks 5.5 seconds off the total dbt runtime. In practice, you’ll notice this change as dbt feeling “snappier” to get from pressing <enter> to seeing model results fly by your screen. You can find more information about enabling partial parsing for your project in the docs.
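The general idea of skipping unchanged files can be sketched with content fingerprints. This is an illustration of the technique only, not dbt's actual implementation: the function names, the use of SHA-256, and the in-memory dictionaries are all assumptions made for the example.

```python
import hashlib


def fingerprint(contents):
    """Return a stable digest of a file's contents."""
    return hashlib.sha256(contents.encode("utf-8")).hexdigest()


def plan_parse(files, previous):
    """Decide which files need re-parsing.

    files:    mapping of path -> current file contents
    previous: mapping of path -> fingerprint saved by the last run
    Returns (sorted paths to re-parse, fingerprints to save for next run).
    """
    current = {path: fingerprint(text) for path, text in files.items()}
    changed = sorted(
        path for path, digest in current.items() if previous.get(path) != digest
    )
    return changed, current


files = {"a.sql": "select 1", "b.sql": "select 2"}
first_changed, saved = plan_parse(files, {})      # first run: everything is new
files["b.sql"] = "select 3"
second_changed, saved = plan_parse(files, saved)  # second run: only b.sql changed
```

A real implementation would also need to handle deleted files and changes that affect other files, such as an edited macro. The sketch ignores both.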
Atomic full refreshes
This one was a long time coming! Previously, incremental model builds were not atomic when the --full-refresh flag was supplied. This meant that queries from your BI tool (or similar) might not find the table they were looking for during a model rebuild. dbt’s incremental materializations are now 100% Grade A Atomic across Postgres, Redshift, Snowflake, and BigQuery.
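To make "atomic" concrete, here is a minimal sketch of one common way to rebuild a table without a window in which it is missing: build the replacement under a temporary name, then swap names inside a single transaction. It uses SQLite as a stand-in and is not dbt's implementation, which differs per database. The function name atomic_rebuild and the __tmp and __backup suffixes are assumptions for illustration.

```python
import sqlite3


def atomic_rebuild(conn, table, select_sql):
    """Rebuild `table` from `select_sql`, swapping it in within one transaction.

    Assumes `table` already exists and that `conn` was opened with
    isolation_level=None so transactions are controlled explicitly.
    Identifiers are interpolated directly, so pass trusted names only.
    """
    tmp = f"{table}__tmp"
    backup = f"{table}__backup"
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE IF EXISTS {tmp}")
        conn.execute(f"CREATE TABLE {tmp} AS {select_sql}")
        conn.execute(f"ALTER TABLE {table} RENAME TO {backup}")
        conn.execute(f"ALTER TABLE {tmp} RENAME TO {table}")
        conn.execute(f"DROP TABLE {backup}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # the old table is left untouched
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE src (id INTEGER)")
conn.execute("INSERT INTO src VALUES (1), (2), (3)")
conn.execute("CREATE TABLE model AS SELECT id FROM src WHERE id < 2")
atomic_rebuild(conn, "model", "SELECT id FROM src")
```

Because every step runs inside one transaction, a failure while building the new table rolls everything back and the previous version of the table remains available.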
Improvements to dbt Docs
Another one from the backlog - the autogenerated dbt Documentation website now includes information about seeds, snapshots, and custom schema tests. These seed and snapshot nodes show up in the DAG view with edges to the nodes that select from them. Further, sources, seeds, and snapshots will all render in the “database” model view.
Look forward to support for documenting macros and custom data tests in a future dbt release.
Snowflake virtual warehouses can now be configured on a per-model basis. This configuration makes it possible to use a big warehouse to build time-consuming models, and a smaller warehouse to build quicker models. Check out the docs for detailed usage information.
Also in v0.15.0:
In v0.15.0, dbt now uses the BigQuery INFORMATION_SCHEMA to generate a catalog for the dbt documentation site. This change addresses the performance issues that some users were seeing when they ran dbt docs generate against a dataset which contained very many date-sharded tables.
To see information about date shards in the documentation website, configure your date-sharded sources using the * glob syntax when specifying a source table.
version: 2

sources:
  - name: clickstream
    tables:
      - name: events_*
dist config for Redshift tables can now be configured with the value
You can now query geospatial data on Redshift. We didn’t do anything to support this; I just wanted to make sure you knew it was newly supported in Redshift. Docs here.
Thanks to our contributors!
If you’re interested in working on a feature in the dbt backlog, check out the Contributing Guide and drop us a line on Slack! Thanks to the following contributors who submitted PRs for the 0.15.0 release