dbt v0.14.0 - Wilt Chamberlain
dbt v0.14.0 improves the dbt workflow, adds Snapshots and Operations, and provides for finer-grained control over test failures. For a full list of changes in v0.14.0, check out the release notes. Read on for the highlights!
Improving the dbt Workflow
This is one where the demo says it all:
There’s a whole lot going on here, and there’s a whole lot more to come, too! With the 0.14.0 release of dbt, the Develop interface in dbt Cloud is now generally available. This interface can compile and run queries in the context of a dbt Cloud Environment. For more information on using dbt Develop, check out the docs.
The dbt Develop interface is powered by the brand new rpc server in dbt. This server loads up dbt projects into memory and then serves API endpoints that can compile and run SQL in the context of the project. With this rpc server, you can rapidly iterate on dbt code that references models and macros without needing to re-compile your whole project.
Snapshots
This release introduces Snapshots, a more capable version of dbt’s existing archive
functionality. Snapshots make it possible to capture changes to mutable datasets over time. As a part of rolling out Snapshots, dbt v0.14.0 drops support for Archives. There’s a handy migration guide that will help you upgrade your Archives to Snapshots. I promise, it’s worth it! Check out the benefits of Snapshots below.
Snapshotting Queries
Whereas Archives operated on tables, Snapshots operate on queries. You can use these queries to:
- Select a subset of columns in a table
- Filter out unimportant records from a table
- Generate surrogate keys or re-cast columns before snapshotting
Strategies
dbt ships with two built-in Snapshot strategies. These strategies provide different mechanisms for understanding when a rows has changed in a source table.
The default strategy, timestamp
, uses an updated_at timestamp column to check if a row has been updated since the last time a Snapshot was run.
Alternatively, the check
strategy will compare column values between old and new versions of a row to determine if the row has changed. While this is more computationally expensive than the timestamp
strategy, it is useful when an updated_at
column is not present or is otherwise unreliable.
Snapshot Example
Snapshots are defined in snapshot
blocks in .sql files. Check out an example below:
{% snapshot orders_snapshot %}
{{
config(
target_schema='snapshots',
strategy='check',
unique_key='id'
check_cols=['status', 'is_cancelled'],
)
}}
select * from {{ source('ecom', 'orders') }}
{% endsnapshot %}
Operations
The effective maintenance of a data warehouse sometimes requires manual or automated “operational” queries to be run. While these operations frequently vary by organization, they might include:
- Vacuuming tables
- Creating partitions
- Resuming/pausing/resizing warehouses
- Refreshing pipes
- And so on!
To support these use cases, dbt v0.14.0 provides a run-operation command. This command invokes a macro with arguments specified on the CLI. Inside of this macro, you can use dbt’s new run_query macro (or a statement
block) to run maintenance tasks against your warehouse.
{% macro vacuum_table(schema, table) %}
{% set relation = schema ~ "." ~ table %}
{% set sql %}
vacuum {{ relation }} to 95%;
{% endset %}
{{ log("Beginning vacuum of table: " ~ relation, info=true) }}
{% do run_query(sql) %}
{{ log("Done vacuuming table: " ~ relation, info=true) }}
{% endmacro %}
Run it with:
$ dbt run-operation vacuum_table --args '{"schema": "snowplow", "table": "event"}'
Running with dbt=0.14.0
Beginning vacuum of table: snowplow.event
Done vacuuming table: snowplow.event
What kinds of tasks can you automate with operations?
Test Severity
Did you know that you can use emojis to respond to GitHub issues? Did you know that this issue received 26 emojis? We’re super happy to release severity
configuration for schema tests in dbt v0.14.0. With test severity, you can denote a test failures as either a warning or an error. This helps reduce noise generated by flaky tests and ensures that test failures are actionable for dbt users.
version: 2
models:
- name: my_model
columns:
- name: id
tests:
- not_null:
severity: "WARN"
- unique:
severity: "ERROR"
Pro-tip: Use the --warn-error
flag to treat warnings as errors in PR builds
Thanks!
Last, thanks to the 13 contributors who submitted a PR to dbt for v0.14.0! Nice work!
- @nydnarb (#1363)
- @emilieschario (#1366)
- @bastienboutonnet (#1409)
- @kasanchez (#1247)
- @Blakewell (#1307)
- @buremba (#1476)
- @darrenhaken (#1285)
- @tbescherer (#1504)
- @heisencoder (#1509, #1549. #1578)
- @cclauss (#1572)
- @josegalarza (#1571)
- @rmgpinto (docs#31, docs#32)
- @groodt (docs#34)
If you’re interested in contributing for a future release of dbt, check out the contributing guide and drop us a line on Slack!