Release: dbt v0.14.0

dbt v0.14.0 - Wilt Chamberlain

dbt v0.14.0 improves the dbt workflow, adds Snapshots and Operations, and provides for finer-grained control over test failures. For a full list of changes in v0.14.0, check out the release notes. Read on for the highlights!

Improving the dbt Workflow

This is one where the demo says it all:

view the demo

There’s a whole lot going on here, and there’s a whole lot more to come, too! With the 0.14.0 release of dbt, the Develop interface in dbt Cloud is now generally available. This interface can compile and run queries in the context of a dbt Cloud Environment. For more information on using dbt Develop, check out the docs.

The dbt Develop interface is powered by the brand new rpc server in dbt. This server loads up dbt projects into memory and then serves API endpoints that can compile and run SQL in the context of the project. With this rpc server, you can rapidly iterate on dbt code that references models and macros without needing to re-compile your whole project.

Snapshots

This release introduces Snapshots, a more capable version of dbt’s existing archive functionality. Snapshots make it possible to capture changes to mutable datasets over time. As a part of rolling out Snapshots, dbt v0.14.0 drops support for Archives. There’s a handy migration guide that will help you upgrade your Archives to Snapshots. I promise, it’s worth it! Check out the benefits of Snapshots below.

Snapshotting Queries

Whereas Archives operated on tables, Snapshots operate on queries. You can use these queries to:

  1. Select a subset of columns in a table
  2. Filter out unimportant records from a table
  3. Generate surrogate keys or re-cast columns before snapshotting

Strategies

dbt ships with two built-in Snapshot strategies. These strategies provide different mechanisms for understanding when a rows has changed in a source table.

The default strategy, timestamp, uses an updated_at timestamp column to check if a row has been updated since the last time a Snapshot was run.

Alternatively, the check strategy will compare column values between old and new versions of a row to determine if the row has changed. While this is more computationally expensive than the timestamp strategy, it is useful when an updated_at column is not present or is otherwise unreliable.

Snapshot Example

Snapshots are defined in snapshot blocks in .sql files. Check out an example below:

{% snapshot orders_snapshot %}

    {{
        config(
          target_schema='snapshots',
          strategy='check',
          unique_key='id'
          check_cols=['status', 'is_cancelled'],
        )
    }}
    
    select * from {{ source('ecom', 'orders') }}
    
{% endsnapshot %}

Operations

The effective maintenance of a data warehouse sometimes requires manual or automated “operational” queries to be run. While these operations frequently vary by organization, they might include:

  • Vacuuming tables
  • Creating partitions
  • Resuming/pausing/resizing warehouses
  • Refreshing pipes
  • And so on!

To support these use cases, dbt v0.14.0 provides a run-operation command. This command invokes a macro with arguments specified on the CLI. Inside of this macro, you can use dbt’s new run_query macro (or a statement block) to run maintenance tasks against your warehouse.

{% macro vacuum_table(schema, table) %}

    {% set relation = schema ~ "." ~ table %}
    {% set sql %}
      vacuum {{ relation }} to 95%;
    {% endset %}
    
    {{ log("Beginning vacuum of table: " ~ relation, info=true) }}
    {% do run_query(sql) %}
    {{ log("Done vacuuming table: " ~ relation, info=true) }}
    
{% endmacro %}

Run it with:

$ dbt run-operation vacuum_table --args '{"schema": "snowplow", "table": "event"}'
Running with dbt=0.14.0
Beginning vacuum of table: snowplow.event
Done vacuuming table: snowplow.event

What kinds of tasks can you automate with operations?

Test Severity

Did you know that you can use emojis to respond to GitHub issues? Did you know that this issue received 26 :+1: emojis? We’re super happy to release severity configuration for schema tests in dbt v0.14.0. With test severity, you can denote a test failures as either a warning or an error. This helps reduce noise generated by flaky tests and ensures that test failures are actionable for dbt users.

version: 2

models:
  - name: my_model
    columns:
      - name: id
        tests:
          - not_null:
            severity: "WARN"

          - unique:
            severity: "ERROR"

Pro-tip: Use the --warn-error flag to treat warnings as errors in PR builds :slight_smile:

Thanks!

Last, thanks to the 13 contributors who submitted a PR to dbt for v0.14.0! Nice work!

If you’re interested in contributing for a future release of dbt, check out the contributing guide and drop us a line on Slack!

1 Like