Release: dbt Core v0.21 (Louis Kahn)

jerco · September 20, 2021, 7:34pm

Updates

[Sep 27] v0.21.0 (final) is available for production use.
[Sep 27] v0.21.0-rc2 is available. It includes small bug fixes and bumps to schema versions for changed metadata artifacts.
[Sep 20] v0.21.0-rc1 is available for prerelease testing.

Who is Louis Kahn? Check out the release notes for a biography of this famous Philadelphian.

dbt Core v0.21 (Louis Kahn) is now available on PyPI, Homebrew, DockerHub, and dbt Cloud.

Our two big areas of focus for this release were a new build task and reconciling configs and properties. I’ll say more about both below. That said, there are a bunch of other very cool features, including:

on_schema_change for incremental models: goodbye manual post-merge migrations, goodbye ad hoc full refreshes
state:modified: upstream macro changes! sub-selectors!
dbt deps keeping you in the know about new package versions

There’s much more where that came from. I’d encourage you to read:

Migration guide for an overview of new and changed documentation
Release notes and changelog for the full set of features, fixes, and under-the-hood tweaks
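
To make the first of those features a bit more concrete, here is a minimal sketch of on_schema_change set on an incremental model through a yaml properties file. The file path and model name are hypothetical, and append_new_columns is only one of the accepted values (the others being ignore, fail, and sync_all_columns), so treat this as an illustration rather than a recommendation:

```yaml
# models/my_incremental_model.yml (hypothetical path and model name)
version: 2
models:
  - name: my_incremental_model
    config:
      # could equally be set in the model's own config() block
      materialized: incremental
      # when the model's query gains new columns, add them to the existing table
      on_schema_change: append_new_columns
```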

Installation

# with pip, install a specific adapter
pip install dbt-<adapter>==0.21.0rc2

# with Homebrew, install four original adapters
brew install dbt@0.21.0-rc2

Heads up: We will be changing some installation details for the next version of dbt (the one after v0.21). Going forward, we will no longer be supporting pip install dbt. Please ensure that you’ve switched any production pipelines to pip install dbt-<adapter>.

Breaking changes

Note that this release includes breaking changes for:

Freshness checking: The CLI command has been renamed to dbt source freshness, and its selection syntax now works like other tasks.
- Backwards compatible: The old name (source snapshot-freshness) was lengthy and easy to confuse with snapshot. The old command name will continue working, but it will no longer be documented.
- NOT backwards compatible: The previous selection syntax let you select specific sources by name without the source: prefix. Standard selection syntax requires that prefix, so if your deployments select specific sources to freshness check, you must now add it (a hypothetical source my_source would be selected as source:my_source).
Snowflake: Remove most transactional logic and turn on autocommit by default. We believe this should significantly reduce Cloud Services credit consumption in standard dbt operations.
Artifacts (see schemas.getdbt.com):
- manifest.json has a new v3 schema that includes additional node properties (no changes to existing properties)
- run_results.json has a new v3 schema that includes skipped as a potential TestResult
- sources.json has a new v2 schema that adds timing and thread details.

One notable (non-breaking!) change

All dbt tasks now use --select instead of --models to select resources. Tasks that previously used --models (run, test, compile, docs generate, list) still accept it for backwards compatibility.

New task: `build`

https://next.docs.getdbt.com/reference/commands/build

What does dbt build do? Well, everything: it runs your models, tests your tests, snapshots your snapshots, and seeds your seeds. It does this, resource by resource, from left to right across your DAG.

In DAG order: it’s worth repeating! If you previously struggled to deploy a dbt project that mixes models and snapshots throughout, this is the task for you.

Tests on an upstream model will block downstream models from running. If any test fails, the downstream models will be skipped. Why? The answer won’t surprise you: We think test failures matter—enough to stop a DAG for.

Geoffrey: You fool! As if it matters how one test fails.
Richard: When the failure’s all that’s left, it matters.

If there are tests in your project that aren’t worth stopping for, that’s totally ok—that’s just what test severity is good for. You can configure those tests with error_if thresholds (“only stop if you find >100 failures”), or to warn always and keep dbt a-buildin’.
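
As a rough sketch of what that might look like, assume a hypothetical model large_table with one column that is known to contain a few duplicates. The model, column, and thresholds below are made up for illustration:

```yaml
# models/large_table.yml (hypothetical)
version: 2
models:
  - name: large_table
    columns:
      - name: slightly_unreliable_column
        tests:
          - unique:
              config:
                severity: error
                error_if: ">100"  # error (and skip downstream models) only above 100 failures
                warn_if: ">10"    # 11 to 100 failures: warn and keep building
          - not_null:
              config:
                severity: warn    # any failures are reported as a warning, never an error
```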

How will you `build`?

Consider that:

In development, dbt build --select model_a will both run and test model_a. (We reworked test selection in v0.20 to avoid surprises, by making sure this syntax doesn’t include tests with other unselected, unbuilt parents.)
In CI, your build-on-PR job could be as simple as dbt build -s state:modified+ (plus --state, --defer, and a production manifest).
In production, your regularly scheduled job could be dbt build, plus steps to check source freshness and generate documentation.

dbt build works with all the powerful selection syntax you’ve come to know and love, including yaml selectors, a potent, version-controlled way to define subsets of your DAG. Also new in v0.21: the ability to define default yaml selectors, thereby offering custom control over the “full build” experience (i.e. dbt build without --select or --exclude).
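
A default selector might be sketched like this. The selector name and the experimental tag are hypothetical; the idea is only that a bare dbt build would pick up this definition instead of selecting every resource in the project:

```yaml
# selectors.yml (project root)
selectors:
  - name: everything_but_experimental  # hypothetical name
    description: Full build, minus anything tagged experimental
    default: true  # applies when no selection flags are passed on the command line
    definition:
      union:
        - method: fqn
          value: "*"
        - exclude:
            - method: tag
              value: experimental  # hypothetical tag
```

Passing --select or --exclude explicitly should still take precedence over the default, so one-off runs are unaffected.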

dbt build is an opinionated task. It’s the culmination of all we’ve built—running models with resilient materializations, prioritizing data quality with tests, updating fixtures with seeds, capturing slowly changing dimensions with snapshots—all for one DAG, and one DAG for all.

We think you should use build, but you don’t have to—all the existing tasks are still there to mix and match.

Configs and properties

https://next.docs.getdbt.com/reference/configs-and-properties

Previously, we had entire sections of the dbt documentation dedicated to explaining the difference between resource configs and resource properties. It can be hard to remember which is which, and the distinctions are not minor: they’re defined in different places, and configs carry a lot of additional functionality.

Student: Why is database a config for models (settable in Jinja config() and dbt_project.yml), but a property for sources (settable only in yaml files that aren’t dbt_project.yml)?
Teacher: Well, you see, configs tell dbt how to do something, whereas properties tell dbt about what something is. dbt creates models, so model location is a how; dbt knows about sources, so source location is a what.
Student: Ok - what about persist_docs, which is a config? That uses description, which is a property???
Teacher: Well, you see, one property of configurability is that it’s contagious, like a child’s laughter, and so the fact of persist_docs being a config raises description, as it were, from being a measly property into a config-plus-one, through an alchemical process that our greatest researchers are only beginning to understand…

Okay, okay. So, what’s the change? You can now set resource configs in all yaml files, using a new config property. Using that property, you can set configurations, just as you can with the in-file config() macro or in dbt_project.yml. This is our initial stab at reconciling two different ways to apply significant attributes to models, seeds, snapshots, and tests.

Examples

The big change here is a conceptual one. There are also some specific changes you can make in your projects right now:

Configure column types for one seed

Previously, you could only do this in dbt_project.yml, with fairly wonky syntax. Now, you can:

# seeds/my_seed.yml
version: 2
seeds:
  - name: my_seed
    config:
      column_types: {my_date_field: date}

Set meta as a project config, then override it

# dbt_project.yml
models:
  +meta:
    owner: data_team
    important: true

Override it for one model in its yaml properties, or right in its .sql file:

# models/my_specific_model.yml
version: 2
models:
  - name: my_specific_model
    config:
      meta:
        owner: me # overrides
        contains_pii: yes # net-new
        # inherits `important` from project-level config

# models/my_specific_model.sql
{{ config(meta = {'owner': 'me', 'contains_pii': 'yes'}) }}
select ...

Note that this change is backwards compatible, so existing meta definitions will keep working. If you want to start using config inheritance, you’ll need to move meta from a top-level key to a nested key under a config block.

Limitations

This was a big first step; the work is never done. Some properties are not yet configs, and so lack those capabilities:

Properties of sources + exposures. I’d love to support setting database in a sources key in dbt_project.yml. That’s still not possible, unfortunately, but we’ve taken a big step in the direction of making it so.
Special properties, such as description, tests, columns. These have different rendering contexts, or are responsible for creating new nodes (!). These would be much trickier to implement, and we’ll need to revisit in the future.
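
For contrast with the model examples above, this is roughly how a source's location still has to be declared: as a property in a yaml properties file, with no project-level equivalent. The path, source, and table names are hypothetical:

```yaml
# models/staging/sources.yml (hypothetical path)
version: 2
sources:
  - name: raw_events   # hypothetical source name
    database: raw      # a property: set here, not under a sources key in dbt_project.yml
    schema: events
    tables:
      - name: page_views
```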

There are going to be wrinkles and limitations that we’ll discover, and iteratively improve, over time. It’s a first big attempt at reconciliation ahead of locking in dbt-core interfaces later this year. Let us know what you think.

What’s ahead

The next minor version of dbt Core, after v0.21, will not be v0.22 — it will be v1.0. That means:

Specific changes to the ways you install dbt Core + adapter plugins
More consistent, intuitive ways to use and interface with dbt-core
Clarity about which pieces of dbt-core are “locked in,” and which things can change in minor versions post-v1.0

Excited? Questions? Stay tuned: there’s more coming soon, and to a Coalesce near you.
