Release: v0.19.0 (Kiyoshi Kuromiya)

jerco · January 5, 2021, 4:47pm

Updates

[Jan 27] v0.19.0 (final) is now available.
[Jan 14] v0.19.0-rc2 is available. It includes a fix and a few under-the-hood changes on top of RC1.
[Jan 05] v0.19.0-rc1 is available for prerelease testing.

Who is Kiyoshi Kuromiya? Check out the release notes for a biography of this famous Philadelphian.

Happy new year, all! dbt v0.19.0 (Kiyoshi Kuromiya) is now available on PyPI, Homebrew, DockerHub, and dbt Cloud.

Below, I’ll give an overview of the biggest changes since v0.18. I’d also encourage you to read:

Changelog for the full set of features, fixes, and under-the-hood tweaks
Migration guide for an overview of new and changed documentation

Installation:

# with pip
pip install --upgrade dbt==0.19.0

# with homebrew
brew install dbt@0.19.0
brew link --overwrite dbt@0.19.0

Gently breaking changes

We don’t expect these to require action in most projects.

Artifacts (docs)

We’ve made changes to all JSON artifacts that dbt produces, starting with the addition of a metadata dictionary. For the first time, we are version-controlling and documenting them in detail. A full JSONSchema of each versioned artifact will always be available at schemas.getdbt.com. Check out each new v1 schema there.

Older JSONSchemas of the four artifacts (as of v0.18.1) are hosted at the same site as v0. Note that these are not official versions, but they may be helpful if you need to migrate existing code.

Why do this now? Artifacts have become increasingly important: they power new dbt features (such as Slim CI), and enable integrations with the wider data ecosystem. By establishing these contracts now, we want you to feel confident that wraparound workflows will not break at a moment’s notice. So, go for it: calculate documentation coverage from the manifest, identify bottlenecking models within the run results, track table size via the catalog. If you can parse JSON, you can do it.
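
As a rough illustration of the kind of script this enables, here is a minimal Python sketch that computes documentation coverage from a parsed manifest.json and ranks nodes by runtime from a parsed run_results.json. It assumes the v1 layouts: a top-level nodes mapping whose entries carry resource_type and description, and a results list whose entries carry unique_id and execution_time. The function names and the sample data are made up for the example, so check the published schemas before relying on any field.

```python
import json
from pathlib import Path


def load_artifact(path):
    """Read a dbt JSON artifact (e.g. target/manifest.json) into a dict."""
    return json.loads(Path(path).read_text())


def documentation_coverage(manifest):
    """Return the fraction of models that have a non-empty description.

    Assumes manifest["nodes"] maps unique IDs to node dicts that carry
    "resource_type" and "description" keys.
    """
    models = [
        node for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    ]
    if not models:
        return 0.0
    documented = [m for m in models if (m.get("description") or "").strip()]
    return len(documented) / len(models)


def slowest_nodes(run_results, top_n=5):
    """Return (unique_id, execution_time) pairs, slowest first."""
    timings = [
        (result["unique_id"], result["execution_time"])
        for result in run_results.get("results", [])
    ]
    return sorted(timings, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up sample data standing in for real artifacts.
sample_manifest = {
    "nodes": {
        "model.my_project.orders": {
            "resource_type": "model",
            "description": "One row per order.",
        },
        "model.my_project.customers": {
            "resource_type": "model",
            "description": "",
        },
        "test.my_project.not_null_orders_id": {
            "resource_type": "test",
            "description": "",
        },
    }
}
sample_run_results = {
    "results": [
        {"unique_id": "model.my_project.orders", "execution_time": 3.2},
        {"unique_id": "model.my_project.customers", "execution_time": 12.5},
    ]
}

print(documentation_coverage(sample_manifest))  # 0.5: one of two models is documented
print(slowest_nodes(sample_run_results, top_n=1))
```

Against a real project, load_artifact("target/manifest.json") would replace the sample dictionary, assuming the default target directory.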

Update from v0.18: Slim CI (docs)

The introduction of two beta features in dbt v0.18.0, --state and --defer, enabled a powerful new workflow in CI: build only the models that have changed since the last prod run (state:modified), and save time by selecting from their unmodified parents in prod (--defer).

dbt v0.19.0 includes two substantive changes that make Slim CI even better:

Slightly smarter state:modified: dbt now stores the unrendered version of Jinja expressions used to set configs in dbt_project.yml. If you have expressions that return different results based on the target, dbt previously marked those as modifications. Now, it’s a little bit smarter at detecting what’s a real change as the result of development.
Subtle tweak to --defer. Previously, this worked as a binary: either you were running a model, or you were referencing it from the state manifest. This was simple as an initial implementation, with its fair share of edge cases. We’ve dialed back deferral to work instead as a “fallback” mechanism: If you need to select from a model, and it doesn’t exist in your schema, dbt will instead look for it in the other manifest’s namespace. If it does exist in your schema, great! No need to defer.

This subtle change fixes edge cases around seeds and model re-runs. It also enables us to support deferral for tests, too, which should lighten the burden on complex node selection logic in CI job definitions.
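
To make the fallback rule concrete, here is a toy Python sketch of the decision described above. It is not dbt's implementation: resolve_ref, the set of relations in the current schema, and the dictionary standing in for the other manifest are all hypothetical names for illustration. It also assumes that a model being built in the current run always resolves to the current schema.

```python
def resolve_ref(model_name, current_schema, existing, state_manifest,
                selected=frozenset(), defer=True):
    """Pick the relation a ref() should point at under the fallback rule.

    current_schema: name of the dev/CI schema being built into
    existing: set of model names that already exist in that schema
    state_manifest: dict of model name -> relation name recorded in the
        other (e.g. production) manifest
    selected: model names being built in this run (assumption: these
        never defer, since the run itself creates them)
    """
    local = f"{current_schema}.{model_name}"
    if model_name in selected or model_name in existing:
        # The model is, or will be, in the current schema: no need to defer.
        return local
    if defer and model_name in state_manifest:
        # Fall back to the relation from the other manifest's namespace.
        return state_manifest[model_name]
    # Nothing to fall back to: point at the current schema as usual.
    return local


# Hypothetical production relations.
prod = {
    "stg_orders": "analytics.stg_orders",
    "stg_customers": "analytics.stg_customers",
}

# stg_orders already exists in the CI schema, so it is used directly.
print(resolve_ref("stg_orders", "dbt_ci", {"stg_orders"}, prod))
# stg_customers does not, so the reference falls back to production.
print(resolve_ref("stg_customers", "dbt_ci", {"stg_orders"}, prod))
```

The second call is the case the new behavior targets: the reference resolves to the production relation only because nothing by that name exists in the scratch schema.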

What’s the drawback? Previously, you could use --defer as a way to reliably build a downstream model in your dev or CI schema, while reliably selecting from production references. Now, if those references do exist in your scratch schema, dbt will use them instead. You can simply drop them (or drop and recreate your schema) to replicate the original behavior.

All in all, Slim CI is more powerful, better documented, and more intuitive. For now, it is still a preview feature in dbt Cloud—contact support if you’re interested.

Deprecations

After being deprecated in v0.17.0, config-version: 1 specifications of dbt_project.yml are no longer supported. See the v0.17.0 migration guide for details.

Notable non-breaking changes

Snapshots now offer first-class support for capturing hard-deleted records via an optional config, invalidate_hard_deletes. If a unique key disappears from the snapshot query, the snapshot will update dbt_valid_to; if it reappears, the snapshot will add a new record.
YAML selectors now support a description attribute, and they appear in manifest.json. (Support in the dbt-docs DAG viz coming soon.)
The re Python module is now available to Jinja templating code (within macros, models, wherever), enabling much more complex regex logic. (Of course, if you need regular expressions for data transformation, use SQL!)
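
The hard-delete behavior in the snapshots item above can be pictured with a small simulation. This is a conceptual sketch in Python, not how dbt implements snapshots (those run as SQL in the warehouse): the apply_hard_deletes function and the row layout are invented for illustration, and it ignores change detection for rows that still exist. Only the dbt_valid_to column name comes from the text above.

```python
def apply_hard_deletes(snapshot_rows, source_ids, now):
    """Simulate one snapshot run with hard-delete invalidation turned on.

    snapshot_rows: list of dicts with "id", "dbt_valid_from", "dbt_valid_to"
    source_ids: set of unique keys returned by the snapshot query right now
    now: timestamp for this run
    Returns a new list of rows; the input list is not mutated.
    """
    result = [dict(row) for row in snapshot_rows]
    open_ids = set()
    for row in result:
        if row["dbt_valid_to"] is not None:
            continue  # already closed out in an earlier run
        if row["id"] in source_ids:
            open_ids.add(row["id"])
        else:
            # Key disappeared from the source: close out the record.
            row["dbt_valid_to"] = now
    # Keys in the source with no open record (brand new, or reappearing
    # after a hard delete) get a fresh record.
    for key in sorted(source_ids - open_ids):
        result.append({"id": key, "dbt_valid_from": now, "dbt_valid_to": None})
    return result


rows = [
    {"id": 1, "dbt_valid_from": "2021-01-01", "dbt_valid_to": None},
    {"id": 2, "dbt_valid_from": "2021-01-01", "dbt_valid_to": None},
]

# Run 2: id 2 has been deleted from the source, so its record is closed.
after_delete = apply_hard_deletes(rows, {1}, "2021-01-02")
# Run 3: id 2 reappears, so a new open record is added alongside the old one.
after_return = apply_hard_deletes(after_delete, {1, 2}, "2021-01-03")

for row in after_return:
    print(row)
```

After the third run there are three rows: the untouched record for id 1, the closed record for id 2, and a new open record for id 2.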

Some BigQuery-specific additions:

Partitioning tables by hour, month, or year via a granularity config
New token-based connection methods support OAuth in dbt Cloud and other deployments.
Traditional oauth connections (using gcloud) will use your default configured project, instead of raising an error, if none is specified in profiles.yml. This gives dbt-bigquery the distinction of having the most concise profile possible:

my-bigquery:
  outputs:
    dev:
      type: bigquery
      method: oauth
      dataset: dev_jerco
  target: dev

Fixes!

Redshift get_columns_in_relation performs better with external tables than in v0.18.1, thanks to a one-line fix
Postgres model names can be ≤51 characters long (up from 34) without fear of silent truncation
You can use doc blocks inside of exposure descriptions

Under the hood

Updated dependencies for Google and Snowflake libraries
Unofficial support for Python 3.9. dbt-core and most plugins can run in py39 environments. (dbt-snowflake cannot.) We’ll declare official support in a future release, once all plugin dependencies are compatible.

Next up!

Performance

We know that dbt takes too long to parse big projects today. The “dead time” between typing dbt run and seeing the first model execute is especially painful because there’s no way to get around it: you experience it whether you’ve selected to run one model or a thousand.

The v0.19.0 release introduces a new command, dbt parse, that will parse your project and produce a file with detailed timing info (target/perf_info.json). We’re planning to follow up soon with a v0.19 performance release: changes that will, we believe, reduce parse time by half in large projects.

That’s just a starting point. We’ll be devoting significant time and energy in 2021 to rewriting the slowest parts of dbt from the ground up. In the long run, we want all projects to parse in seconds, not minutes. If you’re interested in early access to alpha and beta versions of performance releases, send me a message—we’d love to have your help.

v0.20

The next minor version will be all about tests. Check out:

Initial milestone in GitHub
Recent Discourse thread about current capabilities and constraints for testing in dbt

In the process, we’re hoping to resolve some inconsistencies that should get us well on our way to v1.0 later this year. Happy 2021!

NiallRees · January 28, 2021, 2:32pm

This is great - especially pleased that defer now properly supports dbt tests. Thanks @jerco
