[Pre] v0.19.0 (Kiyoshi Kuromiya)

[Jan 14] v0.19.0-rc2 is available. It includes a fix and a few under-the-hood changes on top of RC1.
[Jan 05] v0.19.0-rc1 is available for prerelease testing.

Happy new year, all! A release candidate of dbt v0.19.0 (Kiyoshi Kuromiya) is now available on PyPI, Homebrew, DockerHub, and dbt Cloud.

Below, I’ll give an overview of the biggest changes since v0.18. I’d also encourage you to read:

  • Changelog for the full set of features, fixes, and under-the-hood tweaks
  • Prerelease docs for an overview of new and changed documentation

Give this RC a spin, and let us know what you find by replying below or posting in the #prereleases channel. Barring any show-stopping bugs, we expect to release the final version in two weeks’ time.

Installation:

# with pip
pip install --upgrade dbt==0.19.0rc1

# with homebrew
brew install dbt@0.19.0-rc1
brew link --overwrite dbt@0.19.0-rc1

Gently breaking changes

We don’t expect these to require action in most projects.

Artifacts (docs)

We’ve made changes to all JSON artifacts that dbt produces, starting with the addition of a metadata dictionary. For the first time, we are version-controlling and documenting them in detail. A full JSONSchema of each versioned artifact will always be available at schemas.getdbt.com. Check out each new v1 schema:

Older JSONSchemas of the four artifacts (as of v0.18.1) are hosted at the same site as v0. Note that these are not official versions, but they may be helpful if you need to migrate existing code.

Why do this now? Artifacts have become increasingly important: they power new dbt features (such as Slim CI), and enable integrations with the wider data ecosystem. By establishing these contracts now, we want you to feel confident that wraparound workflows will not break at a moment’s notice. So, go for it: calculate documentation coverage from the manifest, identify bottlenecking models within the run results, track table size via the catalog. If you can parse JSON, you can do it.

Update from v0.18: Slim CI (docs)

The introduction of two beta features in dbt v0.18.0, --state and --defer, enabled a powerful new workflow in CI: build only the models that have changed since the last prod run (state:modified), and save time by selecting from their unmodified parents in prod (--defer).

dbt v0.19.0 includes two substantive changes that make Slim CI even better:

  1. Slightly smarter state:modified: dbt now stores the unrendered version of Jinja expressions used to set configs in dbt_project.yml. If you have expressions that return different results based on the target, dbt previously marked those as modifications. Now, it’s a little bit smarter at detecting what’s a real change as the result of development.

  2. Subtle tweak to --defer. Previously, this worked as a binary: either you were running a model, or you were referencing it from the state manifest. That was simple as an initial implementation, but it came with its fair share of edge cases. We’ve dialed back deferral to work instead as a “fallback” mechanism: if you need to select from a model and it doesn’t exist in your schema, dbt will look for it in the other manifest’s namespace. If it does exist in your schema, great! No need to defer.

    This subtle change fixes edge cases around seeds and model re-runs. It also enables us to support deferral for tests, too, which should lighten the burden on complex node selection logic in CI job definitions.

    What’s the drawback? Previously, you could use --defer as a way to rebuild a downstream model in your dev or CI schema while reliably selecting from production references. Now, if those references do exist in your scratch schema, dbt will use them instead. You can simply drop them (or drop and recreate your schema) to replicate the original behavior.
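To picture the first change (smarter state:modified), here is a hypothetical dbt_project.yml config whose rendered value differs between targets. Under v0.19, dbt compares the unrendered expression, so switching from a prod manifest to a dev run no longer flags this model as modified. The project and schema names here are illustrative, not from the release.

```yaml
# dbt_project.yml (config-version: 2) — hypothetical project
models:
  my_project:
    # Renders differently in dev vs. prod, but the unrendered
    # expression is what state:modified now compares.
    +schema: "{{ 'analytics' if target.name == 'prod' else target.schema }}"
```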
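The fallback behavior of the second change can be sketched in a few lines. This is an illustration of the resolution logic as described above, not dbt's actual implementation; the dictionary-based lookup is a stand-in for dbt's relation cache and state manifest.

```python
def resolve_ref(model_name, dev_relations, state_relations):
    """Sketch of v0.19 'fallback' deferral: prefer the relation in the
    current (dev/CI) schema, and only fall back to the state manifest's
    namespace when the model is absent locally.

    dev_relations / state_relations: hypothetical dicts mapping model
    names to fully qualified relation names."""
    if model_name in dev_relations:
        # The model exists in your schema: no need to defer.
        return dev_relations[model_name]
    # Fall back to the namespace recorded in the state manifest.
    return state_relations[model_name]
```

This is also why dropping a relation from your scratch schema restores the old behavior: once the local lookup misses, dbt defers to the production namespace again.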

All in all, Slim CI is more powerful, better documented, and more intuitive. For now, it is still a preview feature in dbt Cloud—contact support if you’re interested.

Deprecations

  • After being deprecated in v0.17.0, config-version: 1 specifications of dbt_project.yml are no longer supported. See the v0.17.0 migration guide for details.

Notable non-breaking changes

  • Snapshots now offer first-class support for capturing hard-deleted records via an optional config, invalidate_hard_deletes. If a unique key disappears from the snapshot query, the snapshot will update dbt_valid_to; if it reappears, the snapshot will add a new record.
  • YAML selectors now support a description attribute, and they appear in manifest.json. (Support in the dbt-docs DAG viz coming soon.)
  • The re python module is now available to Jinja templating code—within macros, models, wherever—enabling much more complex regex logic. (Of course, if you need regular expressions for data transformation, use SQL!)
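The new snapshot config from the first bullet slots into an ordinary snapshot block. A minimal sketch, where the snapshot name, source, and column names are hypothetical:

```sql
{% snapshot orders_snapshot %}

{{ config(
    target_schema='snapshots',
    unique_key='id',
    strategy='timestamp',
    updated_at='updated_at',
    invalidate_hard_deletes=True
) }}

-- If an id disappears from this query's results, dbt sets dbt_valid_to;
-- if it reappears later, dbt adds a new record.
select * from {{ source('app', 'orders') }}

{% endsnapshot %}
```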

Some BigQuery-specific additions:

  • Partitioning tables by hour, month, or year via a granularity config
  • New token-based connection methods support OAuth in dbt Cloud and other deployments.
  • Traditional OAuth connections (using gcloud) will now use your default configured project, instead of raising an error, if none is specified in profiles.yml. This gives dbt-bigquery the distinction of having the most concise profile possible:
my-bigquery:
  outputs:
    dev:
      type: bigquery
      method: oauth
      dataset: dev_jerco
  target: dev
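The new partition granularity from the first bullet is set via the partition_by config. A sketch of an hourly-partitioned model; the model body and column names are hypothetical:

```sql
{{ config(
    materialized='table',
    partition_by={
      'field': 'created_at',
      'data_type': 'timestamp',
      'granularity': 'hour'
    }
) }}

select * from {{ ref('stg_events') }}
```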

Fixes!

  • Redshift get_columns_in_relation performs better with external tables than in v0.18.1, thanks to a one-line fix
  • Postgres model names can be ≤51 characters long (up from 34) without fear of silent truncation
  • You can use doc blocks inside of exposure descriptions

Under the hood

  • Updated dependencies for Google and Snowflake libraries
  • Unofficial support for Python 3.9. dbt-core and most plugins can run in py39 environments. (dbt-snowflake cannot.) We’ll declare official support in a future release, once all plugin dependencies are compatible.

Next up!

Performance

We know that dbt takes too long to parse big projects today. The “dead time” between typing dbt run and seeing the first model execute is especially painful because there’s no way to get around it: you experience it whether you’ve selected to run one model or a thousand.

The v0.19.0 release introduces a new command, dbt parse, that will parse your project and produce a file with detailed timing info (target/perf_info.json). We’re planning to follow up soon with a v0.19 performance release: changes that will, we believe, reduce parse time by half in large projects.

That’s just a starting point. We’ll be devoting significant time and energy in 2021 to rewriting the slowest parts of dbt from the ground up. In the long run, we want all projects to parse in seconds, not minutes. If you’re interested in early access to alpha and beta versions of performance releases, send me a message—we’d love to have your help.

v0.20

The next minor version will be all about tests. Check out:

In the process, we’re hoping to resolve some inconsistencies that should get us well on our way to v1.0 later this year. Happy 2021 :slight_smile:
