Updates
[Jan 27] v0.19.0 (final) is now available.
[Jan 14] v0.19.0-rc2 is available. It includes a fix and a few under-the-hood changes on top of RC1.
[Jan 05] v0.19.0-rc1 is available for prerelease testing.
Who is Kiyoshi Kuromiya? Check out the release notes for a biography of this famous Philadelphian.
Happy new year, all! dbt v0.19.0 (Kiyoshi Kuromiya) is now available on PyPI, Homebrew, DockerHub, and dbt Cloud.
Below, I’ll give an overview of the biggest changes since v0.18. I’d also encourage you to read:
- Changelog for the full set of features, fixes, and under-the-hood tweaks
- Migration guide for an overview of new and changed documentation
Installation:
```shell
# with pip
pip install --upgrade dbt==0.19.0

# with homebrew
brew install dbt@0.19.0
brew link --overwrite dbt@0.19.0
```
Gently breaking changes
We don’t expect these to require action in most projects.
Artifacts (docs)
We’ve made changes to all JSON artifacts that dbt produces, starting with the addition of a metadata dictionary. For the first time, we are version-controlling and documenting them in detail. A full JSONSchema of each versioned artifact will always be available at schemas.getdbt.com. Check out each new v1 schema:
- https://schemas.getdbt.com/dbt/manifest/v1.json
- https://schemas.getdbt.com/dbt/run-results/v1.json
- https://schemas.getdbt.com/dbt/catalog/v1.json
- https://schemas.getdbt.com/dbt/sources/v1.json
Older JSONSchemas of the four artifacts (as of v0.18.1) are hosted at the same site as v0. Note that these are not official versions, but they may be helpful if you need to migrate existing code.
Why do this now? Artifacts have become increasingly important: they power new dbt features (such as Slim CI), and enable integrations with the wider data ecosystem. By establishing these contracts now, we want you to feel confident that wraparound workflows will not break at a moment’s notice. So, go for it: calculate documentation coverage from the manifest, identify bottlenecking models within the run results, track table size via the catalog. If you can parse JSON, you can do it.
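As a sketch of the kind of wraparound workflow this enables, here is one way documentation coverage could be computed from `manifest.json`. This is illustrative only; it assumes the v1 manifest layout, where nodes live under a `nodes` dictionary with `resource_type` and `description` fields.

```python
import json

def doc_coverage(manifest_path):
    """Fraction of models in a dbt manifest that have a description.

    Sketch only: assumes the v1 manifest layout, where models live
    under "nodes", keyed by IDs like "model.my_project.my_model".
    """
    with open(manifest_path) as f:
        manifest = json.load(f)
    # Keep only model nodes (the manifest also contains seeds, tests, etc.)
    models = [n for n in manifest["nodes"].values()
              if n.get("resource_type") == "model"]
    documented = [m for m in models if m.get("description")]
    return len(documented) / len(models) if models else 0.0
```

The same pattern (load the JSON, filter nodes, aggregate) applies to run results and the catalog.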
Update from v0.18: Slim CI (docs)
The introduction of two beta features in dbt v0.18.0, `--state` and `--defer`, enabled a powerful new workflow in CI: build only the models that have changed since the last prod run (`state:modified`), and save time by selecting from their unmodified parents in prod (`--defer`).
dbt v0.19.0 includes two substantive changes that make Slim CI even better:
- Slightly smarter `state:modified`: dbt now stores the unrendered version of Jinja expressions used to set configs in `dbt_project.yml`. If you have expressions that return different results based on the `target`, dbt previously marked those as modifications. Now, it’s a little bit smarter at detecting what’s a real change as the result of development.
- Subtle tweak to `--defer`: Previously, this worked as a binary: either you were running a model, or you were referencing it from the state manifest. This was simple as an initial implementation, but it came with its fair share of edge cases. We’ve dialed back deferral to work instead as a “fallback” mechanism: if you need to select from a model, and it doesn’t exist in your schema, dbt will instead look for it in the other manifest’s namespace. If it does exist in your schema, great! No need to defer. This subtle change fixes edge cases around seeds and model re-runs. It also enables us to support deferral for tests, too, which should lighten the burden on complex node selection logic in CI job definitions.
What’s the drawback? You could use `--defer` as a way to build a downstream model in your dev or CI schema, while reliably selecting from production references. Now, if those references do exist in your scratch schema, dbt will use them instead. You can simply drop them (or drop and recreate your schema) to replicate the original behavior.
All in all, Slim CI is more powerful, better documented, and more intuitive. For now, it is still a preview feature in dbt Cloud—contact support if you’re interested.
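For reference, a typical Slim CI invocation puts the pieces together like so (the artifact path is illustrative):

```shell
# Compare against artifacts from the last production run: build only
# changed models and their children, deferring unmodified parents to prod.
dbt run --models state:modified+ --defer --state path/to/prod-artifacts
```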
Deprecations
- After being deprecated in v0.17.0, `config-version: 1` specifications of `dbt_project.yml` are no longer supported. See the v0.17.0 migration guide for details.
Notable non-breaking changes
- Snapshots now offer first-class support for capturing hard-deleted records via an optional config, `invalidate_hard_deletes`. If a unique key disappears from the snapshot query, the snapshot will update `dbt_valid_to`; if it reappears, the snapshot will add a new record.
- YAML selectors now support a `description` attribute, and they appear in `manifest.json`. (Support in the dbt-docs DAG viz coming soon.)
- The `re` python module is now available to Jinja templating code—within macros, models, wherever—enabling much more complex regex logic. (Of course, if you need regular expressions for data transformation, use SQL!)
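As a sketch, the new snapshot config slots into a snapshot block like this (the snapshot, source, and column names are made up):

```sql
{% snapshot orders_snapshot %}

    {{ config(
        target_schema='snapshots',
        unique_key='id',
        strategy='timestamp',
        updated_at='updated_at',
        invalidate_hard_deletes=True
    ) }}

    select * from {{ source('app', 'orders') }}

{% endsnapshot %}
```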
Some BigQuery-specific additions:
- Partitioning tables by hour, month, or year via a `granularity` config
- New token-based connection methods support OAuth in dbt Cloud and other deployments.
- Traditional `oauth` connections (using `gcloud`) will use your default configured project, instead of raising an error, if none is specified in `profiles.yml`. This gives dbt-bigquery the distinction of having the most concise profile possible:
```yaml
my-bigquery:
  outputs:
    dev:
      type: bigquery
      method: oauth
      dataset: dev_jerco
  target: dev
```
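The new `granularity` config mentioned above slots into a model’s `partition_by` roughly like this (the field and source names are illustrative):

```sql
{{ config(
    materialized='table',
    partition_by={
        "field": "created_at",
        "data_type": "timestamp",
        "granularity": "hour"
    }
) }}

select * from {{ source('app', 'events') }}
```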
Fixes!
- Redshift `get_columns_in_relation` performs better with external tables than in v0.18.1, thanks to a one-line fix
- Postgres model names can be ≤51 characters long (up from 34) without fear of silent truncation
- You can use `doc` blocks inside of exposure `description`s
Under the hood
- Updated dependencies for Google and Snowflake libraries
- Unofficial support for Python 3.9. `dbt-core` and most plugins can run in py39 environments. (`dbt-snowflake` cannot.) We’ll declare official support in a future release, once all plugin dependencies are compatible.
Next up!
Performance
We know that dbt takes too long to parse big projects today. The “dead time” between typing `dbt run` and seeing the first model execute is especially painful because there’s no way to get around it: you experience it whether you’ve selected to run one model or a thousand.
The v0.19.0 release introduces a new command, `dbt parse`, that will parse your project and produce a file with detailed timing info (`target/perf_info.json`). We’re planning to follow up soon with a v0.19 performance release: changes that will, we believe, reduce parse time by half in large projects.
That’s just a starting point. We’ll be devoting significant time and energy in 2021 to rewriting the slowest parts of dbt from the ground up. In the long run, we want all projects to parse in seconds, not minutes. If you’re interested in early access to alpha and beta versions of performance releases, send me a message—we’d love to have your help.
v0.20
The next minor version will be all about tests. Check out:
- Initial milestone in GitHub
- Recent Discourse thread about current capabilities and constraints for testing in dbt
In the process, we’re hoping to resolve some inconsistencies that should get us well on our way to v1.0 later this year. Happy 2021!