[Jun 04] v0.20.0-rc1 is available for prerelease testing.
dbt v0.20.0rc1 (Margaret Mead) is now available on PyPI, Homebrew, DockerHub, and dbt Cloud. The two biggest areas of focus are Tests and Performance, which I’ll discuss below. There’s lots more in this release, though, so I’d encourage you to read:
- Changelog for the full set of features, fixes, and under-the-hood tweaks
- Migration guide for an overview of new and changed documentation
    # with pip, install a specific adapter
    pip install --upgrade dbt-<adapter>==0.20.0rc1

    # with Homebrew, install four oldest adapters
    brew install email@example.com
    brew link --overwrite firstname.lastname@example.org
A few notes:
- If you’re installing from PyPI, we recommend specifying your adapter as dbt-<adapter> (e.g. dbt-postgres). This way, you install just what you need, and avoid any dependencies you don’t.
- If you’re installing from Homebrew: We haven’t yet built a separate formula for each adapter, but we plan to in the future.
- dbt-core==0.20.0rc1 includes a new dependency, tree-sitter. (See the experimental parser section below.) This requires a C compiler, such as GCC, to successfully install. We’re working to remove this requirement ahead of the final release.
Note that this release includes breaking changes for:
- Custom generic (schema) tests. All test queries should return a set of rows, rather than a single numeric value. In most cases, this is as simple as switching
- Users and maintainers of packages that leverage adapter.dispatch(). See docs for full details.
- run_results.json is now using a v2 schema.
We’ve written previously about all the exciting directions community members are going with dbt’s testing functionality. I’ve seen frameworks for unit testing, regression testing, you-name-it testing. There’s so much that you can do with dbt tests: they’re just macros; they’re just SQL.
At the same time, tests have been finicky and unintuitive. They’re a critical part of dbt—and we want to go far with them—so, for now, we’re securing their foundations. We’re looking to release dbt v1.0 later this year, and bringing tests up to parity is one of our highest priorities ahead of dbt’s first major-version release.
In v0.20, tests will:
- Be more consistent between their one-off (“data test”) and generic (“schema test”) implementations, where the latter is just a reusable version of the former
- Execute via a 'test' materialization, rather than mysterious Python code
- Be configurable from dbt_project.yml, including the ability to set default severity, or disable tests from packages
- Support a number of new configurations, all out of the box, including:
  - where filters on the underlying model, seed, snapshot, or source being tested
  - error_if conditional thresholds, based on the number of failures
- Store failing records in the database for easy development-time debugging, if that’s something you want
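To make the row-based test contract and the new configurations concrete, here is a minimal sketch in Python using SQLite. It is not dbt's implementation: `threshold_met` and `run_unique_test` are hypothetical helpers, and the `where` and `error_if` parameters only imitate the configurations of the same name, assuming thresholds are written as strings such as `!=0` or `>10`.

```python
import operator
import re
import sqlite3

# Comparison operators that a threshold string such as "!=0" or ">10" may use.
_OPS = {
    "!=": operator.ne, ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt, "=": operator.eq,
}


def threshold_met(failures: int, condition: str) -> bool:
    """Return True when `failures` satisfies a condition like '!=0' or '>10'."""
    match = re.fullmatch(r"\s*(!=|>=|<=|>|<|=)\s*(\d+)\s*", condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    op, limit = match.groups()
    return _OPS[op](failures, int(limit))


def run_unique_test(conn, table, column, where="1=1", error_if="!=0"):
    """Hypothetical 'unique' test: returns (number of failures, should_error)."""
    # The test query returns a set of failing rows (one per duplicated value),
    # not a single numeric count. The `where` filter is applied to the
    # underlying table before the test logic runs.
    sql = (
        f"select {column}, count(*) as n "
        f"from (select * from {table} where {where}) "
        f"group by {column} having count(*) > 1"
    )
    failing_rows = conn.execute(sql).fetchall()
    failures = len(failing_rows)
    return failures, threshold_met(failures, error_if)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table customers (id integer)")
    conn.executemany("insert into customers values (?)", [(1,), (1,), (2,), (3,), (3,)])
    print(run_unique_test(conn, "customers", "id"))                 # default threshold
    print(run_unique_test(conn, "customers", "id", error_if=">5"))  # looser threshold
```

Because the query returns the failing rows themselves, the same result set could be counted, compared against a threshold, or written to a table for debugging, which is the idea behind storing failures.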
There are things we didn’t get to, which I want to call out because they’re still great ideas. We may still take a swing at these ahead of releasing dbt v1.0 later this year:
- Supporting plain-language descriptions for tests. This intersected with performance improvements in a way that meant we couldn’t do both simultaneously. I still want to get to a place where a failing unique test on the id column in the customers table returns a sentence like: "Found 5 duplicated values of customers.id, erroring because 5!=0."
- Better FQNs, to make it easier to configure an individual test from dbt_project.yml (or, say, all tests on a given subfolder of models).
- Defining generic test blocks inside the tests/ folder, so that generic and one-off tests cohabitate in harmony. For now, they still need to live in
data_test in the codebase. I’ve started calling these bespoke, which feels much more accurate, but those words don’t roll right off the tongue. If you have good ideas, I’d love to hear them!
We’ve seen that dbt v0.19.1 offers, on average, 3x faster parsing versus v0.19.0. That means projects which used to take 1 minute between typing dbt run and seeing the first model execute are down to 20 seconds. That’s an amazing, hard-won improvement, and it’s still not fast enough. We want projects of all sizes, whether 100 models or 5k models, to start up in under 5 seconds while you’re developing.
To accomplish this, we’ve included two big features in v0.20.0: a top-to-bottom rework of partial parsing, and an experimental parser that can statically analyze the majority of dbt models. Both features are off by default; we encourage you to give them a try, and let us know what you find. For more details, see our fresh new docs on parsing.
Partial parsing is a feature that’s been around for some time—two years, to be precise. If you’ve ever used dbt Cloud’s IDE, you’ve benefitted from partial parsing, even if you didn’t know it at the time.
The premise of partial parsing is simple. In development, you’re probably only editing a handful of files at a time. Rather than reread every file, and rebuild your entire project state from scratch, every time, dbt should re-parse just the files that have changed.
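The premise can be sketched in a few lines of Python. This is an illustration only, not how dbt stores its parse state: `partial_parse`, the cache layout, and the `parse_file` callback are all hypothetical, and the sketch assumes a file's parsed result depends only on that file's contents.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def checksum(text: str) -> str:
    """Fingerprint a file's contents so changes can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def partial_parse(
    files: Dict[str, str],
    cache: Dict[str, Tuple[str, object]],
    parse_file: Callable[[str], object],
) -> Tuple[Dict[str, Tuple[str, object]], List[str]]:
    """Re-parse only new or changed files and reuse cached results for the rest.

    files: path -> current contents
    cache: path -> (checksum, parsed result) saved from the previous run
    Returns the updated cache and the list of paths that were re-parsed.
    """
    new_cache: Dict[str, Tuple[str, object]] = {}
    reparsed: List[str] = []
    for path, text in files.items():
        digest = checksum(text)
        cached = cache.get(path)
        if cached is not None and cached[0] == digest:
            new_cache[path] = cached  # unchanged: keep the earlier result
        else:
            new_cache[path] = (digest, parse_file(text))
            reparsed.append(path)
    # Deleted files drop out because new_cache is built only from `files`.
    return new_cache, reparsed


if __name__ == "__main__":
    project = {"a.sql": "select 1", "b.sql": "select 2"}
    cache, first = partial_parse(project, {}, str.upper)
    project["b.sql"] = "select 3"
    cache, second = partial_parse(project, cache, str.upper)
    print(first, second)
```

In this toy version an unchanged project costs one checksum per file. A real project also has cross-file state to keep consistent, which the sketch deliberately leaves out.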
Yet partial parsing has been far from perfect. That’s because there are parts of dbt’s “mise en place” that the old partial parsing just couldn’t help with, such as processing refs and rendering descriptions. Even if no files had changed, partial-parse runs could still take over a minute for some projects.
In dbt v0.20.0, that’s changing. In a project with 5000 files, changing 1 file and re-running with --partial-parse ought to start up in 5 seconds, no more.
Partial parsing still isn’t perfect: We’ve documented a set of known edge cases, where a full re-parse is necessary. We also touched a lot of code to make this possible, and so we’ll need your help testing this extensively. Please, please let us know if you encounter weird bugs or undocumented edge cases.
dbt leverages a set of special Jinja macros, namely ref(), source(), and config(), to infer needed information, at parse time, about the properties of a model, its dependencies, and its place in the DAG. Extracting information from those macros has always required a full Jinja render, until today. We’ve coded up a way to statically analyze that information instead.
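As a rough illustration of what static extraction means, the sketch below pulls dependency and config information out of a model's text without rendering any Jinja. It swaps in regular expressions purely for brevity and is not the parser shipped in this release (which depends on tree-sitter). `extract_statically` is a hypothetical name, and it only understands single-argument `ref`, two-argument `source`, and `config` calls with string keyword values.

```python
import re
from typing import Dict, Optional

# Every {{ ... }} expression in the model text.
_JINJA_EXPR = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)
_REF = re.compile(r"""^\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*$""")
_SOURCE = re.compile(
    r"""^\s*source\(\s*['"]([^'"]+)['"]\s*,\s*['"]([^'"]+)['"]\s*\)\s*$"""
)
_CONFIG = re.compile(r"^\s*config\((.*)\)\s*$", re.DOTALL)
_KWARG = re.compile(r"""(\w+)\s*=\s*['"]([^'"]*)['"]""")


def extract_statically(sql: str) -> Optional[Dict[str, object]]:
    """Return refs, sources, and config found without rendering Jinja.

    Returns None when the model uses anything this sketch cannot analyze,
    signalling that the caller should fall back to a full Jinja render.
    """
    if "{%" in sql:  # statement blocks (if/for/set/macro): give up
        return None
    found: Dict[str, object] = {"refs": [], "sources": [], "config": {}}
    for expr in _JINJA_EXPR.findall(sql):
        ref = _REF.match(expr)
        source = _SOURCE.match(expr)
        config = _CONFIG.match(expr)
        if ref:
            found["refs"].append(ref.group(1))
        elif source:
            found["sources"].append((source.group(1), source.group(2)))
        elif config:
            found["config"].update(dict(_KWARG.findall(config.group(1))))
        else:
            return None  # any other expression needs a real render
    return found


if __name__ == "__main__":
    model = (
        "{{ config(materialized='table') }}\n"
        "select * from {{ ref('stg_orders') }}\n"
        "join {{ source('raw', 'customers') }} using (id)"
    )
    print(extract_statically(model))
    print(extract_statically("select * from {{ my_macro() }}"))
```

The important design choice is the fallback: whenever the text contains something the static pass does not recognize, it returns nothing rather than guessing, so correctness never depends on the shortcut.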
For now, the experimental parser only works with models, and models whose Jinja is limited to those three special macros. When it works, it really works: the experimental parser is at least 3x faster than a full Jinja render. Based on testing with data from dbt Cloud, we believe the experimental parser can handle 60% of models in the wild, translating to a 40% speedier model parser on average. We think it will yield at least some benefit in 95% of projects.
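The 60% and 3x figures line up with the 40% estimate under one simplifying assumption, which is mine rather than the post's: every model costs about the same to parse. A small worked calculation (`relative_parse_time` is a hypothetical helper):

```python
def relative_parse_time(share_static: float, speedup: float) -> float:
    """Fraction of the original model-parsing time that remains.

    Assumes every model takes the same time under a full Jinja render, that a
    share `share_static` of models can be parsed statically, and that static
    parsing is `speedup` times faster for those models.
    """
    return (1 - share_static) + share_static / speedup


if __name__ == "__main__":
    remaining = relative_parse_time(share_static=0.60, speedup=3.0)
    print(f"time remaining: {remaining:.2f}, time saved: {1 - remaining:.2f}")
```

With 60% of models parsed 3x faster, the remaining time is 0.40 + 0.60 / 3 = 0.60 of the original, which is a 40% saving on model parsing. Since 3x is described as a lower bound, the saving could be somewhat larger in practice.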
You can check it out by running dbt parse and dbt --use-experimental-parser parse, and comparing the results in