Prerelease: v0.20.0 (Margaret Mead)

Updates

[Jun 04] v0.20.0-rc1 is available for prerelease testing.

dbt v0.20.0rc1 (Margaret Mead) is now available on PyPI, Homebrew, DockerHub, and dbt Cloud. The two biggest areas of focus are Tests and Performance, which I’ll discuss below. There’s lots more in this release, though, so I’d encourage you to read:

  • Changelog for the full set of features, fixes, and under-the-hood tweaks
  • Migration guide for an overview of new and changed documentation

Installation

# with pip, install a specific adapter
pip install --upgrade dbt-<adapter>==0.20.0rc1

# with Homebrew, install the four oldest adapters
brew install dbt@0.20.0rc1
brew link --overwrite dbt@0.20.0rc1

A few notes:

  • If you’re installing from PyPI, we recommend specifying your adapter as dbt-<adapter> (e.g. dbt-postgres). This way, you install just what you need, and avoid any dependencies you don’t.
  • If you’re installing from Homebrew: we haven’t yet built a separate formula for each adapter, but we plan to in the future.
  • dbt-core==0.20.0rc1 includes a new dependency, tree-sitter. (See the experimental parser section below.) This requires a C compiler, such as GCC, to successfully install. We’re working to remove this requirement ahead of the final release.

Breaking changes

Note that this release includes breaking changes for:

  • Custom generic (schema) tests. All test queries should return a set of rows, rather than a single numeric value. In most cases, this is as simple as switching select count(*) to select *.
  • Users and maintainers of packages that leverage adapter.dispatch(). See docs for full details.
  • Artifacts: manifest.json and run_results.json are now using a v2 schema.
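
To make the first breaking change concrete, here is a minimal sketch in Python. It uses the standard library's sqlite3 module rather than dbt itself, and the orders table and both queries are made up for illustration. The point is only that a new-style test query returns the failing rows, so the number of failures is the number of rows returned.

```python
import sqlite3

# Hypothetical data: an in-memory table with one NULL order_id.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, amount real)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, 10.0), (2, 25.5), (None, 7.0)],
)

# Old style (before v0.20): the test query returns a single numeric value.
old_style = "select count(*) from orders where order_id is null"

# New style (v0.20): the test query returns the failing rows themselves.
new_style = "select * from orders where order_id is null"

old_result = conn.execute(old_style).fetchone()[0]
failing_rows = conn.execute(new_style).fetchall()

print(old_result)         # 1
print(len(failing_rows))  # 1
print(failing_rows)       # [(None, 7.0)]
```

Both forms agree on the failure count here; the new form also tells you which rows failed.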

Tests

We’ve written previously about all the exciting directions community members are going with dbt’s testing functionality. I’ve seen frameworks for unit testing, regression testing, you-name-it testing. There’s so much that you can do with dbt tests: they’re just macros; they’re just SQL.

At the same time, tests have been finicky and unintuitive. They’re a critical part of dbt—and we want to go far with them—so, for now, we’re securing their foundations. We’re looking to release dbt v1.0 later this year, and bringing tests up to parity is one of our highest priorities ahead of dbt’s first major-version release.

In v0.20, tests will:

  • Be more consistent between their one-off (“data test”) and generic (“schema test”) implementations, where the latter is just a reusable version of the former
  • Execute via a 'test' materialization, rather than mysterious Python code
  • Be configurable from dbt_project.yml, including the ability to set default severity, or disable tests from packages
  • Support a number of new configurations, all out of the box, including:
    • where filters on the underlying model, seed, snapshot, or source being tested
    • warn_if and error_if conditional thresholds, based on the number of failures
  • Store failing records in the database for easy development-time debugging, if that’s something you want :slight_smile:
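
As a rough illustration of how warn_if and error_if thresholds could turn a failure count into a result, here is a small Python sketch. It is not dbt's implementation (dbt evaluates these conditions in SQL, inside the test materialization). The function names, the supported operators, and the "!=0" defaults are assumptions made for this example.

```python
import operator
import re

# Operators this sketch understands (an assumption, not dbt's full grammar).
_OPS = {
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def condition_met(failures, condition):
    """Return True if `failures` satisfies a condition such as '>10' or '!=0'."""
    match = re.fullmatch(r"\s*(!=|>=|<=|=|>|<)\s*(\d+)\s*", condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    op, threshold = match.groups()
    return _OPS[op](failures, int(threshold))


def classify_result(failures, warn_if="!=0", error_if="!=0"):
    """Classify a failure count; the error threshold is checked first."""
    if condition_met(failures, error_if):
        return "error"
    if condition_met(failures, warn_if):
        return "warn"
    return "pass"


print(classify_result(5, warn_if=">0", error_if=">10"))   # warn
print(classify_result(50, warn_if=">0", error_if=">10"))  # error
print(classify_result(0))                                 # pass
```

With the default conditions assumed here, any nonzero failure count is an error, which matches the "erroring because 5!=0" wording used later in this post.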

There are things we didn’t get to, which I want to call out because they’re still great ideas. We may still take a swing at these ahead of releasing dbt v1.0 later this year:

  • Supporting plain-language descriptions for tests. This overlapped with the performance work in a way that kept us from doing both at once. I still want to get to a place where a failing unique test on the id column in the customers table returns a sentence like: Found 5 duplicated values of customers.id, erroring because 5 != 0.
  • Better FQNs, to make it easier to configure an individual test from dbt_project.yml—or, say, all tests on a given subfolder of models.
  • Defining generic test blocks inside the tests/ folder, so that generic and one-off tests cohabitate in harmony. For now, they still need to live in macros/.
  • Renaming schema_test and data_test in the codebase. I’ve started calling these generic and bespoke, which feels much more accurate, but those words don’t roll right off the tongue. If you have good ideas, I’d love to hear them!

Performance

We’ve seen that dbt v0.19.1 offers, on average, 3x faster parsing versus v0.19.0. That means projects which used to take 1 minute between typing dbt run and seeing the first model execute are down to 20 seconds. That’s an amazing, hard-won improvement—and it’s still not fast enough. We want projects of all sizes, whether 100 models or 5k models, to start up in fewer than 5 seconds while you’re developing.

To accomplish this, we’ve included two big features in v0.20.0: a top-to-bottom rework of partial parsing, and an experimental parser that can statically analyze the majority of dbt models. Both features are off by default; we encourage you to give them a try, and let us know what you find. For more details, see our fresh new docs on parsing.

:bookmark: Partial parsing rework

Partial parsing is a feature that’s been around for some time—two years, to be precise. If you’ve ever used dbt Cloud’s IDE, you’ve benefitted from partial parsing, even if you didn’t know it at the time.

The premise of partial parsing is simple. In development, you’re probably only editing a handful of files at a time. Rather than reread every file, and rebuild your entire project state from scratch, every time, dbt should re-parse just the files that have changed.
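
The premise can be sketched in a few lines of Python. This is a simplified illustration, not dbt's actual partial-parsing code: it assumes a saved mapping of file path to checksum from the previous run, and the helper names (checksum, files_to_reparse) are hypothetical.

```python
import hashlib
from typing import Dict, List


def checksum(contents: str) -> str:
    """Fingerprint a file's contents so changes can be detected cheaply."""
    return hashlib.sha256(contents.encode("utf-8")).hexdigest()


def files_to_reparse(saved: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare saved checksums with current file contents.

    `saved` maps path -> checksum from the previous run;
    `current` maps path -> file contents as they are now.
    """
    current_sums = {path: checksum(text) for path, text in current.items()}
    return {
        "added": sorted(p for p in current_sums if p not in saved),
        "changed": sorted(
            p for p in current_sums if p in saved and saved[p] != current_sums[p]
        ),
        "deleted": sorted(p for p in saved if p not in current_sums),
    }


previous = {"models/a.sql": "select 1", "models/b.sql": "select 2"}
saved = {path: checksum(text) for path, text in previous.items()}
current = {
    "models/a.sql": "select 1",
    "models/b.sql": "select 3",
    "models/c.sql": "select 4",
}

print(files_to_reparse(saved, current))
# {'added': ['models/c.sql'], 'changed': ['models/b.sql'], 'deleted': []}
```

Only models/b.sql and models/c.sql would need parsing in this example; models/a.sql is untouched and its previous parse result could be reused.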

Yet partial parsing has been far from perfect. That’s because there are parts of dbt’s “mise en place” that the old partial parsing just couldn’t help with, such as processing refs and rendering descriptions. Even if no files had changed, partial-parse runs could still take over a minute for some projects.

In dbt v0.20.0, that’s changing. In a project with 5000 files, changing 1 file and re-running with --partial-parse ought to start up in 5 seconds, no more.

Partial parsing still isn’t perfect: We’ve documented a set of known edge cases, where a full re-parse is necessary. We also touched a lot of code to make this possible, and so we’ll need your help testing this extensively. Please, please let us know if you encounter weird bugs or undocumented edge cases.

:sparkles: Experimental parser

dbt leverages a set of special Jinja macros—ref(), source(), and config()—to infer needed information, at parse time, about the properties of a model, its dependencies, and its place in the DAG. Extracting information from those macros has always required a full Jinja render—until today. We’ve coded up a way to statically analyze that information instead.
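
To give a feel for what "statically analyze" means here, the following Python sketch pulls ref() and source() calls out of a model's text without rendering any Jinja. It swaps in regular expressions for simplicity, whereas the experimental parser relies on tree-sitter, and it leaves out config() to stay short. The function name and the fall-back-to-None convention are assumptions for this example.

```python
import re

# A single- or double-quoted string argument, captured without its quotes.
_STR = r"['\"]([^'\"]+)['\"]"

# {{ ref('model') }}
REF = re.compile(r"\{\{\s*ref\(\s*" + _STR + r"\s*\)\s*\}\}")
# {{ source('source_name', 'table_name') }}
SOURCE = re.compile(
    r"\{\{\s*source\(\s*" + _STR + r"\s*,\s*" + _STR + r"\s*\)\s*\}\}"
)
# Any Jinja expression, statement, or comment.
ANY_JINJA = re.compile(r"\{\{.*?\}\}|\{%.*?%\}|\{#.*?#\}", re.DOTALL)


def extract_statically(sql):
    """Return refs and sources, or None if the model needs a full Jinja render."""
    refs = REF.findall(sql)
    sources = SOURCE.findall(sql)
    # Remove the calls we understood; any Jinja left over is out of scope.
    remainder = SOURCE.sub("", REF.sub("", sql))
    if ANY_JINJA.search(remainder):
        return None
    return {"refs": refs, "sources": sources}


simple = "select * from {{ ref('orders') }} join {{ source('raw', 'customers') }} using (id)"
dynamic = "{% if target.name == 'prod' %} select 1 {% endif %} from {{ ref('orders') }}"

print(extract_statically(simple))
# {'refs': ['orders'], 'sources': [('raw', 'customers')]}
print(extract_statically(dynamic))
# None
```

The second model contains other Jinja, so the sketch gives up and signals that a full render is needed.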

For now, the experimental parser only works with models, and only with models whose Jinja is limited to those three special macros. When it works, it really works: the experimental parser is at least 3x faster than a full Jinja render. Based on testing with data from dbt Cloud, we believe the experimental parser can handle 60% of models in the wild, translating to a 40% speedier model parser on average. We think it will yield at least some benefit in 95% of projects.
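
The 40% figure follows from the other two numbers under a simplifying assumption: every model costs the same to parse with a full render, and the handled models parse exactly 3x faster (the text says "at least"). A quick check in Python, with a hypothetical helper name:

```python
def expected_time_saved(share_handled, speedup_factor):
    """Fraction of total parse time saved when `share_handled` of models
    parse `speedup_factor` times faster and the rest are unchanged."""
    new_time = share_handled / speedup_factor + (1 - share_handled)
    return 1 - new_time


saving = expected_time_saved(share_handled=0.60, speedup_factor=3)
print(f"{saving:.0%}")  # 40%
```

In other words, 60% of the work shrinks to a third of its cost (20% of the original total), the remaining 40% is unchanged, and the new total is 60% of the old one.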

You can check it out by running dbt parse and dbt --use-experimental-parser parse, and comparing the results in target/perf_info.json.

It seems like the syntax for the experimental parser should be

dbt --use-experimental-parser parse

The provided syntax errored for me.

You’re totally right! Just fixed above. Thanks for catching :slight_smile: