Faster dbt startup in v0.19.1 (beta)

Updates:

  • [Apr 5] v0.19.1 (final) is available on PyPI, Homebrew, DockerHub, and dbt Cloud.
  • [Mar 22] v0.19.1rc1 is available on PyPI, Homebrew, DockerHub, and dbt Cloud.
  • [Feb 15] v0.19.1b2 is available on PyPI.

I’m excited to announce that a beta version of a patch and performance release is ready to install from PyPI:

pip install dbt==0.19.1b2

This release has all the same functionality as in v0.19.0, plus:

  • Fixes for some dbt + BigQuery regressions related to partition granularity and incremental models
  • Significant speedup in invocation startup time. In local test projects, we find this beta release is 2-3x faster than dbt v0.19.0 at loading, parsing, and readying projects of all sizes. I’ll say more on this below.

We touched a lot of low-level code, so the more beta testers we can have across all adapters and project sizes, the better. There should not be any breaking changes relative to v0.19.0. If you believe you’ve found a bug, please send us a message in the #dbt-prereleases channel of dbt Slack.

Performance

Today, when you type dbt run, dbt needs to:

  1. Load and parse the files in your project (.sql, .yml, .md, …)
  2. Validate those files, to raise an error early on if (e.g.) some YAML is improperly formatted
  3. Capture ref(), source(), and config() calls to build the DAG, and to power node selection
  4. Construct a “parsed manifest”: a usable, reliable representation of all the code in your project
  5. Run metadata queries against your adapter to build a relational cache, with information about all the tables and views that already exist
  6. Pick the models specified by --models criteria, determine run order, and get off to the races
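To make steps 3 and 6 concrete, here is a minimal sketch of capturing ref() calls and deriving a run order. It is an illustration only: dbt renders Jinja to capture these calls, whereas this sketch swaps in a regular expression, and the model names and SQL are hypothetical.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumption: a regex stands in for dbt's Jinja rendering. It matches
# {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def extract_refs(sql):
    """Return the set of model names referenced via ref() in one file."""
    return set(REF_PATTERN.findall(sql))


def build_dag(models):
    """Map each model name to the set of models it depends on."""
    return {name: extract_refs(sql) for name, sql in models.items()}


def run_order(dag):
    """Order models so that every dependency comes before its dependents."""
    return list(TopologicalSorter(dag).static_order())


# Hypothetical three-model project.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": (
        "select * from {{ ref('stg_orders') }} "
        "join {{ ref('stg_customers') }} using (customer_id)"
    ),
}

dag = build_dag(models)
order = run_order(dag)
print(order)  # both staging models appear before customer_orders
```

Every file has to be scanned before the graph is complete, which is one reason startup cost grows with project size even when you only run a single model.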

I’m glossing over some of the details, but it’s important to say that—as dbt works today—it needs to do all of the above, in every invocation, no matter if you’re running one model or a thousand. You can use partial parsing (docs) to avoid re-parsing unchanged files in subsequent runs—and you really should!—but there are some limitations around --vars and env vars that may throw off potential benefits, depending on how you use those features.

Ultimately, we want all projects to start up in a matter of seconds, not minutes. To that end, dbt needs to be able to parse lots of files much more quickly than it does in v0.19.0. This beta release is a first step on the way there.

To help us measure improvements, v0.19.0 introduced a new command: dbt parse. This command tells dbt to perform steps 1-4 listed above, and to write the results to a file (target/perf_info.json by default), which includes detailed timing information broken down by load step, package, and resource type. Try it out! Switch between v0.19.0 and v0.19.1b2, toggle on --partial-parse, and compare the results. If you find some really noticeable differences, let us know in Slack.
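If you would rather compare wall-clock time than read the timing file, a small harness along these lines might help. It is a rough sketch: time_command and speedup are hypothetical helper names, and the demo times a no-op Python process so the snippet runs without dbt installed.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, check=True, capture_output=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Demo with a no-op command so this runs anywhere.
noop_seconds = time_command([sys.executable, "-c", "pass"], repeats=2)
print(f"no-op process: {noop_seconds:.3f}s")
```

To use it for real, call time_command(["dbt", "parse"]) from inside your project directory, once in an environment with v0.19.0 and once with v0.19.1b2, then pass the two results to speedup. Taking the minimum of several runs reduces noise from other processes on the machine.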

Installation notes

  • We’re still working out some distribution details, so this beta version is only available on PyPI for now. As always, we plan to support installation from Homebrew and DockerHub for the final release.
  • When installing, ensure you have the latest versions of pip and setuptools. pip v20.3 made significant changes to its dependency resolver, and using older versions may result in a failed installation. To upgrade: pip install --upgrade pip setuptools
  • You must use Python 3.6, 3.7, or 3.8 when installing dbt or dbt-snowflake, due to pinned versions of some Snowflake-specific dependencies. If you are not a Snowflake user, you should be able to install your plugin in a Python 3.9 environment by specifying your adapter: pip install dbt-postgres==0.19.1b2. If you use a community-supported adapter, Python 3.9 compatibility will depend on the code in that adapter plugin.
  1. faster CLI
  2. learned something new about dbt via a well-written intro to a new feature

What’s not to like?!?
