Faster dbt startup in v0.19.1 (beta)

[Feb 15] dbt v0.19.1-b2 is available on PyPI.

I’m excited to announce that we have a beta version of a patch & performance release ready to install from PyPI:

pip install dbt==0.19.1b2

This release has all the same functionality as in v0.19.0, plus:

  • Fixes for some dbt + BigQuery regressions related to partition granularity and incremental models
  • Significant speedup in invocation startup time. In local test projects, we find this beta release is 2-3x faster than dbt v0.19.0 at loading, parsing, and readying projects of all sizes. I’ll say more on this below.

We touched a lot of low-level code, so the more beta testers we can have across all adapters and project sizes, the better. There should not be any breaking changes relative to v0.19.0. If you believe you’ve found a bug, please send us a message in the #dbt-prereleases channel of dbt Slack.

Performance

Today, when you type dbt run, dbt needs to:

  1. Load and parse the files in your project (.sql, .yml, .md, …)
  2. Validate those files, to raise an error early on if (e.g.) some YAML is improperly formatted
  3. Capture ref(), source(), and config() calls to build the DAG, and to power node selection
  4. Construct a “parsed manifest”: a usable, reliable representation of all the code in your project
  5. Run metadata queries against your adapter to build a relational cache, with information about all the tables and views that already exist
  6. Pick the models specified by --models criteria, determine run order, and get off to the races

I’m glossing over some of the details, but the important point is that, as dbt works today, it needs to do all of the above on every invocation, whether you’re running one model or a thousand. You can use partial parsing (docs) to avoid re-parsing unchanged files in subsequent runs (and you really should!), but there are some limitations around --vars and env vars that may reduce the benefit, depending on how you use those features.
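If you want partial parsing on for every invocation, rather than passing the --partial-parse flag each time, you can set it in the user config block of profiles.yml:

```yaml
# profiles.yml
config:
  partial_parse: true
```

Either way, dbt falls back to a full re-parse when it detects changes it can’t handle incrementally.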

Ultimately, we want all projects to start up in a matter of seconds, not minutes. To that end, dbt needs to be able to parse lots of files much more quickly than it does in v0.19.0. This beta release is a first step on the way there.

To help us measure improvements, v0.19.0 introduced a new command: dbt parse. This command tells dbt to perform steps 1-4 listed above, and to write the results to a file (target/perf_info.json by default), which includes detailed timing information broken down by load step, package, and resource type. Try it out! Switch between v0.19.0 and v0.19.1b2, toggle --partial-parse on and off, and compare the results. If you find some really noticeable differences, let us know in Slack 🙂
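You can also diff the timing files programmatically. A minimal sketch, with hypothetical sample numbers inlined so it runs as-is (the actual field names and nesting in your target/perf_info.json may differ):

```python
import json

# Hypothetical excerpts from two perf_info.json runs; in practice:
#   before = json.load(open("perf_info_0.19.0.json"))
#   after = json.load(open("perf_info_0.19.1b2.json"))
before = {"elapsed": 41.2, "parse_project_elapsed": 33.5}
after = {"elapsed": 16.8, "parse_project_elapsed": 10.1}

def compare(before, after):
    """Report the speedup ratio for every timing key present in both runs."""
    report = {}
    for key in before.keys() & after.keys():
        if after[key] > 0:
            report[key] = round(before[key] / after[key], 2)
    return report

print(json.dumps(compare(before, after), indent=2))
```

A ratio of 2.0 means the new version took half the time for that step.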

Installation notes

  • We’re still working out some distribution details, so this beta version is available only on PyPI for now. As always, we plan to support installation from Homebrew and DockerHub for the final release.
  • When installing, ensure you have the latest versions of pip and setuptools. pip v20.3 made significant changes to its dependency resolver, and using older versions may result in a failed installation. To upgrade: pip install --upgrade pip setuptools
  • You must use python 3.6, 3.7, or 3.8 when installing dbt or dbt-snowflake, due to pinned versions of some Snowflake-specific dependencies. If you are not a Snowflake user, you should be able to install your plugin in a python 3.9 environment by specifying your adapter: pip install dbt-postgres==0.19.1b2. If you use a community-supported adapter, py39 compatibility will depend on the code in that adapter plugin.
  1. faster CLI
  2. learned something new about dbt via a well-written intro to a new feature

What’s not to like?!?
