Release: dbt Core v1.0 (W. E. B. Du Bois)

Updates

  • Dec 3: dbt-core v1.0.0 + compatible plugin versions are available.
  • Nov 30: A third and final release candidate, v1.0.0-rc3, is available for dbt-core.
  • Nov 22: A second release candidate, v1.0.0-rc2, is available for dbt-core + some plugins.
  • Nov 10: A first release candidate, v1.0.0-rc1, is available.
  • Oct 25: A second beta, v1.0.0-b2, is available.
  • Oct 11: A first beta, v1.0.0-b1, is available on GitHub + PyPI.

:bell: Who is W. E. B. Du Bois? Check out the release notes for a biography of this famous Philadelphian :bell:

Back in late September, I promised that dbt Core v1.0 was on the horizon. As of December 3, it’s out in the world, ready for prime time.

The post below is a combination of fine print and feature preview, updated many times over the two months between the first beta (Oct 11) and the final release. I also encourage you to:

  • Try out the beta / release candidate! Upgrade to v1.0!
  • Read the migration guide
  • Join #dbt-v1-readiness in dbt Slack
  • Grab your spot at Coalesce, where we’ll be cutting the v1.0 ribbon :slight_smile:

Upgrading

If you’re using dbt Cloud, you can select 1.0 (latest) from the version dropdown in your development and deployment environments.

If you’re installing dbt Core on your CLI, things will look a little different. We took v1 as an opportunity to rework our packages + release processes (more on that below). To install your specific adapter, including dbt-core and all dependencies:

# with pip
pip install dbt-<adapter> --upgrade

# with brew
brew install dbt-<adapter>
brew link dbt-<adapter> --overwrite

Note: Starting with v1.0.0, pip install dbt will raise an error and return a descriptive message. We’ve taken the adapter split-apart seriously, and we want you to have an easier time installing just the adapter plugin(s) you need. If you want the previous behavior of pip install dbt, you can achieve it with:

pip install dbt-core dbt-postgres dbt-redshift dbt-snowflake dbt-bigquery

Renaming

We’ve just renamed a repository, from dbt to dbt-core, and we’ve updated the logo in its README. Why?

Five years ago, the name dbt referred to a pretty particular thing: a handy command-line tool that made it much easier to create views in Postgres, by storing their definitions in version control and promising to run them in the right order.

Today, dbt refers to a lot more things: a community of practice, a commercial software product, a fast-growing company, a burgeoning package ecosystem, a way of writing SQL, a way of viewing and thinking about analytics problems.

To that end, we’re going to start saying “dbt Core” when we mean dbt-core—that is, the foundational open source software at the heart of it all. The goal here is clarity, and also pride. This is a huge milestone for dbt Core, and we’re going to feel its ripple effects all over the place, across the wide smorgasbord that is dbt in 2021.

Plugins

Our open source plugins for Redshift, Snowflake, and BigQuery now live in their own repositories: dbt-redshift, dbt-snowflake, and dbt-bigquery. Those are the places you should go to report bugs, suggest features, and contribute code that’s specific to each. In fact, we believe this change will make contributing easier than ever! If you depend on or care deeply about one or more, I welcome you to star and watch those repositories.

The dbt-postgres adapter plugin will continue to cohabitate with dbt-core, in the dbt-core repo. There’s a practical reason for this: We use Postgres pretty extensively for core testing and local development workflows today. Eventually, we plan to move it into a separate repo, too. For the time being, it will remain a bit of an exception—though not, I hope, a big source of confusion.

After v1.0, dbt-core will not make breaking changes to adapter interfaces in patch releases. As such, Labs-supported adapter plugins will start declaring compatibility dependencies (~=) on minor versions of dbt-core, and we invite all other database adapters to do the same. This makes it much easier to release and use new patch versions, as soon as we have fixes ready. We’ll still coordinate around new features and interface changes (if any) for all new minor versions.
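
To make the ~= pin concrete, here is a minimal Python sketch of how a compatible-release specifier behaves for plain numeric versions. It is not pip's or dbt's actual logic (pip follows the full PEP 440 rules, including pre-releases, which this ignores), and the helper name is_compatible_release is made up for illustration.

```python
def is_compatible_release(installed: str, base: str) -> bool:
    """Approximate PEP 440 `~=base` for plain numeric versions such as "1.0.3".

    `~=1.0.0` means: at least 1.0.0, and still within the 1.0.* series.
    Hypothetical helper for illustration; tags like "rc1" are not handled.
    """
    inst = tuple(int(part) for part in installed.split("."))
    req = tuple(int(part) for part in base.split("."))
    if len(req) < 2:
        raise ValueError("'~=' needs at least two version components")
    # Pad the installed version with zeros so "1.0" compares like "1.0.0".
    inst += (0,) * (len(req) - len(inst))
    series = req[:-1]  # everything except the last component must match exactly
    return inst[: len(series)] == series and inst >= req


# An adapter pinned to dbt-core~=1.0.0 accepts new patch releases...
print(is_compatible_release("1.0.4", "1.0.0"))  # True
# ...but not the next minor version, where interface changes (if any) would land.
print(is_compatible_release("1.1.0", "1.0.0"))  # False
```

In other words, a plugin pinned this way can pick up a dbt-core patch fix without waiting for a new plugin release.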

The code for the dbt RPC server also lives in its own repository: dbt-rpc. The RPC server started as an experiment in possibilities of interactive dbt development, and it’s proven the value of that proposition, serving as the beating heart behind the dbt Cloud IDE. At the same time, we’re convinced that we need to build a more robust, scalable dbt Server. Stay tuned for more details. In the meantime, we’re going to keep maintaining dbt-rpc functionality, but we won’t be including it in dbt Core v1.

Housekeeping

While we were in the renaming spirit, we also updated the dbt-core default branch from develop to main. There was no really good reason to go against established convention here, just old habits. If you have a local clone of the repo, you’ll just need to make a quick update before your next contribution :wink:

Last but not least: We’re adding a stale bot to the dbt-core repo. It will automatically tag any issues that have had no updates for 180 days, and close them (if still no updates) one week later. Our intention here is not to ignore any issue as soon as it’s old, but rather to make the repository a more manageable and accessible place for everyone. Many of the most compelling ideas have been around for a while, and we reserve the right to re-open them at any time.

Notable changes (so far!)

v1 is more than just a reorg—it’s a new version of dbt Core, gosh darn it! There are a handful of features already in b1, with more to come over the next several weeks.

Performance

In v0.20, back in July, we introduced a top-to-bottom rework of partial parsing, and a brand-new static parser for many models. In v1.0, we’re turning on both, for everyone, by default. Update in rc1: Partial parsing will detect changes in env vars now, too. Use ’em to your heart’s content.

When all is said and done, compared to v0.19.0 (released in January), we believe dbt Core v1.0.0 will offer a 100x faster development experience in very large projects—that is, a 100x speed-up when reading files, identifying changes, updating an internal manifest, and kicking off queries.

We hope v1.0 feels speedy right out of the gate. Thank you for all of your help, patience, and feedback this year as we made performance a top priority.

Global configs

Previously, some runtime configs could be set via flags, some via env vars, and even some in profiles.yml. What gives?

All global configs can now be set in one of three ways: the config block in profiles.yml, an environment variable named DBT_<GLOBAL_CONFIG>, or a CLI flag named --<global-config>. That list also runs from lowest to highest precedence: a CLI flag overrides an env var, which overrides the user config in profiles.yml.
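
If it helps to see that precedence spelled out, here is a small Python sketch of the lookup order. It is an illustration only, not how dbt-core resolves flags internally. The function resolve_global_config and the config name example_config are hypothetical.

```python
import os
from typing import Any, Mapping, Optional


def resolve_global_config(
    name: str,
    cli_flags: Mapping[str, Any],
    user_config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
    default: Any = None,
) -> Any:
    """Pick one global config value: CLI flag > env var > profiles.yml `config:` block."""
    env = os.environ if env is None else env
    if name in cli_flags:  # --<global-config> was passed on the command line
        return cli_flags[name]
    env_key = "DBT_" + name.upper()  # DBT_<GLOBAL_CONFIG>
    if env_key in env:
        return env[env_key]  # note: env var values always arrive as strings
    if name in user_config:  # `config:` block in profiles.yml
        return user_config[name]
    return default


# Hypothetical config name and values, for illustration only.
user_config = {"example_config": "from-profiles"}
env = {"DBT_EXAMPLE_CONFIG": "from-env"}

print(resolve_global_config("example_config", {}, user_config, env))
# from-env
print(resolve_global_config("example_config", {"example_config": "from-cli"}, user_config, env))
# from-cli
```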

Even more renaming

Tests have been renamed, once and for all:

  • schema tests are now generic tests
  • data tests are now singular tests

That’s really it. Tests are more alike than they are different; ultimately, the two test types are just two points of entry into the same functionality. It’s all up to you and your use case.

We also renamed a handful of behind-the-scenes configs in dbt_project.yml, many of which are long-overdue:

  • source-paths is now model-paths. It’s the place you create models.
  • data-paths (default data/) is now seed-paths (default seeds/). It’s the place you create seeds.
  • modules-path (default dbt_modules) is now packages-install-path (default dbt_packages). It’s the place you install packages.

These aren’t breaking changes—we’ve got backwards compatibility for the old names—you’ll just see a deprecation warning or two right after upgrading. This is a one-minute switch, set, & forget. Most important: all new users, starting with v1.0+, will never need to know the difference.
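
For anyone who scripts over their project files, the rename boils down to a three-entry mapping. Below is a hedged Python sketch that applies it to an already-parsed dbt_project.yml, represented here as a plain dict. The function name rename_deprecated_keys and the sample project are made up. This is not something dbt ships.

```python
from typing import Any, Dict, List, Tuple

# Old key -> new key, exactly as listed above.
RENAMED_KEYS = {
    "source-paths": "model-paths",
    "data-paths": "seed-paths",
    "modules-path": "packages-install-path",
}


def rename_deprecated_keys(project: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of the project config using the v1.0 key names,
    plus the list of old keys that were found and renamed."""
    updated: Dict[str, Any] = {}
    renamed: List[str] = []
    for key, value in project.items():
        new_key = RENAMED_KEYS.get(key)
        if new_key is None:
            updated[key] = value  # not a renamed key (or already new-style)
        else:
            renamed.append(key)
            # If both spellings are present, keep the new-style value.
            updated[new_key] = project.get(new_key, value)
    return updated, renamed


# Hypothetical pre-1.0 project config.
old_project = {
    "name": "jaffle_shop",
    "source-paths": ["models"],
    "data-paths": ["data"],
    "modules-path": "dbt_modules",
}
new_project, renamed = rename_deprecated_keys(old_project)
print(renamed)  # ['source-paths', 'data-paths', 'modules-path']
print(new_project["seed-paths"])  # ['data']
```

Note that the sketch only renames keys and keeps your existing values. Whether you also move to the new default directories (seeds/, dbt_packages) is a separate choice.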


New in b2

New interactive init

This isn’t the init you remember. Now responsive and reusable, whether this is your first time using dbt Core, or you’re onboarding a new colleague (or new computer) to a project that’s been around for years.

$ cd existing-project
$ dbt init
Running with dbt=1.0.0-b2
Setting up your profile.
user (yourname@jaffleshop.com): summerintern@jaffleshop.com
schema (usually dbt_<yourname>): dbt_summerintern
threads (your favorite number, 1-10) [8]: 6
Profile internal-snowflake written to /Users/intern/.dbt/profiles.yml using project's profile_template.yml and your supplied values. Run 'dbt debug' to validate the connection.

Result-based selection

Are you already using the state:modified selection method? We think you should. State-based selection makes it fast and easy to test changes to dbt projects in CI. All you need is an artifact from a past production run.

Artifact-powered selection doesn’t stop there. Now, it’s easier than ever to rerun resources that failed, errored, or were skipped. Just add the --state flag (or DBT_ARTIFACT_STATE_PATH env var).

$ dbt run   --select   result:error  # run all models that generated errors on the prior invocation of dbt run
$ dbt test  --select   result:fail   # run all tests that failed on the prior invocation of dbt test
$ dbt build --select 1+result:fail   # run all the models associated with failed tests from the prior invocation of dbt build
$ dbt seed  --select   result:error  # run all seeds that generated errors on the prior invocation of dbt seed.
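
Roughly speaking, result:<status> looks up each node's status from the prior invocation in the run_results.json artifact found under the --state path. The Python sketch below mimics only that lookup, on a trimmed-down, made-up artifact. The real file carries many more fields, and dbt also reconciles the matches against the nodes in your current project. The function name select_by_result is hypothetical.

```python
from typing import Any, Dict, List


def select_by_result(run_results: Dict[str, Any], status: str) -> List[str]:
    """Return the unique_ids of nodes whose prior status equals `status`."""
    return [
        result["unique_id"]
        for result in run_results.get("results", [])
        if result.get("status") == status
    ]


# Made-up stand-in for a previous target/run_results.json.
previous = {
    "results": [
        {"unique_id": "model.jaffle_shop.customers", "status": "success"},
        {"unique_id": "model.jaffle_shop.orders", "status": "error"},
        {"unique_id": "test.jaffle_shop.not_null_orders_id", "status": "fail"},
    ]
}

print(select_by_result(previous, "error"))  # ['model.jaffle_shop.orders']
print(select_by_result(previous, "fail"))   # ['test.jaffle_shop.not_null_orders_id']
```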

If you’re excited by what’s possible with metadata-powered selection, I’ve got an issue for you: dbt-core#4050


New in rc1

Structured events + logging

If rc1 looks a little different, it’s because we managed to complete a top-to-bottom rework of all logging in dbt-core. Logs are now events, flowing through a centralized framework, with much better guarantees about what they will and won’t contain. This is an important step toward a future in which dbt Core can offer compelling, reliable, and real-time integrations with external tools.

Over the next few weeks, we’ll be making some behind-the-scenes changes and cosmetic touch-ups, to solidify structured logging ahead of the final v1.0.0 release. You can read more about our plans in dbt-core#4260.

(If you maintain an adapter plugin, there’s a very quick migration to add compatibility for v1 logging. See the details.)

Metrics

You’ve seen the issue. This is our very first cut at teaching dbt Core about metrics:

metrics:
  - name: new_dbt_projects
    label: dbt New Projects
    model: ref('dim_dbt_projects')
    description: "New dbt projects!"
    type: count
    sql: project_id
    timestamp: first_run_at
    time_grains: [day, week, month]

We’ll have more to say, and more to share, over the coming weeks and months.

Plus…

  • We made a change in v0.20; you were confused; we heard you loud & clear. Test selection is eager again. That means dbt test -s my_model will include relationships tests by default. (But you can always switch to the more “cautious” behavior, if and when you want: dbt test --indirect-selection=cautious.)
  • We’ve reorganized the macros in global_project. Need to find something to override or reimplement? We hope it’s easier than ever.
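
To illustrate the difference between eager and cautious test selection, here is a small Python sketch. It is a simplification under my own assumptions, not dbt-core's selector code: each test is reduced to the set of models it references, and the names select_tests, my_model, and other_model are placeholders.

```python
from typing import Dict, List, Set


def select_tests(
    tests: Dict[str, Set[str]],
    selected: Set[str],
    indirect_selection: str = "eager",
) -> List[str]:
    """Pick tests given the selected models.

    eager:    include a test if ANY model it references is selected.
    cautious: include a test only if ALL models it references are selected.
    """
    picked = []
    for test_name, parents in tests.items():
        if indirect_selection == "eager":
            include = bool(parents & selected)
        elif indirect_selection == "cautious":
            include = bool(parents) and parents <= selected
        else:
            raise ValueError(f"unknown mode: {indirect_selection!r}")
        if include:
            picked.append(test_name)
    return sorted(picked)


# A relationships test references two models; a not_null test references one.
tests = {
    "not_null_my_model_id": {"my_model"},
    "relationships_my_model_to_other_model": {"my_model", "other_model"},
}

print(select_tests(tests, {"my_model"}))
# ['not_null_my_model_id', 'relationships_my_model_to_other_model']
print(select_tests(tests, {"my_model"}, indirect_selection="cautious"))
# ['not_null_my_model_id']
```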

New in rc2

  • Bug fixes!
  • More improvements to structured logging: data in JSON-formatted logs; unique code for each event type; overall cleanup
  • Tiny breaking changes:
    • Based on your feedback, secret env vars (prefixed DBT_ENV_SECRET_) are now available only in profiles.yml + packages.yml
    • config.get() now works the way you’ve always thought it should — the second argument defines the default value, if a config is not otherwise set

New in rc3 + final


All of the above, cleaned up and de-bugged, ready to use in production!


What’s the [8] here?

Default value! If I press “enter” without entering a number, dbt will use 8 threads. The prompts, hints, and defaults are all defined in a file named profile_template.yml (in the adapter plugin and/or the existing project). Docs coming very soon :slight_smile:
