A release candidate of dbt v0.18.0 (Marian Anderson) is now available on PyPI, Homebrew, and dbt Cloud. View the Changelog for the full set of changes implemented since v0.17, and the migration guide for an overview of new features.
This release includes several beta features [β]. These are pieces of net-new functionality that we plan to refine over the course of several versions. We believe they will work as intended in the majority of projects. At the same time, we know that there will be many weirdnesses, rough edges, and ways to engender surprising behavior—and we want to hear all about them.
Give this RC a spin, and let us know what you find, by responding below and posting in the #prereleases channel. Happy testing!
Installation:
# with pip
pip install --upgrade dbt==0.18.0rc1
# with homebrew
brew install dbt@0.18.0-rc1
brew link --overwrite dbt@0.18.0-rc1
N.B. If your project depends on packages (such as dbt-utils) that require running with a dbt version <0.18.0, you can use the --no-version-check flag to test out prerelease functionality.
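For example, to run against the release candidate while an installed package still pins dbt <0.18.0 (flag placement per dbt's global CLI options):
# skip the package version check for this invocation
dbt --no-version-check run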
Highlights
Node selection (docs)
dbt v0.18.0 introduces several new features for node selection:
- methods: config, test_type, test_name, package, state [β]
- intersections
- nth-degree parent/child selection
- version-controlled YAML selectors [β]
It’s now possible to do things like:
# list all my incremental models
$ dbt ls -m config.materialized:incremental
# run only incremental models defined in the snowplow package
$ dbt run -m config.materialized:incremental,package:snowplow
# ...and their immediate offspring
$ dbt run -m config.materialized:incremental+1,package:snowplow+1
# execute all my warn-severity tests
$ dbt test -m config.severity:warn
For especially complex selection criteria, you can define a YAML selector [β] and save it, in version control, as a selectors.yml file in your project:
selectors:
  - name: snowplow_incrementals_plus_one
    definition:
      intersection:
        - method: config.materialized
          value: incremental
          children: true
          children_depth: 1
        - method: package
          value: snowplow
          children: true
          children_depth: 1
And then reference it in dbt commands:
$ dbt ls --selector snowplow_incrementals_plus_one
$ dbt run --selector snowplow_incrementals_plus_one
$ dbt test --selector snowplow_incrementals_plus_one
Slim CI [β]
English: “I only want to run the models that have changed, and their children, without needing to run all their parents first.”
Français: « Je voudrais exécuter seulement les modèles modifiés, et leurs enfants, sans avoir besoin d’exécuter leurs parents en premier. »
dbt: dbt run -m state:modified+ --defer --state path/to/prod/artifacts
Translation:
As long as you can provide the path to the artifacts (namely manifest.json) from a previous prod run, you can:
- Run only models that are new or changed (docs)
- “Defer” resolution of upstream references to the prod namespace (docs)
Defer and state can be switched on via CLI flags or environment variables in your deployment tool of choice. Support for this workflow in dbt Cloud is coming soon.
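Put together, a CI job might look like this minimal sketch. The artifact path here is hypothetical; fetching manifest.json from your last production run (from S3, your orchestrator's artifact store, etc.) is up to your deployment setup:
# 1. fetch prod artifacts (manifest.json) into ./prod-artifacts
# 2. build only new or changed models, plus their children; unbuilt
#    parents resolve to their production counterparts
dbt run -m state:modified+ --defer --state ./prod-artifacts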
We plan to continue refining the behavior of state:modified and --defer. For now, take note of some limitations:
- state:modified looks for discrepancies between manifests relating to contents, database-relevant configs, descriptions (if persist_docs is enabled), and database representations. Custom environment-aware logic that leverages target or env vars to set conditional values will also cause discrepancies between manifests. This may result in false positives, i.e. running more models in CI than strictly necessary (see the sketch after this list).
- state:modified cannot trace the downstream implications of modifications to macros or vars. We hope to add this functionality in future releases.
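To illustrate the first limitation: a model whose config branches on target compiles to different values in dev and prod, so comparing manifests built against different environments flags it as modified even when nothing in the source has changed. A contrived sketch (hypothetical model name):
-- models/fct_orders.sql
-- The compiled config differs per target, so manifests produced from
-- different targets will always disagree about this model.
{{ config(
    materialized = ('incremental' if target.name == 'prod' else 'view')
) }}

select * from {{ ref('stg_orders') }}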
Dispatched macros (docs)
Frequently, different databases require minutely different SQL to produce the same result. For example, let’s say we want to find the number of hours between two timestamps:
-- postgres
extract(epoch from timestamp_a - timestamp_b)/3600
-- redshift
datediff(hour, timestamp_a, timestamp_b)
-- bigquery
timestamp_diff(timestamp_b, timestamp_a, hour)
Luckily, there’s a macro for that: [dbt_utils.datediff](https://github.com/fishtown-analytics/dbt-utils/blob/dev/0.6.0/macros/cross_db_utils/datediff.sql). The crucial mechanism, underpinning a lot of the cross-database functionality in dbt-utils, is our ability to call one macro and have it resolve differently based on the adapter.
The old way: adapter_macro
{% macro datediff(first_date, second_date, datepart) %}
{{ adapter_macro('dbt_utils.datediff', first_date, second_date, datepart) }}
{% endmacro %}
If we were running on Redshift, dbt would look for dbt_utils.redshift__datediff. If not found, it would then fall back to the default implementation, dbt_utils.default__datediff.
Though the syntax here is a bit strange, it has worked well enough, with one big limitation: adapter macros are tightly scoped within the package that defines them. If we wanted to add a new implementation, such as spark__datediff, it would require opening a PR against the dbt_utils package.
As the number of community-supported plugins has blossomed—a very exciting development!—we know that squeezing all adapter-specific implementations into one package is not the answer.
The new way: adapter.dispatch
Here’s the same macro, written using dispatch instead:
{% macro datediff(first_date, second_date, datepart) %}
    {{
        adapter.dispatch(
            macro_name = 'datediff',
            packages = var('dbt_utils_dispatch_list', []) + ['dbt_utils']
        )(first_date, second_date, datepart)
    }}
{% endmacro %}
The new packages argument allows us to scope the places dbt should search for valid implementations of the datediff macro. The inclusion of a var allows end users, or other package developers, to declare that implementations in their package should take priority.
For example, let’s say there’s a package called spark_utils that “extends” dbt_utils by defining spark__datediff. In my project (project_jerco), I can install both packages and define my dispatch preference in dbt_project.yml:
vars:
  dbt_utils_dispatch_list:
    - project_jerco
    - spark_utils
This means that I want to prioritize my own implementations of datediff, then spark_utils, before finally falling back to what’s in dbt_utils. Whenever dbt tries to resolve dbt_utils.datediff, it will look for macros in the following order, and use the first suitable match:
1. project_jerco.spark__datediff (not found)
2. project_jerco.default__datediff (not found)
3. spark_utils.spark__datediff (found! stop looking)
4. spark_utils.default__datediff
5. dbt_utils.spark__datediff
6. dbt_utils.default__datediff (used if none of the above exist)
This works whether we’re calling dbt_utils.datediff directly, or if we’re calling a macro that depends on it, such as dbt_utils.date_spine.
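To make this concrete, here’s all the hypothetical spark_utils package would need to define: just the adapter-prefixed macro, with no PR against dbt_utils. A sketch (the body is illustrative, not complete):
-- macros/datediff.sql, in the spark_utils package
{% macro spark__datediff(first_date, second_date, datepart) %}
    {# illustrative only: Spark's built-in datediff() returns whole days,
       so a full implementation would branch on datepart #}
    datediff({{ second_date }}, {{ first_date }})
{% endmacro %}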