Prerelease: v0.18.0 (Marian Anderson)

A release candidate of dbt v0.18.0 (Marian Anderson) is now available on PyPI, Homebrew, and dbt Cloud. View the Changelog for the full set of changes implemented since v0.17, and the migration guide for an overview of new features.

This release includes several beta features [β]. These are pieces of net-new functionality that we plan to refine over the course of several versions. We believe they will work as intended in the majority of projects. At the same time, we know there will be rough edges, odd corner cases, and ways to trigger surprising behavior, and we want to hear all about them.

Give this RC a spin and let us know what you find by responding below or posting in the #prereleases channel. Happy testing!

Installation:

# with pip
pip install --upgrade dbt==0.18.0rc1

# with homebrew
brew install dbt@0.18.0-rc1
brew link --overwrite dbt@0.18.0-rc1

N.B. If your project depends on packages (such as dbt-utils) that require running with a dbt version <0.18.0, you can use the --no-version-check flag to test out prerelease functionality.
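
For example, you could keep exercising the prerelease alongside such a package by skipping the check. A minimal sketch of the flag in use:

# run the project without enforcing packages' require-dbt-version ranges
$ dbt run --no-version-check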

Highlights

Node selection

(docs)

dbt v0.18.0 introduces several new features for node selection:

  • methods: config, test_type, test_name, package, state [β]
  • intersections
  • nth-degree parent/child
  • version-controlled YAML selectors

It’s now possible to do things like:

# list all my incremental models
$ dbt ls -m config.materialized:incremental

# run only incremental models defined in the snowplow package
$ dbt run -m config.materialized:incremental,package:snowplow

# ...and their immediate offspring
$ dbt run -m config.materialized:incremental+1,package:snowplow+1

# execute all my warn-severity tests
$ dbt test -m config.severity:warn

For especially complex selection criteria, you can define a YAML selector [β] and save it, in version control, as a selectors.yml file in your project:

selectors:
  - name: snowplow_incrementals_plus_one
    definition:
      intersection:
        - method: config.materialized
          value: incremental
          children: true
          children_depth: 1
        - method: package
          value: snowplow
          children: true
          children_depth: 1

And then reference it in dbt commands:

$ dbt ls --selector snowplow_incrementals_plus_one
$ dbt run --selector snowplow_incrementals_plus_one
$ dbt test --selector snowplow_incrementals_plus_one

Slim CI [β]

English: “I only want to run the models that have changed, and their children, without needing to run all their parents first.”

Français: « Je voudrais exécuter seulement les modèles modifiés, et leurs enfants, sans avoir besoin d’exécuter leurs parents en premier. »

dbt: dbt run -m state:modified+ --defer --state path/to/prod/artifacts

Translation:

As long as you can provide the path to the artifacts (namely manifest.json) from a previous prod run, you can:

  • Run only models that are new or changed (docs)
  • “Defer” resolution of upstream references to the prod namespace (docs)

Defer and state can be switched on via CLI flags or environment variables in your deployment tool of choice. Support for this workflow in dbt Cloud is coming soon.
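
For instance, here is a minimal sketch of a CI step. The ./prod-artifacts path is illustrative; it just needs to contain the manifest.json produced by a production run:

# list the models dbt considers new or modified relative to the prod manifest
$ dbt ls -m state:modified --state ./prod-artifacts

# build them and their children, deferring unbuilt parents to the prod namespace
$ dbt run -m state:modified+ --defer --state ./prod-artifacts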

We plan to continue refining the behavior of state:modified and --defer. For now, take note of some limitations:

  • state:modified looks for discrepancies between manifests relating to model contents, database-relevant configs, descriptions (if persist_docs is enabled), and database representations. Custom environment-aware logic that uses target or env vars to set conditional values will also cause discrepancies between manifests, as shown in the sketch after this list. This may result in false positives, i.e. running more models in CI than strictly necessary.
  • state:modified cannot trace the downstream implications of modifications to macros or vars. We hope to add this functionality in future releases.
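
To illustrate the first limitation, here is a contrived model sketch: the schema config resolves differently in CI and prod, so state:modified will flag the model as changed even if its SQL is identical.

-- the rendered schema differs between environments, so the manifests disagree
{{ config(schema = 'analytics' if target.name == 'prod' else 'dbt_ci') }}

select 1 as placeholder_column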

Dispatched macros

(docs)

Frequently, different databases require minutely different SQL to produce the same result. For example, let’s say we want to find the number of hours between two timestamps:

-- postgres
extract(epoch from timestamp_a - timestamp_b)/3600

-- redshift
datediff(hour, timestamp_a, timestamp_b)

-- bigquery
timestamp_diff(timestamp_b, timestamp_a, hour)

Luckily, there’s a macro for that: [dbt_utils.datediff](https://github.com/fishtown-analytics/dbt-utils/blob/dev/0.6.0/macros/cross_db_utils/datediff.sql). The crucial mechanism, underpinning a lot of the cross-database functionality in dbt-utils, is our ability to call one macro and have it resolve differently based on the adapter.
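
For example, you might call it in a model like this (the model and column names are illustrative):

select
    order_id,
    {{ dbt_utils.datediff('ordered_at', 'shipped_at', 'hour') }} as hours_to_ship
from {{ ref('orders') }}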

The old way: adapter_macro

{% macro datediff(first_date, second_date, datepart) %}
  {{ adapter_macro('dbt_utils.datediff', first_date, second_date, datepart) }}
{% endmacro %}

If we were running on Redshift, dbt would look for dbt_utils.redshift__datediff. If not found, it would then fall back to the default implementation, dbt_utils.default__datediff.
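
Those adapter-specific implementations look roughly like this (a simplified sketch, not the exact dbt_utils source):

{% macro default__datediff(first_date, second_date, datepart) %}
    {# used on Redshift, Snowflake, and any adapter without its own implementation #}
    datediff({{ datepart }}, {{ first_date }}, {{ second_date }})
{% endmacro %}

{% macro bigquery__datediff(first_date, second_date, datepart) %}
    timestamp_diff({{ second_date }}, {{ first_date }}, {{ datepart }})
{% endmacro %}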

Though the syntax here is a bit strange, it has worked well enough, with one big limitation: adapter macros are tightly scoped within the package that defines them. If we wanted to add a new implementation, such as spark__datediff, it would require opening a PR against the dbt_utils package.

As the number of community-supported plugins has blossomed—a very exciting development!—we know that squeezing all adapter-specific implementations into one package is not the answer.

The new way: adapter.dispatch

Here’s the same macro, written using dispatch instead:

{% macro datediff(first_date, second_date, datepart) %}
  {{ adapter.dispatch(
       macro_name = 'datediff',
       packages = var('dbt_utils_dispatch_list', []) + ['dbt_utils']
     )(first_date, second_date, datepart) }}
{% endmacro %}

The new packages argument allows us to scope the places dbt should search for valid implementations of the datediff macro. The inclusion of a var allows end users, or other package developers, to declare that implementations in their package should take priority.

For example, let’s say there’s a package called spark_utils that “extends” dbt_utils by defining spark__datediff. In my project (project_jerco), I can install both packages and define my dispatch preference in dbt_project.yml:

vars:
  dbt_utils_dispatch_list:
    - project_jerco
    - spark_utils

This means that I want to prioritize my own implementation of datediff, then the one in spark_utils, before finally falling back to what’s in dbt_utils. Whenever dbt tries to resolve dbt_utils.datediff, it will look for macros in the following order and use the first suitable match:

  1. project_jerco.spark__datediff (not found)
  2. project_jerco.default__datediff (not found)
  3. spark_utils.spark__datediff (found! stop looking)
  4. spark_utils.default__datediff
  5. dbt_utils.spark__datediff
  6. dbt_utils.default__datediff (used if none of the above exist)

This works whether we’re calling dbt_utils.datediff directly, or if we’re calling a macro that depends on it, such as dbt_utils.date_spine.
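
And if I wanted to claim step 1 for myself, I could define spark__datediff directly in project_jerco. Here is a contrived sketch; it only handles hours and is not the real spark_utils implementation:

{% macro spark__datediff(first_date, second_date, datepart) %}
    {% if datepart == 'hour' %}
        {# Spark SQL: difference in epoch seconds, converted to hours #}
        (unix_timestamp({{ second_date }}) - unix_timestamp({{ first_date }})) / 3600
    {% else %}
        {{ exceptions.raise_compiler_error("datepart not supported in this sketch: " ~ datepart) }}
    {% endif %}
{% endmacro %}

Because project_jerco comes first in dbt_utils_dispatch_list, this macro would win over both spark_utils and dbt_utils.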


Very nice functionality. I like the extra dbt run/compile options.
Good stuff!

Awesome feature list! Do we have a target release date for this one?

We released v0.18.0 on September 3! Check out the post: