A release candidate of dbt v0.18.0 (Marian Anderson) is now available on PyPI, Homebrew, and dbt Cloud. View the Changelog for the full set of changes implemented since v0.17, and the migration guide for an overview of new features.
This release includes several beta features [β]. These are pieces of net-new functionality that we plan to refine over the course of several versions. We believe they will work as intended in the majority of projects. At the same time, we know that there will be many weirdnesses, rough edges, and ways to engender surprising behavior—and we want to hear all about them.
Give this RC a spin, and let us know what you find, by responding below and posting in the #prereleases channel. Happy testing!
Installation:
# with pip
pip install --upgrade dbt==0.18.0rc1
# with homebrew
brew install dbt@0.18.0-rc1
brew link --overwrite dbt@0.18.0-rc1
N.B. If your project depends on packages (such as dbt-utils) that require running with a dbt version <0.18.0, you can use the --no-version-check flag to test out prerelease functionality.
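For example, to run against the release candidate while an installed package still pins dbt <0.18.0 (flag placement per dbt's global CLI options):
# skip the package version check for this invocation
dbt --no-version-check run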
Highlights
Node selection (docs)
dbt v0.18.0 introduces several new features for node selection:
- methods: config, test_type, test_name, package, state [β]
- intersections
- nth-degree parent/child selection
- version-controlled YAML selectors [β]
It’s now possible to do things like:
# list all my incremental models
$ dbt ls -m config.materialized:incremental
# run only incremental models defined in the snowplow package
$ dbt run -m config.materialized:incremental,package:snowplow
# ...and their immediate offspring
$ dbt run -m config.materialized:incremental+1,package:snowplow+1
# execute all my warn-severity tests
$ dbt test -m config.severity:warn
For especially complex selection criteria, you can define a YAML selector [β] and save it, in version control, as a selectors.yml file in your project:
selectors:
  - name: snowplow_incrementals_plus_one
    definition:
      intersection:
        - method: config.materialized
          value: incremental
          children: true
          children_depth: 1
        - method: package
          value: snowplow
          children: true
          children_depth: 1
And then reference it in dbt commands:
$ dbt ls --selector snowplow_incrementals_plus_one
$ dbt run --selector snowplow_incrementals_plus_one
$ dbt test --selector snowplow_incrementals_plus_one
Slim CI [β]
English: “I only want to run the models that have changed, and their children, without needing to run all their parents first.”
Français: « Je voudrais exécuter seulement les modèles modifiés, et leurs enfants, sans avoir besoin d’exécuter leurs parents en premier. »
dbt: dbt run -m state:modified+ --defer --state path/to/prod/artifacts
Translation:
As long as you can provide the path to the artifacts (namely manifest.json) from a previous prod run, you can:
- Run only models that are new or changed (docs)
- “Defer” resolution of upstream references to the prod namespace (docs)
Defer and state can be switched on via CLI flags or environment variables in your deployment tool of choice. Support for this workflow in dbt Cloud is coming soon.
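Put together, a CI job might look like this minimal sketch. The artifact path here is hypothetical; fetching manifest.json from your last production run (from S3, your orchestrator's artifact store, etc.) is up to your deployment setup:
# 1. fetch prod artifacts (manifest.json) into ./prod-artifacts
# 2. build only new or changed models, plus their children; unbuilt
#    parents resolve to their production counterparts
dbt run -m state:modified+ --defer --state ./prod-artifacts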
We plan to continue refining the behavior of state:modified and --defer. For now, take note of some limitations:
- state:modified looks for discrepancies between manifests relating to contents, database-relevant configs, descriptions (if persist_docs is enabled), and database representations. Custom environment-aware logic that leverages target or env vars to set conditional values will also cause discrepancies between manifests. This may result in false positives, i.e. running more models in CI than strictly necessary (see the sketch after this list).
- state:modified cannot trace the downstream implications of modifications to macros or vars. We hope to add this functionality in future releases.
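To illustrate the first limitation: a model whose config branches on target compiles to different values in dev and prod, so comparing manifests built against different environments flags it as modified even when nothing in the source has changed. A contrived sketch (hypothetical model name):
-- models/fct_orders.sql
-- The compiled config differs per target, so manifests produced from
-- different targets will always disagree about this model.
{{ config(
    materialized = ('incremental' if target.name == 'prod' else 'view')
) }}

select * from {{ ref('stg_orders') }}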
Dispatched macros (docs)
Frequently, different databases require minutely different SQL to produce the same result. For example, let’s say we want to find the number of hours between two timestamps:
-- postgres
extract(epoch from timestamp_a - timestamp_b)/3600
-- redshift
datediff(hour, timestamp_a, timestamp_b)
-- bigquery
timestamp_diff(timestamp_b, timestamp_a, hour)
Luckily, there’s a macro for that: [dbt_utils.datediff](https://github.com/fishtown-analytics/dbt-utils/blob/dev/0.6.0/macros/cross_db_utils/datediff.sql). The crucial mechanism, underpinning a lot of the cross-database functionality in dbt-utils, is our ability to call one macro and have it resolve differently based on the adapter.
The old way: adapter_macro
{% macro datediff(first_date, second_date, datepart) %}
{{ adapter_macro('dbt_utils.datediff', first_date, second_date, datepart) }}
{% endmacro %}
If we were running on Redshift, dbt would look for dbt_utils.redshift__datediff. If not found, it would then fall back to the default implementation, dbt_utils.default__datediff.
Though the syntax here is a bit strange, it has worked well enough, with one big limitation: adapter macros are tightly scoped within the package that defines them. If we wanted to add a new implementation, such as spark__datediff, it would require opening a PR against the dbt_utils package.
As the number of community-supported plugins has blossomed—a very exciting development!—we know that squeezing all adapter-specific implementations into one package is not the answer.
The new way: adapter.dispatch
Here’s the same macro, written using dispatch instead:
{% macro datediff(first_date, second_date, datepart) %}
    {{
        adapter.dispatch(
            macro_name = 'datediff',
            packages = var('dbt_utils_dispatch_list', []) + ['dbt_utils']
        )(first_date, second_date, datepart)
    }}
{% endmacro %}
The new packages argument allows us to scope the places dbt should search for valid implementations of the datediff macro. The inclusion of a var allows end users, or other package developers, to declare that implementations in their package should take priority.
For example, let’s say there’s a package called spark_utils that “extends” dbt_utils by defining spark__datediff. In my project (project_jerco), I can install both packages and define my dispatch preference in dbt_project.yml:
vars:
  dbt_utils_dispatch_list:
    - project_jerco
    - spark_utils
This means that I want to prioritize my own implementations of datediff, then spark_utils, before finally falling back to what’s in dbt_utils. Whenever dbt tries to resolve dbt_utils.datediff, it will look for macros in the following order, and use the first suitable match:
1. project_jerco.spark__datediff (not found)
2. project_jerco.default__datediff (not found)
3. spark_utils.spark__datediff (found! stop looking)
4. spark_utils.default__datediff
5. dbt_utils.spark__datediff
6. dbt_utils.default__datediff (used if none of the above exist)
This works whether we’re calling dbt_utils.datediff directly, or if we’re calling a macro that depends on it, such as dbt_utils.date_spine.
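To make this concrete, here’s all the hypothetical spark_utils package would need to define: just the adapter-prefixed macro, with no PR against dbt_utils. A sketch (the body is illustrative, not complete):
-- macros/datediff.sql, in the spark_utils package
{% macro spark__datediff(first_date, second_date, datepart) %}
    {# illustrative only: Spark's built-in datediff() returns whole days,
       so a full implementation would branch on datepart #}
    datediff({{ second_date }}, {{ first_date }})
{% endmacro %}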