Validating the SQL in a Redshift late-binding view

Why Redshift’s late-binding views are great

Redshift’s late-binding views “unbinds” a view from the data it selects from. In practice, this means that if upstream views or tables are dropped with a cascade qualifier, the late-binding view does not get dropped as well.

Using late-binding views in a production deployment of dbt can vastly improve the availability of data in the warehouse, especially for models that are materialized as late-binding views and are queried by end-users, since they won’t be dropped when upstream models are updated.

For more information on late-binding views, see the docs and this blogpost.

Why Redshift’s late-binding views are not so great

One downside of using late-binding views is that if the SQL in your view is invalid, it may only be picked up when a query that returns results from this view is executed – while Redshift will detect syntax errors in your SQL straight away, if you happen to use an incorrect column name or table name in your query, or forget to give a calculated column an alias, it will still create the view for you anyway!

For models in your dbt project that have downstream models materialized as tables, this error will be raised during the execution of the downstream model, making it harder to diagnose where your bad SQL exists. And if your late-binding view is exposed to an end user, you may not realize you have bad SQL until someone queries it!

Validating the SQL in your late-binding view

Fortunately, you can leverage post-hooks and macros so check whether a late-binding views contain valid SQL by selecting a single row from your model. Note that this will slow down your dev runs of dbt, so use it with caution!

With the following macro, if you’re in dev, and your model is a late-binding view, a query that selects a row from the view will be executed.

-- macros/test_late_binding_view_in_dev.sql
{% macro test_late_binding_view_in_dev() -%}

    {%- set is_late_binding_view=
        model['config']['materialized'] == 'view'
        and model['config']['bind'] == False
    -%}

    {%- set is_dev = (target.name == 'dev') -%}

    {%- if is_dev and is_late_binding_view -%}

        select * from {{ this }} limit 1

    {%- else -%}

        select 1 as test

    {%- endif -%}

{%- endmacro %}

By calling this as a post hook for each model, this macro will catch any bad SQL before moving on to the next model, raising an error for the model with the invalid SQL, and skipping downstream models.

models:
  bind: false
  post-hook:
    - "{{ jaffle_shop.test_late_binding_view_in_dev() }}"

Notes:

  • I have decided to only test this when in dev, to reduce my prod run time.
  • I namespaced my macro in the post-hook definition so that this macro works for my installed packages
  • As an alternative solution, I also investigated setting the bind parameter based on the target so views would only be late-binding in prod. I couldn’t get that appraoch to work due to the way dbt parses the dbt_project.yml file.
7 Likes