Does dbt add primary/foreign key constraints?

Hi there,

I’m a newbie to this tool. From what I’ve read and tested so far, I had a mistaken impression of what the ‘relationships’ test does.

I’ve set relationships between tables, and when I run the test command those tests are successful (i.e. PASS). However, the tables in the data warehouse I’m developing currently have no PK/FK constraints - I have neither defined them in Postgres myself nor figured out where to define them within the model’s schema.

  • What does dbt actually check for in the case of a ‘relationships’ test?
  • Where would I define the PK/FK constraints within the model? I haven’t seen documentation about this - possibly I missed that part.

Short snippet of the schema.yml:
version: 2

models:
  - name: customers
    description: A table containing all customers
    columns:
      - name: uid
        description: This is a unique identifier for the customer
        tests:
          # - unique
          - not_null
      - name: suppliers
        description: This is a unique identifier in the supplier table
        tests:
          - not_null
          - relationships:
              to: ref('customers')
              field: uid

Thanks in advance!

Hey @oferk83, welcome!

dbt is typically used with analytical data warehouses like Redshift, Snowflake, and BigQuery. In these databases, column constraints are usually either unsupported or unenforced by the database. For this reason, dbt has the ability to assert data constraints without actually materializing them in the database.

When you run dbt test, dbt will generate SQL queries that assert the validity of the tests that you’ve defined for your models. You can check out the definition of the relationships schema test in the dbt repo. The unique, not_null, and other tests are all defined similarly.
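
To make that concrete, the relationships test from the schema.yml above compiles to something roughly like the following (the exact SQL varies by dbt version, and the analytics schema name is just an example). The test passes when the query returns a count of zero:

-- find "child" rows whose foreign key value has no matching row in the parent
select count(*)
from (
    select suppliers as from_field
    from analytics.customers      -- the model the test is defined on
    where suppliers is not null
) as child
left join (
    select uid as to_field
    from analytics.customers      -- the ref('customers') relation
) as parent
    on child.from_field = parent.to_field
where parent.to_field is null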

With this paradigm, you can also test other attributes of your models that can’t be expressed as column constraints in your database. Check out some examples of those in the dbt-utils repo.

I’m not sure about BigQuery, but on both Snowflake and Redshift, if you do define a constraint, not only will it not be enforced by the database, but queries may also return incorrect results if the underlying data violates the logical rule the constraint defines.

So defining constraints in the database can be risky if there is ever a possibility of the data being incorrect. Yes, you would catch the problem after the fact if a test was defined in dbt, but a business user might have run a query in the interim, before you were able to remedy the data problem, and gotten incorrect results.

So we have tended to shy away from defining constraints in Snowflake and Redshift when working with them. For now the risks outweigh the benefits, although if things were to change we would revisit the question.

Thanks @josh - really good points!

We don’t have any plans to support adding column constraints natively in dbt, but some folks have made this work using post-hooks to add constraints. This might look like:

{{
  config(
    post_hook='create index if not exists "{{ this.name }}__index_on_COL_NAME" on {{ this }} ("COL_NAME")'
  )
}}
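
Applied to this thread’s original question, the same pattern could be used for a constraint instead of an index. A minimal sketch, assuming a Postgres warehouse and the uid column from the schema above (on an incremental model you would also need to guard against adding the constraint twice):

{{
  config(
    post_hook='alter table {{ this }} add primary key (uid)'
  )
}}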

We have a fledgling postgres package that implements these macros, which might be helpful here: https://github.com/fishtown-analytics/postgres

I haven’t used these macros before, but they might be a good starting point!

I didn’t see anyone mention it here, but the main case for primary/foreign key constraints in data warehouses is data discoverability.

Two main use cases off the top of my head:

  1. Navigating related data and auto-completing SQL queries get a lot better, in DataGrip at least. Database clients like it use primary key and foreign key information to auto-complete join conditions when writing SQL queries. When you query data, you can also use the context menu to select rows that match a foreign key column from a query result, for example.

  2. Metabase automatically fills in table metadata for you if foreign and primary key information is available. With that information in place, the Metabase query builder can show the end user the fields of the joined table so they can be filtered on.

My thoughts:

Someone wanting to take advantage of this today would have to use post-hooks to implement it. It’s a little bit clunky, but it works.

Since what I’m describing is mostly about documenting the relationships between models, I think the sensible place for it would be the YAML files, and then we would need some option to generate the “alter table” statements that make those constraints happen.

@drew Question for you: do we have access to the data in the YAML files when executing post-hooks? I’m thinking it would be relatively easy to create a macro, invoked from a post-hook, that parses the list of columns and generates those exact “alter table” statements.

Hey @thalesmello - I really like the way you’re thinking about this! I don’t believe the yaml file contents are currently available in the hook context, but this is definitely a workflow I’d like to enable.

Presumably we’d want to do something like the following:

  1. add a primary_key test which effectively just implements unique + not_null (a rough sketch follows below)
  2. provide some knowledge of the existing tests for a model in its compilation context
  3. make it easy to distinguish between views/tables in these hooks: we definitely won’t want to add these constraints on views!

Do you buy all of this?
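
For what it’s worth, point 1 could probably be prototyped today as a custom schema test. A rough sketch (illustrative, not an existing dbt feature), using the test_* macro convention where the compiled query returns a count of failing records:

{% macro test_primary_key(model, column_name) %}

    select count(*)
    from (

        -- rows where the column is null
        select {{ column_name }}
        from {{ model }}
        where {{ column_name }} is null

        union all

        -- values that appear more than once
        select {{ column_name }}
        from {{ model }}
        group by {{ column_name }}
        having count(*) > 1

    ) validation_errors

{% endmacro %}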


I do agree with the things you pointed out.

Regarding your third point, that’s something I worked around by checking

{% if this.materialized in ['table', 'incremental'] %}

when I was adding indices in our old Postgres warehouse.
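
Packaged up, that guard might look something like the following when called from a post-hook. The macro name and column are illustrative, and you may need to handle the empty-string case on views depending on your dbt version:

{% macro create_index_if_table(column_name) %}
    {% if this.materialized in ['table', 'incremental'] %}
        create index if not exists "{{ this.name }}__index_on_{{ column_name }}"
            on {{ this }} ("{{ column_name }}")
    {% endif %}
{% endmacro %}

It could then be invoked from a model’s config, e.g. post_hook="{{ create_index_if_table('uid') }}", following the same nested-braces pattern as the earlier example.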
