How do you deal with missing dimensions for foreign keys?

Kimcha · June 15, 2020, 5:57pm

Hi,

What’s the best practice in DBT for dealing with missing dimensions for foreign keys?

For example, I have a fact_orders that contains a foreign key to a dim_funnel. The rows in dim_funnel have to be maintained manually, so it can frequently happen that orders happen on a funnel, but the dimension row for the funnel hasn’t been added yet.

In a traditional DWH model we would include a special row like “Missing Funnel” with a -1 key for example. And then in the fact table query we would check if the correct foreign key exists in the dimension table and if not set it it to -1.

A similar situation can happen when dimension records are deleted, but the facts referencing it are kept.

I guess when generating the fact table, I could do a join on the dimension to check if the key is there, but this prevents one of primary benefits of using hash keys-- being able to load dimensions and facts in parallel.

So I am just wondering how you guys are handling such a situation.

Thanks, Kim

josh · June 15, 2020, 6:14pm

We still create a “-1” “Unknown” row in our dimensions. Although it’s not actually “-1” literally, it’s the hash of “-1”. And then we have a variable in our dbt_project.yml file like this:

unknown_key: "CAST(FARM_FINGERPRINT('-1') AS INT64)"

Then when we build our fact tables and need to populate the keys to join to the dimension tables, we do something like this:

COALESCE(pers.person_key, {{ var('unknown_key') }} ) AS person_key,

I never really thought of one of the primary benefits of hash keys as being able to load dimensions and facts in parallel per se.

Kimcha · June 15, 2020, 6:28pm

Fantastic, that makes a lot of sense. Thank you for sharing.

And in this example pers is the joined dimension table, right?

And how do you “append” your -1 dimension row in the dimension?

Do you just add a union at the beginning of the dim table?

josh · June 15, 2020, 11:33pm

Yes, in this case pers is the joined dimension table. And yes we just do a UNION with the query that is producing the dimension to append this row.

fabrice.etanchaud · June 22, 2020, 4:38pm

Hi @Kimcha, there is an interesting article from Roleant Vos on the subject (in the data vault realm) : unknown-keys-zero-keys-or-ghost-keys-in-hubs-for-dv2-0, with a taxonomy on missing values. In my case, we use two distinct missing key values : one when the key is missing but mandatory (problem in the loading or in source data), and another when the key is missing but optional (loading and data ok but data is missing for ‘business’ reason).

Best,
Fabrice

Topic		Replies	Views
How to add surrogate key from dimension table to fact table? Archive	1	4680	August 7, 2022
Building a Kimball dimensional model with dbt In-Depth Discussions devblog	9	3809	October 16, 2023
Is there a design pattern similar to inferred member dimensions for handling late arriving dimensions when loading facts Help best-practice	1	1278	February 8, 2024
Dimensional modeling, adding foreign keys unavoidable duplication of joins? In-Depth Discussions	1	1227	April 20, 2024
Surrogate key for multiple-source dimension table Archive	2	2905	May 25, 2022

How do you deal with missing dimensions for foreign keys?

Related topics