The problem I’m having
I try to generate a record with default values in my dimensions. Unfortunately, I can’t seem to be able to do this without building the dimension twice (this is required in the example below) or run to a cycle exception.
The context of why I’m trying to do this
My customer requires that every dimension has a dummy record to prevent ‘left join’ in reporting queries. Also this should make it easier for developers to add code in an automated way (less error prone).
What I’ve already tried
The macro below is what I have tried. This generates a union that selects all the required columns from the relation (in this example ‘this’ is used). All fields are default values based on a set of rules. The way I understand the keyword ‘this’ is that it looks at an already existing table. I have also used ref() to try this, but that gives me a cycle exception.
Some example code or error messages
{%- macro get_default_value(column) -%}
{%- if column.is_string() and not (column.name|lower).endswith("_sk") -%}
'N/A'
{%- elif column.is_string() and (column.name|lower).endswith("_sk") -%}
{{ dbt_utils.generate_surrogate_key(["'-1'"]) }}
{%- elif (column.is_number() or column.is_float()) and (column.name|lower).endswith("_id") -%}
'-1'
{%- elif (column.is_number() or column.is_float()) and not (column.name|lower).endswith("_id") -%}
'0'
{%- elif (column.data_type|lower) in ["timestamp", "timestamp_ntz", "timestamp_ltz", "date"] and "a_date_i" not in (column.name|lower) and "a_date_u" not in (column.name|lower) -%}
'1900-01-01'::timestamp_ntz
{%- elif (column.data_type|lower) in ["timestamp", "timestamp_ntz", "timestamp_ltz", "date"] and ("a_date_i" in (column.name|lower) or "a_date_u" in (column.name|lower)) -%}
current_timestamp
{%- elif (column.data_type|lower) == "boolean" -%}
FALSE
{%- else -%}
NULL
{%- endif -%}
{%- endmacro -%}
{%- macro dummy_data() %}
{%- set all_columns = adapter.get_columns_in_relation(this) -%}
{% if all_columns %}
union select
{% for column in all_columns %}
{%- if not loop.first %},{% endif -%}
{{ get_default_value(column) }} as {{ column.name }}
{% endfor %}
{% endif %}
{% endmacro -%}
How it is used:
with
current_data as (
select aggregate_type, max(a_date_file) as a_date_file
from {{ ref("wms_event_log") }}
group by aggregate_type
),
unit_types as (
select distinct
aggregate_type,
a_date_file
from {{ ref("wms_event_log") }}
)
select
{{ dbt_utils.generate_surrogate_key(["unit_types.aggregate_type"]) }} as eut_d_sk,
unit_types.aggregate_type as eut_event_unit_type,
current_timestamp as eut_a_date_i,
current_timestamp as eut_a_date_u,
unit_types.a_date_file as eut_change_date
from unit_types
inner join current_data
on unit_types.aggregate_type = current_data.aggregate_type
and unit_types.a_date_file = current_data.a_date_file
{% if is_incremental() %}
where {{ batch_date_validation(["eut_change_date"]) }}
{% endif %}
{{ dummy_data() }}