Need Advice on Data Loading Strategy into Data Vault Models

Hey folks,
We are currently migrating to Snowflake.
In our old process we used an in-house solution to load data into the Data Vault models.

We have a JSON file that defines everything: source table, target table, how to construct the business key, column name mappings, and so on.

For example

"SALES":
    {
        "SOURCE_SYSTEM": "SALES",
        "SOURCE_SCHEMA": "STAGE",
        "TABLE_MAPPING":
        [
            {"SOURCE_TABLE": "SALES_STAGE", "TARGET_SCHEMA": "SOME_SCHEMA", "TARGET_TABLE": "SALES_SAT"}
        ],
        "COLUMN_MAPPING":
        [
            {"SOURCE_TABLE": "SALES_STAGE", "SOURCE_COLUMN": "SOURCE_COLUMN", "TARGET_COLUMN": "TARGET_COLUMN"}
        ],
        "BUSINESSKEY_MAPPING":
        [
            {"TARGET_COLUMN": "SALES_HK", "SORT_ID": "1", "BUSINESSKEY": "TARGET_COLUMN"}
        ]
    }

We load this JSON into the database, and a stored procedure dynamically generates the SQL load scripts based on it.

Now that we are moving to Snowflake, we can't use this stored procedure method any more.

We are considering two approaches for generating the load scripts from the JSON file:

1. Use dbt as the transformer. With this approach we could either write a macro that generates the SQL from the JSON config, or convert the existing configuration into native dbt configuration itself (we also have some custom loading behaviour). A rough sketch of the macro idea is shown after this list.
2. Rewrite the SQL generation in Python.
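
To make the macro idea concrete, this is a minimal sketch, assuming the JSON config is exposed to dbt (for example as a var or via fromjson); the macro name, md5 hashing, and delimiter are placeholders, not a finished design:

-- macros/generate_sat_load.sql (illustrative only; cfg is the parsed JSON dict for one source)
{% macro generate_sat_load(cfg) %}
    {%- set tm  = cfg['TABLE_MAPPING'][0] -%}
    {%- set bks = cfg['BUSINESSKEY_MAPPING'] | sort(attribute='SORT_ID') -%}
    select
        -- hash key built from the business-key columns in SORT_ID order
        md5(
            {%- for bk in bks %}
            coalesce(cast({{ bk['BUSINESSKEY'] }} as varchar), '')
            {{- " || '|' ||" if not loop.last }}
            {%- endfor %}
        ) as {{ bks[0]['TARGET_COLUMN'] }},
        -- straight column renames from COLUMN_MAPPING
        {%- for cm in cfg['COLUMN_MAPPING'] %}
        {{ cm['SOURCE_COLUMN'] }} as {{ cm['TARGET_COLUMN'] }}{{ "," if not loop.last }}
        {%- endfor %}
    -- in a real project this would be a source()/ref() rather than a hard-coded schema
    from {{ cfg['SOURCE_SCHEMA'] }}.{{ tm['SOURCE_TABLE'] }}
{% endmacro %}

A model file (for example models/raw_vault/sales_sat.sql) would then contain nothing but {{ generate_sat_load(var('sales_config')) }}.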

I know dbt provides testing, lineage, documentation, etc., but we already have our own framework for these needs, written in Python and maintained by us.

I worked on a dbt + Snowflake setup about a year ago, and I feel that dbt's incremental loads are a little complex and hard to maintain.

I feel that generating SQL using Python is the better way.
But I fear that maintaining such a loading framework over the long term is going to be hard, and dbt already does the same thing.

Can someone please advise on this scenario? :smiley:

Looks like simple JSON files.
Ingestion into Snowflake for these would be via S3, a stage, and a pipe into a database - we have a specific database per ingestion method, so have an AWD_DB. As this is a simple structure you can keep it simple and use Snowflake INFER_SCHEMA so it is flattened as it is loaded.
From there, in dbt you bring these in as staging source models.
Once the configuration data is in a model, you can then generate SQL scripts using Jinja.
If you need standalone scripts, you can run queries against the model, generate the scripts (outside dbt), and bulk load them into a branch for a PR.
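
A rough sketch of that ingestion pattern, assuming an external stage over the S3 bucket (the database, stage, table, and file format names below are placeholders, not the actual AWD_DB objects):

-- illustrative only; a pipe would wrap the COPY below for auto-ingest
create file format if not exists config_db.public.json_ff
    type = json
    strip_outer_array = true;

-- let Snowflake infer the flattened column layout from the staged files
create table if not exists config_db.public.dv_config
    using template (
        select array_agg(object_construct(*))
        from table(
            infer_schema(
                location    => '@config_db.public.config_stage/dv_config/',
                file_format => 'config_db.public.json_ff'
            )
        )
    );

copy into config_db.public.dv_config
    from '@config_db.public.config_stage/dv_config/'
    file_format = (format_name = 'config_db.public.json_ff')
    match_by_column_name = case_insensitive;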

Note: @Nick Pinfold originally posted this reply in Slack. It might not have transferred perfectly.

Hey there,

Thanks for this idea.
Can we create our merge/insert scripts, which are currently written in stored procedures, as macros? I don't want the incremental approach handled by dbt, because these merge/insert statements have been developed over years and are well trusted.
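
Something along these lines is what I have in mind - only a rough sketch with placeholder table and column names, not our actual merge logic:

-- macros/dv_merge.sql (placeholder names; the real merge logic is more involved)
{% macro dv_merge(target, source, hash_key, payload_columns) %}
merge into {{ target }} as tgt
using {{ source }} as src
    on tgt.{{ hash_key }} = src.{{ hash_key }}
when matched and tgt.hash_diff <> src.hash_diff then update set
    {%- for col in payload_columns %}
    tgt.{{ col }} = src.{{ col }}{{ "," if not loop.last }}
    {%- endfor %}
when not matched then insert ({{ hash_key }}, hash_diff, load_date, {{ payload_columns | join(', ') }})
values (src.{{ hash_key }}, src.hash_diff, current_timestamp(),
    {%- for col in payload_columns %} src.{{ col }}{{ "," if not loop.last }}{%- endfor %})
{% endmacro %}

-- callable with: dbt run-operation run_dv_merge
{% macro run_dv_merge() %}
    {% do run_query(dv_merge('DV.SALES_SAT', 'STAGE.SALES_STAGE', 'SALES_HK', ['COL_A', 'COL_B'])) %}
{% endmacro %}

That way the merge SQL itself stays exactly as we wrote it, and dbt only renders and executes it.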

Do you have any thoughts or suggestions on this?