Hey Folks
We are currently in the middle of a migration to Snowflake.
In our older process we used an in-house solution to load the data into our Data Vault models.
We have a JSON file that defines everything: source table, target table, how to construct the business key, column name mappings, and so on.
For example:
"SALES":
{
  "SOURCE_SYSTEM": "SALES",
  "SOURCE_SCHEMA": "STAGE",
  "TABLE_MAPPING":
  [
    {"SOURCE_TABLE": "SALES_STAGE", "TARGET_SCHEMA": "SOME_SCHEMA", "TARGET_TABLE": "SALES_SAT"}
  ],
  "COLUMN_MAPPING":
  [
    {"SOURCE_TABLE": "SALES_STAGE", "SOURCE_COLUMN": "SOURCE_COLUMN", "TARGET_COLUMN": "TARGET_COLUMN"}
  ],
  "BUSINESSKEY_MAPPING":
  [
    {"TARGET_COLUMN": "SALES_HK", "SORT_ID": "1", "BUSINESSKEY": "TARGET_COLUMN"}
  ]
}
We load this JSON into the database, and a stored procedure dynamically generates the SQL load scripts from it.
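To make that concrete, here is a rough Python sketch of what the generator has to derive from such a config, using the SALES example above (the file name and the MD5-over-concatenation rule for the hash key are just illustrative assumptions, not our exact convention):

import json

# Load the SALES section of the config (hypothetical file name).
with open("sales_config.json") as f:
    cfg = json.load(f)["SALES"]

table = cfg["TABLE_MAPPING"][0]

# Source-to-target column renames for this source table.
column_map = {m["SOURCE_COLUMN"]: m["TARGET_COLUMN"]
              for m in cfg["COLUMN_MAPPING"]
              if m["SOURCE_TABLE"] == table["SOURCE_TABLE"]}

# Business-key columns ordered by SORT_ID, concatenated and hashed into the hash key.
bk_cols = [m["BUSINESSKEY"]
           for m in sorted(cfg["BUSINESSKEY_MAPPING"], key=lambda m: int(m["SORT_ID"]))]
hash_expr = "MD5(" + " || '|' || ".join(bk_cols) + ")"

print(f'{cfg["SOURCE_SCHEMA"]}.{table["SOURCE_TABLE"]} -> {table["TARGET_SCHEMA"]}.{table["TARGET_TABLE"]}')
print("column mapping:", column_map)
print("hash key expression:", hash_expr)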
Now that we are moving to Snowflake, we can't keep using this stored procedure approach.
We see two possible approaches to generating the load scripts from this JSON config.
One is to use dbt as the transformer. Here we could either write a macro that generates the SQL from the JSON config, or convert the existing configuration into native dbt configuration itself (we also have some custom loading behaviour); see the sketch below for roughly what that conversion could produce.
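A minimal sketch of the "convert to native dbt" idea: emit one model file per TABLE_MAPPING entry. The file layout, the 'stage' source, the incremental settings and the _LOADED_AT column below are placeholders to show the shape of the output for the SALES example, not our real setup; a real converter would fill them in from the JSON:

import pathlib

# Model body a converter could emit for the SALES_SAT entry.
# A real converter would build these names/expressions from the JSON config.
model = """{{ config(materialized='incremental', unique_key='SALES_HK') }}

SELECT
    MD5(TARGET_COLUMN)  AS SALES_HK,        -- from BUSINESSKEY_MAPPING
    SOURCE_COLUMN       AS TARGET_COLUMN,   -- from COLUMN_MAPPING
    CURRENT_TIMESTAMP() AS LOAD_DTS
FROM {{ source('stage', 'sales_stage') }}   -- assumes a 'stage' source in sources.yml
{% if is_incremental() %}
WHERE _LOADED_AT > (SELECT MAX(LOAD_DTS) FROM {{ this }})  -- hypothetical staging column
{% endif %}
"""

out = pathlib.Path("models/raw_vault/sales_sat.sql")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(model)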
The other option is to just rewrite the SQL generation in Python, along the lines of the sketch below.
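A minimal sketch of that route, taking the pieces derived from the config (as in the earlier snippet) and rendering a Snowflake MERGE. The statement shape and the LOAD_DTS column are again only illustrative, and real satellite handling (hashdiff / change detection, our custom loading behaviour) is left out:

def render_sat_merge(target, source, hash_key, hash_expr, column_map):
    """Render an insert-only MERGE for one satellite from the mapping metadata."""
    select_cols = ",\n        ".join(f"{s} AS {t}" for s, t in column_map.items())
    tgt_cols = [hash_key] + list(column_map.values())
    insert_cols = ", ".join(tgt_cols + ["LOAD_DTS"])
    insert_vals = ", ".join(f"src.{c}" for c in tgt_cols) + ", CURRENT_TIMESTAMP()"
    return f"""MERGE INTO {target} AS tgt
USING (
    SELECT
        {hash_expr} AS {hash_key},
        {select_cols}
    FROM {source}
) AS src
ON tgt.{hash_key} = src.{hash_key}
WHEN NOT MATCHED THEN
    INSERT ({insert_cols}) VALUES ({insert_vals});"""

# Example call using the SALES mapping from the config above.
print(render_sat_merge(
    target="SOME_SCHEMA.SALES_SAT",
    source="STAGE.SALES_STAGE",
    hash_key="SALES_HK",
    hash_expr="MD5(TARGET_COLUMN)",              # derived from BUSINESSKEY_MAPPING
    column_map={"SOURCE_COLUMN": "TARGET_COLUMN"},
))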
I know dbt provides testing, lineage, documentation, etc., but we already have our own framework for these needs, written in Python and maintained by us.
I worked on a dbt + Snowflake setup about a year ago.
I feel that dbt incremental loads are a little complex and hard to maintain.
My feeling is that generating the SQL with Python is the better way.
But I fear that maintaining such a loading framework over the long term is going to be hard, when dbt already does the same thing.
Can someone please advise on this scenario?