Handling unstructured data from MongoDB

tzkabir · February 1, 2020, 10:08pm

Hi everyone,

Our MongoDB has unstructured data that we are bringing in as a string - such as:

{"_eventid": "1234", "Keywords":""}
{"_eventid": "4567", "Keywords":{"signup":True} }

When it comes to dbt, what is the best practice?

1. Extract attributes as columns at the staging layer (i.e. event.data->>'_eventid' AS eventid)
2. Load the data as is into the staging layer, and then extract the unstructured data

Are there any alternative or better ways of handling unstructured data? We are pulling this data into BigQuery.

Thanks!

acunningham · February 4, 2020, 1:10am

If you are following an ELT pattern, then what I do is load the data into the source table as-is and extract the semi-structured data afterwards (I use Snowflake, so optimizing for cost may be different on BigQuery). I usually build a parsed view over the top of the raw data, then an incremental table off of that view.
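
As a rough illustration of what the parsed layer has to cope with, here is a minimal Python sketch of the extraction logic. It is not dbt code: in a real project this would be a SQL view using the warehouse's JSON functions. It assumes the raw rows are valid JSON (so the boolean is written as lowercase true), and the function name parse_event and the output column signup are hypothetical.

```python
import json

# Raw rows as they might land in the source table: one JSON string per row.
# Assumption: valid JSON, so the boolean is lowercase `true`.
raw_rows = [
    '{"_eventid": "1234", "Keywords": ""}',
    '{"_eventid": "4567", "Keywords": {"signup": true}}',
]


def parse_event(raw):
    """Flatten one raw document into a row with fixed columns (hypothetical helper)."""
    doc = json.loads(raw)
    keywords = doc.get("Keywords")
    # "Keywords" is sometimes an empty string and sometimes an object,
    # so anything that is not an object is treated as "no keywords".
    if not isinstance(keywords, dict):
        keywords = {}
    return {
        "eventid": doc.get("_eventid"),
        "signup": bool(keywords.get("signup", False)),
    }


parsed = [parse_event(raw) for raw in raw_rows]
for row in parsed:
    print(row)
```

The point of keeping this step in a view is that the raw table never changes shape: if a new key shows up in the documents, only the parsing logic needs to be updated.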

A single JSON object may be normalized (broken out) into multiple models. For example, an order might have a total, some customer data, and an array of items that have prices associated with them. I would build an order object and an order detail (which contains the items in the order) off of it.
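
The order example can be sketched in Python as follows. This only shows the shape of the normalization, not how it would be written in dbt, where each output would be its own SQL model. All field names here (order_id, total, customer, items, sku, price) and the function split_order are assumptions made up for the illustration.

```python
import json

# A hypothetical raw order document; every field name is assumed.
raw_order = json.dumps({
    "order_id": "A-1",
    "total": 35.0,
    "customer": {"id": "C-9", "name": "Sam"},
    "items": [
        {"sku": "pen", "price": 5.0},
        {"sku": "book", "price": 30.0},
    ],
})


def split_order(raw):
    """Break one order document into an order row and its order detail rows."""
    doc = json.loads(raw)
    customer = doc.get("customer") or {}
    order = {
        "order_id": doc["order_id"],
        "total": doc.get("total"),
        "customer_id": customer.get("id"),
    }
    # One detail row per element of the items array, keyed back to the order.
    details = [
        {
            "order_id": doc["order_id"],
            "line_number": line_number,
            "sku": item.get("sku"),
            "price": item.get("price"),
        }
        for line_number, item in enumerate(doc.get("items", []), start=1)
    ]
    return order, details


order, details = split_order(raw_order)
print(order)
print(details)
```

Each detail row carries order_id so the two outputs can be joined back together downstream.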
