Handling unstructured data from MongoDB

Hi everyone,

Our MongoDB has unstructured data that we are bringing in as a string - such as:

{"_eventid": "1234", "Keywords":""}
{"_eventid": "4567", "Keywords":{"signup":True} }

When it comes to DBT, what is best practice?:

  1. Extract attributes as columns at the staging layer (i.e. event.data->>‘_eventid’ AS eventid)
  2. Load the data as is into the staging layer, and then extract the unstructured data

Are there any alternative/better ways of handling unstructured data? We are pulling this data into BigQuery

Thanks!

If you are following an ELT pattern, then loading into your source table as-is and then extracting the semi-structured data is what I do (I use snowflake so optimizing for cost may be different). I usually make a parsed view over the top of the data, then an incremental table off of that view.

A single json object may be normalized (broken out) into multiple models, for example and order might have a total, some customer data, and have an array of items that have prices associated with them, I would build an order object, and an order detail (which contained the times in the order) off of it.