Analytics on data provided by an API

The problem I’m having

Hello folks,
I want to start a small hobby project. My goal is to collect financial data from a few APIs in a database and then analyze it with dbt and Lightdash.
So far I have no experience with dbt, and on the database side I have only ever worked with relational databases. The JSON data from the APIs is nested.

From everything I have read so far, it seems I have to transform the JSON objects into a relational schema to make them usable with dbt. Is that correct? Or could I also store the JSON data in MongoDB, for example, and then use dbt?

If I do need to transform the data: which Python libraries are suitable for this? Is it worth taking a closer look at Apache Airflow? I would like to avoid having to design a relational data model myself, since that can be very time-consuming.

I know these questions are very general, but I’m looking for some direction from practitioners. Googling the keywords quickly gets overwhelming.


Welcome! This sounds fun :tada:

You don’t have to transform (“unnest”) the data before loading it into a relational db, but you do need to load it into a database that dbt has an adapter for. That link is for “verified” adapters, but if you’re not going to use a cloud DWH then I would suggest checking out the community-supported DuckDB adapter too - DuckDB can run locally and has pretty good JSON support.
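For example (a minimal sketch rather than a full pipeline - the file path and table name are placeholders): if you save each API response to disk as a JSON file, DuckDB can load the documents as-is, without you defining a schema first.

```sql
-- Load raw JSON documents into DuckDB without designing a schema up front.
-- 'data/*.json' and the table name are placeholders for this example.
INSTALL json;
LOAD json;  -- the json extension ships with recent DuckDB versions

CREATE TABLE raw_api_responses AS
SELECT json AS payload   -- each row keeps one full, still-nested API response
FROM read_json_objects('data/*.json');
```

If you would rather let DuckDB infer a tabular shape straight away, `read_json_auto('data/*.json')` does that too (nested objects become STRUCT columns), which also saves you from hand-designing a relational model.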

You could store the records in an unprocessed format in your DuckDB database, and then from there you can build dbt models on top which extract the specific values from the JSON object as needed. dbt will run `create table as select` commands, which means you don’t need to specify your table shapes in advance.
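As a rough illustration (the model, source, and field names below are all made up), a staging model could pull a few values out of the raw JSON with DuckDB’s `->>` path operator:

```sql
-- models/staging/stg_prices.sql  (hypothetical; assumes the raw table is
-- declared as a dbt source and has a JSON column called payload)
select
    payload ->> '$.symbol'                        as symbol,
    cast(payload ->> '$.price' as double)         as price,
    cast(payload ->> '$.timestamp' as timestamp)  as quoted_at
from {{ source('raw', 'raw_api_responses') }}
```

The `{{ source(...) }}` reference assumes you’ve declared the raw table in a sources `.yml` file; referencing the table name directly also works while you’re experimenting.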

Does that get you on the right track?