Hi all,
I recently began using dbt-core for data processing at our company. When working with raw or evolving datasets, I often find it easier to perform initial data exploration and preprocessing in Python before structuring transformations in dbt; a simplified sketch of what I mean is below. This helps me understand the data before defining dbt models. However, I wonder whether this hybrid approach is best practice, or if there are better ways to integrate Python into a dbt workflow.
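For context, here is roughly what that Python step looks like for me, heavily simplified (the file name, columns, and cleaning steps are all invented for this example):

```python
import pandas as pd

# Load a raw export whose schema I don't fully know yet
# (file name and columns are made up for illustration).
df = pd.read_csv("raw_events_export.csv")

# Initial exploration: inferred dtypes, null counts, value ranges.
print(df.dtypes)
print(df.isna().sum())
print(df.describe(include="all"))

# Light preprocessing before handing anything to dbt:
# normalize column names, parse timestamps, drop exact duplicates.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
df["event_ts"] = pd.to_datetime(df["event_ts"], errors="coerce")
df = df.drop_duplicates()

# Let pandas promote columns to the best available nullable dtypes;
# this is my rough answer to "schema inference" today.
df = df.convert_dtypes()

# Write the cleaned file somewhere the warehouse can load it,
# so dbt models can treat it as a source.
df.to_parquet("clean/events.parquet", index=False)
```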
Some specific questions I have:
- Is it common to use Python for raw data preparation before feeding it into dbt?
- How do teams typically manage schema inference when working with unknown datasets?
- Are there recommended ways to combine dbt’s SQL-based transformations with Python-based processing (e.g., via dbt Python models, external preprocessing scripts, or other tools)? I've put a rough sketch of the dbt Python model option after this list.
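On the dbt Python model option: this is the shape of it as I understand it from the dbt docs. It is only supported on certain adapters (e.g., Snowflake, Databricks, BigQuery), and the model name `stg_events` and the transformation here are placeholders, not my real code:

```python
# models/events_enriched.py
# A dbt Python model: dbt calls model(dbt, session) and materializes
# whatever DataFrame it returns. The concrete DataFrame type depends
# on the adapter (e.g., Snowpark on Snowflake, PySpark on Databricks).

def model(dbt, session):
    dbt.config(materialized="table")

    # Upstream dbt model, referenced like {{ ref() }} in a SQL model.
    events = dbt.ref("stg_events")

    # Placeholder transformation; dropna() works on both Snowpark
    # and PySpark DataFrames. Real logic would go here.
    enriched = events.dropna()

    return enriched
```

As far as I can tell, `dbt run` then builds this alongside the SQL models, which seems like the most integrated way to keep Python inside the DAG rather than in a separate script.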
Would love to hear how others balance Python and dbt in their workflows!