dbt-fal or dbt Python models with Spark (EMR)

I am trying to run Spark over JDBC using dbt-python. It seems to work only with Databricks and GCP Dataproc. We are doing a POC to see whether dbt can be integrated with open source Spark using Python models.

The POC is for one of our clients. We have tried running dbt against a locally set-up Spark instance. We can run SQL models through the Thrift server, but not Python models, since the Spark Thrift Server can only execute SQL queries. For Python models, dbt asks for a Databricks cluster or a Dataproc cluster.
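For context, the kind of Python model we want to run looks roughly like the sketch below. It follows the documented `model(dbt, session)` entry point for dbt Python models; the upstream model name `stg_orders` is hypothetical and only there for illustration.

```python
# Minimal sketch of a dbt Python model (it would live under models/ as a .py file).
# Assumption: "stg_orders" is a hypothetical upstream model, not one from our project.

def model(dbt, session):
    # Configure how dbt should materialize the result.
    dbt.config(materialized="table")

    # dbt.ref() returns the upstream model as a DataFrame on the target platform.
    orders = dbt.ref("stg_orders")

    # A Python model must return a single DataFrame, which dbt then persists.
    return orders
```

Something this small cannot go through the Thrift connection, because the function body has to execute inside a Spark session rather than being sent as a SQL statement.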

We would like to understand the roadmap of dbt-python or dbt-fal with respect to support for open source Spark, EMR, Spark on Kubernetes, etc.

I am also trying to use dbt Python models with Spark on K8s. Have you made any progress?