My team and I are running our dbt project on an all-purpose cluster. When running some of our larger tables we get a performance hint in the Spark UI telling us to increase the `spark.sql.shuffle.partitions` setting. I found this Stack Overflow answer on how to set it in dbt. Instead of putting the setting in the model config, I added it to `dbt_project.yml`, as you can see below. When I ran one of the big models again I still got the performance warning, so it seems the setting did not have the effect I was hoping for.
Does anybody here know how to set this correctly?
I'm not sure, but I think this setting lives at the cluster level in Databricks, so it doesn't really make sense to set it per model.
We are running Azure Databricks and tried this on an all-purpose cluster with 96 cores and 384 GiB of memory.
```yaml
models:
  +file_format: delta
  +pre-hook:
    sql: "SET spark.sql.shuffle.partitions=auto"
    transaction: false
```
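For reference, the per-model equivalent would look something like this, with the hook in the model's own `config()` block instead of `dbt_project.yml`. This is only a sketch: the model name and the `select` are placeholders, and note that the key is spelled `pre_hook` inside `config()` but `pre-hook` in YAML.

```sql
-- models/my_big_model.sql (hypothetical model name)
{{ config(
    file_format='delta',
    pre_hook='SET spark.sql.shuffle.partitions=auto'
) }}

-- placeholder body; replace with the model's actual SQL
select * from {{ ref('some_upstream_model') }}
```

One thing worth checking either way is whether the hook actually runs in the same Spark session as the model build; a `SET` that lands in a different session would explain why the warning persists.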