I'm using a custom adapter for Azure Synapse Spark (Livy connection) that connects with an app registration.
The adapter is based on the Spark adapter.
Today we have an Azure Synapse instance that is NOT connected to a managed VNet, and a VNet-connected data lake (storage account) that uses a private endpoint so that it is not publicly accessible.
With this setup, dbt calls the Spark pool, which tries to reach the storage account over the internet. That is not allowed, since the storage account is private.
I know that in a Synapse Spark notebook I can configure the Spark pool to use a Synapse linked service that uses an Integration Runtime to communicate with our storage account.
See the sample code:
# Function for setting spark config to linked service for a specific storage account given as a parameter
def set_spark_conf_to_storage_account(storage_account):
    source_full_storage_account_name = f"saxxxxxx{storage_account}.dfs.core.windows.net"
    spark.conf.set(f"spark.storage.synapse.{source_full_storage_account_name}.linkedServiceName", f"ls_xxxxxx_{storage_account}_xxxxxx")
    sc._jsc.hadoopConfiguration().set(f"fs.azure.account.oauth.provider.type.{source_full_storage_account_name}", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")
In the notebook, the Spark pool can now reach the storage account via the linked service, which uses an Integration Runtime.
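For reference, the same two settings can also be written as a plain dictionary of Spark properties, which is the shape a Livy session-creation request takes in its `conf` field. This is only a sketch under assumptions: the function name `linked_service_spark_conf` and the `"example"` account suffix are hypothetical, it is untested whether the custom adapter lets me pass such a dictionary, and it relies on Spark copying `spark.hadoop.`-prefixed properties into the Hadoop configuration instead of calling `sc._jsc.hadoopConfiguration()` directly.

```python
import json

# Token provider class taken from the notebook sample above.
TOKEN_PROVIDER = "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider"


def linked_service_spark_conf(storage_account):
    """Build the linked-service settings as a Spark conf dict (hypothetical helper)."""
    # Same placeholder naming scheme as in the notebook function.
    full_name = f"saxxxxxx{storage_account}.dfs.core.windows.net"
    return {
        f"spark.storage.synapse.{full_name}.linkedServiceName": f"ls_xxxxxx_{storage_account}_xxxxxx",
        # Assumption: the "spark.hadoop." prefix forwards this key to the Hadoop
        # configuration, replacing the sc._jsc.hadoopConfiguration().set(...) call.
        f"spark.hadoop.fs.azure.account.oauth.provider.type.{full_name}": TOKEN_PROVIDER,
    }


# "example" is a made-up storage account suffix, for illustration only.
conf = linked_service_spark_conf("example")

# Shape of a Livy session-creation body carrying these settings.
session_request = {"kind": "pyspark", "conf": conf}
print(json.dumps(session_request, indent=2))
```

If the adapter's session request can be extended with a `conf` entry like this, the settings would be applied when the session starts rather than at runtime from inside a notebook cell.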
How can I apply such a setting to the Spark pool when it is used from dbt?
Thanks for any advice pointing me in the right direction!
Roger