We are using Databricks and would like to implement all data transformations—both streaming and batch—entirely within dbt.
Currently, dbt supports Streaming Tables on Databricks, which internally create Delta Live Tables (DLT) pipelines.
Additionally, per the Databricks documentation, it is possible to write stream output to a Kafka topic using writeStream
in a Delta Live Tables pipeline (Reference). The documented approach includes:
- Setting up Kafka configurations (broker URL, topic, security settings).
- Creating a DLT pipeline.
- Defining a streaming source (files, Delta tables, etc.).
- Using writeStream with Kafka options to publish the data (a minimal sketch follows below).
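For reference, here is a rough Structured Streaming sketch of that last step. This is only illustrative: the source table, broker, topic, security settings, and checkpoint path are placeholders, and additional SASL options would be needed in a real setup.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_json, struct

spark = SparkSession.builder.getOrCreate()

# Streaming source: a Delta table (placeholder name).
source_df = spark.readStream.table("catalog.schema.orders")

# Kafka expects a string/binary `value` column (and optionally `key`).
kafka_df = source_df.select(to_json(struct("*")).alias("value"))

# Placeholder broker, topic, security, and checkpoint settings.
query = (
    kafka_df.writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("topic", "orders_topic")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("checkpointLocation", "/Volumes/catalog/schema/checkpoints/orders_to_kafka")
    .start()
)
```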
A newly introduced feature, the Delta Live Tables Sink API, enables writing to Azure Event Hubs and Kafka directly (Reference). This expands DLT's ability to integrate with external event streaming platforms.
“The introduction of new Sinks API in DLT addresses this by enabling users to write processed data to external event streams, such as Apache Kafka, Azure Event Hubs, as well as writing to a Delta Table.”
These features are currently in Public Preview, with plans for further expansion.
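For context, a minimal sketch of how the Sink API could be used inside a DLT pipeline, based on the referenced documentation; the sink name, broker, topic, security settings, and source table below are placeholders, not a working configuration:

```python
import dlt
from pyspark.sql.functions import to_json, struct

# Declare an external Kafka sink (Sink API, Public Preview).
# Broker, topic, and security settings are placeholders.
dlt.create_sink(
    name="orders_kafka_sink",
    format="kafka",
    options={
        "kafka.bootstrap.servers": "broker1:9092",
        "topic": "orders_topic",
        "kafka.security.protocol": "SASL_SSL",
    },
)

# An append flow pushes rows from a streaming source into the sink.
# Kafka expects a `value` column (and optionally `key`).
@dlt.append_flow(name="orders_to_kafka", target="orders_kafka_sink")
def orders_to_kafka():
    return (
        spark.readStream.table("LIVE.orders")
        .select(to_json(struct("*")).alias("value"))
    )
```

The open question for us is how to wrap a declaration like this in a dbt materialization rather than maintaining it as a separate DLT notebook.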
Question:
Has anyone implemented a custom dbt materialization using the DLT Sink API to enable writing to Kafka or Event Hubs?
We are exploring the possibility of using DLT Sinks as a new materialization type in dbt and would appreciate any insights, experiences, or best practices from the community.