Hello everybody.
At the moment, our production pipeline can execute only plain SQL (T-SQL).
So we need a SQL script that does the same job as dbt run. One solution is to collect the SQL statements that dbt run issues. An additional requirement is that the script be packed into a UDF (T-SQL stored procedure).
Our current solution to get the dbt pipeline SQL script into a UDF (T-SQL stored procedure):
1. Launch SQL Server Profiler with the SQL:BatchStarting event (https://docs.microsoft.com/en-us/sql/relational-databases/event-classes/sql-batchstarting-event-class).
2. Execute the dbt run command (1 thread, on a SQL Server development environment with administrative rights).
3. Export the SQL statements from the Profiler to a text file.
4. Add a dynamic invocation_id generation statement to the beginning of the script (needed for the dbt-event-logging package, https://github.com/dbt-labs/dbt-event-logging):
declare @invocation_id as uniqueidentifier = NEWID()
5. Replace some statements in the script (a Python sketch automating these replacements follows this list):
5.1. "GO" with a semicolon.
5.2. "COMMIT" with a semicolon.
5.3. The user name generated by dbt-event-logging with SUSER_SNAME().
5.4. The fixed "invocation_id" with @invocation_id from (4).
6. Remove the UDF creation statements (see the "Using dbt to manage user defined functions" post).
7. Create the UDF (T-SQL stored procedure) with the statements from the script.
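
To cut down the manual work in steps 4 and 5, the text replacements can be scripted. Below is a minimal Python sketch of that post-processing, under a few assumptions: the Profiler export is a plain-text file, and the file names, the audit user name, and the invocation_id literal captured in the trace are hypothetical placeholders to adjust for your environment.

import re

# Hypothetical file names and values -- adjust to your environment.
PROFILER_EXPORT = "dbt_run_trace.sql"       # text file exported from the Profiler (step 3)
OUTPUT_SCRIPT = "dbt_pipeline_script.sql"   # cleaned-up script for the procedure body (step 7)
AUDIT_USER = "dbt_service_account"          # user name written by dbt-event-logging (step 5.3)
TRACED_INVOCATION_ID = "00000000-0000-0000-0000-000000000000"  # literal id captured in the trace (step 5.4)

with open(PROFILER_EXPORT, encoding="utf-8") as f:
    script = f.read()

# Steps 5.1/5.2: GO batch separators and COMMIT statements are not wanted
# inside a single stored procedure body, so turn standalone lines into semicolons.
script = re.sub(r"(?im)^[ \t]*GO[ \t]*$", ";", script)
script = re.sub(r"(?im)^[ \t]*COMMIT[ \t]*;?[ \t]*$", ";", script)

# Step 5.3: swap the captured audit user name for a SUSER_SNAME() call.
script = script.replace(f"'{AUDIT_USER}'", "SUSER_SNAME()")

# Step 5.4: swap the fixed invocation_id literal for the variable declared below.
script = script.replace(f"'{TRACED_INVOCATION_ID}'", "@invocation_id")

# Step 4: prepend the dynamic invocation_id declaration.
script = "declare @invocation_id as uniqueidentifier = NEWID();\n\n" + script

with open(OUTPUT_SCRIPT, "w", encoding="utf-8") as f:
    f.write(script)

Step 6 (stripping the UDF creation statements) depends on how the functions are named in the project, so it is left as a manual pass here; step 7 is then just pasting the resulting script into the procedure body.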
There are a number of downsides to this approach, such as the loss of multithreaded execution (see the dbt run docs), the need for separate exception handling, etc.
Nevertheless, the solution still works pretty well, and to be more confident before going further with detailed tuning/automation, we are wondering whether there are more recommended approaches.
For example, it seems feasible to work with the JSON log
--log-format json
to get the same dbt pipeline SQL script via Python.
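
As a rough illustration of that direction, here is a minimal Python sketch. It assumes dbt is invoked with --log-format json and that the executed SQL appears somewhere in the structured log lines; the exact field names and nesting vary between dbt versions (and the --debug flag may be needed for the SQL to be logged at all), so the extraction below simply walks each JSON object looking for values under a key named "sql" -- an assumption to verify against your version's log output. The output file name is a placeholder.

import json
import subprocess

# Run dbt with structured JSON logging (one JSON object per line on stdout).
# Single thread keeps the statement order deterministic, as in the Profiler approach.
result = subprocess.run(
    ["dbt", "--log-format", "json", "run", "--threads", "1"],
    capture_output=True, text=True,
)

def find_sql(obj, collected):
    # Recursively collect string values stored under a key named "sql".
    # NOTE: the key name and nesting are assumptions; inspect your log lines first.
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "sql" and isinstance(value, str) and value.strip():
                collected.append(value)
            else:
                find_sql(value, collected)
    elif isinstance(obj, list):
        for item in obj:
            find_sql(item, collected)

statements = []
for line in result.stdout.splitlines():
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip any non-JSON lines
    find_sql(record, statements)

# Raw statement list; the post-processing from steps 4-6 above would still apply.
with open("dbt_pipeline_raw.sql", "w", encoding="utf-8") as f:
    f.write(";\n".join(statements))
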
Thanks in advance for any recommendations/thoughts on how best to execute a dbt pipeline via plain SQL.
Cheers
Sergey