For the moment, the production pipeline can only execute plain SQL (T-SQL).
So we need a SQL Script that does the same job as `dbt run`. Our solution is to collect the SQL statements that `dbt run` issues. An additional requirement is that the Script must be packed into a UDF (T-SQL Stored Procedure).
The current solution for getting the dbt pipeline SQL Script into a UDF (T-SQL Stored Procedure):
- Launch SQL Server Profiler with the SQL:BatchStarting event class: https://docs.microsoft.com/en-us/sql/relational-databases/event-classes/sql-batchstarting-event-class
- Execute the `dbt run` command (1 thread, against a SQL Server development environment with administrative rights).
- Export the SQL statements from the Profiler to a text file.
- Add a dynamic invocation_id generation statement at the beginning of the Script (for https://github.com/fishtown-analytics/dbt-event-logging):

```sql
declare @invocation_id as uniqueidentifier = NEWID()
```
- Replace some statements in the Script:
  - "GO" with a semicolon.
  - "COMMIT" with a semicolon.
  - The user name generated by https://github.com/fishtown-analytics/dbt-event-logging with
  - The fixed "invocation_id" with the dynamically generated @invocation_id.
- Remove the UDF creation statements (see "Using dbt to manage user defined functions").
- Create the UDF (T-SQL Stored Procedure) from the statements in the Script.
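The replacement and wrapping steps above can be sketched in Python. This is only an illustration of the post-processing, not our exact tooling: the procedure name, the assumption that the captured invocation_id appears as a quoted GUID literal, and the `CREATE OR ALTER PROCEDURE` wrapper are all assumptions.

```python
import re

def build_procedure(profiler_sql: str, proc_name: str = "dbt_pipeline") -> str:
    """Turn SQL exported from SQL Server Profiler into a stored
    procedure body, applying the replacements described above."""
    script = profiler_sql
    # Replace the "GO" batch separators and "COMMIT" with semicolons.
    script = re.sub(r"^\s*GO\s*$", ";", script, flags=re.MULTILINE | re.IGNORECASE)
    script = re.sub(r"^\s*COMMIT\s*$", ";", script, flags=re.MULTILINE | re.IGNORECASE)
    # Replace the fixed invocation_id captured by the Profiler with the
    # dynamic variable (assumes it shows up as a quoted GUID literal).
    guid = r"'[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'"
    script = re.sub(guid, "@invocation_id", script)
    # Prepend the dynamic invocation_id declaration and wrap the whole
    # Script into a stored procedure.
    return (
        f"CREATE OR ALTER PROCEDURE {proc_name} AS\nBEGIN\n"
        "declare @invocation_id as uniqueidentifier = NEWID();\n"
        f"{script}\nEND"
    )
```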
There are a number of downsides to this approach, such as the loss of multithreaded execution (https://docs.getdbt.com/reference/commands/run/), the need for separate exception handling, etc.
Nevertheless, the solution works pretty well. Still, before going further with detailed tuning/automation, we would like to know whether there are more recommended approaches.
For example, it seems feasible to parse the JSON log to get the same dbt pipeline SQL Script via Python.
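A minimal sketch of that JSON-log route, assuming dbt is run with JSON-formatted logging and that executed SQL appears in a `sql` field of some records (the exact field names and schema vary by dbt version, so treat them as assumptions to verify):

```python
import json

def extract_sql(log_lines):
    """Collect SQL statements from dbt's JSON-formatted log output.

    Each line is expected to be a JSON object; records carrying a
    'sql' field (directly or under 'data') contribute one statement.
    """
    statements = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (e.g. plain-text banners)
        sql = record.get("sql") or record.get("data", {}).get("sql")
        if sql:
            statements.append(sql.strip())
    return statements
```

The collected statements could then feed the same post-processing and stored-procedure wrapping as the Profiler-based flow, without needing Profiler access.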
Thanks in advance for any recommendations/thoughts on how best to execute a dbt pipeline via plain SQL.