Execute DBT pipeline via plain SQL

Hello everybody.

For the moment, for the production pipeline execution, we can use only plain SQL (T-SQL).

So we need to get SQL Script, which does the same job as DBT RUN. A solution is to collect SQL statements DBT RUN issues. The additional requirement is that we want to have the Script packed into UDF (T-SQL Stored Procedure).

The current solution to get the DBT pipeline SQL Script in UDF (T-SQL Stored Procedure):

  1. Launch SQL Server Profiler with SQL Batch Starting Event. https://docs.microsoft.com/en-us/sql/relational-databases/event-classes/sql-batchstarting-event-class
  2. Execute DBT RUN command (1 thread, SQL Server development environment with administrative rights).
  3. Export SQL Statements from the Profiler to a text file.
  4. Add dynamic invocation_id generation statement to the beginning of the Script (for https://github.com/fishtown-analytics/dbt-event-logging):
    declare @invocation_id as uniqueidentifier = NEWID()
  5. Replace some statements in the Script:
    5.1. “GO” with semicolon.
    5.2. “COMMIT” with semicolon.
    5.3. User name generated by https://github.com/fishtown-analytics/dbt-event-logging with SUSER_SNAME().
    5.4 Fixed “invocation_id” with @invocation_id from (4).
  6. Remove UDFs creation statements. Using dbt to manage user defined functions
  7. Create UDF (T-SQL Stored Procedure) with the statements from the Script.

There is a number of downsides in the approach, like the loss of the multithreaded execution https://docs.getdbt.com/reference/commands/run/, the need for separate exceptions handling, etc.

Nevertheless, the solution still works pretty well, and to be more confident before going further in detailed tuning/automation, we are thinking about if there are more recommended approaches.

For example, it seems feasible to work with JSON log


–log-format JSON
to get the same DBT pipeline SQL Script via python.

Thanks in advance for the recommendations/thoughts on how it may be better to execute DBT pipeline via plain SQL.

Cheers
Sergey