Is DBT right choice for SQL pipeline?

Hi
I am bit new to DBT and we are looking to move our SQL pipeline built on R which executes a bunch of SQL scripts in a particular order to DBT. I am not sure if DBT is the right choice to move for this scenario.

Scenario :- Pipeline looks like-

SQL script 1 → script 2 → script 3 → script 4 → script 5 etc

script 1 - Creates Table A from an existing table

CREATE table A AS Select * from existing_table1;

script 2 - Creates Table B

CREATE table B AS Select * from existing_table2;

script 3 - Adds new column in Table A and add values via UPDATE statement using Table B

ALTER TABLE A ADD column p INT; UPDATE A SET p = A.col1+B.col1 FROM B;

script 4 - Creates Table C

CREATE table B AS Select * from existing_table3;

script 5 - Adds new column in Table A and add values via UPDATE statement using Table C

ALTER TABLE A ADD column q INT; UPDATE A SET q = A.col2+C.col2 FROM C;

Now in the existing pipeline, since each script is isolated SQL statements so they are simple to execute.

But in DBT, as per documentation, there can be one model file for each table and it has to single SELECT statement.

Some confusions/questions I have :-

  1. How can I accommodate script1, script 3 and script 5 in single SELECT statement considering there can be only one model file for Table A
  2. If I am able to do the above point, How will I maintain the order of execution in DBT for script 2, which needs to be executed before the UPDATE happening in Script 3

CTEs are a best practice for those intermediate tables B and C you’re using.

Also, UPDATE ... SET ... FROM ...actually performs a JOIN afaik, but it’s not clear on what columns it’s performing, so I’d refactor those to an explicit JOIN :slightly_smiling_face:

A simpler model for A could be

-- table_a.sql
WITH

b AS (
  SELECT * FROM existing_table2
),

c AS (
  SELECT * FROM existing_table3
)

SELECT a.*,
       a.col1 + b.col1 AS p,
       a.col2 + c.col2 AS q
  FROM existing_table1 AS a
  JOIN b ON a.key = b.key
  JOIN c ON a.key = c.key
;

The above statement would have the same effect as your 5 step pipeline, I believe. With that change, it fits perfect with the dbt philosophy.

1 Like