I am bit new to DBT and we are looking to move our SQL pipeline built on R which executes a bunch of SQL scripts in a particular order to DBT. I am not sure if DBT is the right choice to move for this scenario.
Scenario :- Pipeline looks like-
SQL script 1 → script 2 → script 3 → script 4 → script 5 etc
script 1 - Creates Table A from an existing table
CREATE table A AS Select * from existing_table1;
script 2 - Creates Table B
CREATE table B AS Select * from existing_table2;
script 3 - Adds new column in Table A and add values via UPDATE statement using Table B
ALTER TABLE A ADD column p INT; UPDATE A SET p = A.col1+B.col1 FROM B;
script 4 - Creates Table C
CREATE table B AS Select * from existing_table3;
script 5 - Adds new column in Table A and add values via UPDATE statement using Table C
ALTER TABLE A ADD column q INT; UPDATE A SET q = A.col2+C.col2 FROM C;
Now in the existing pipeline, since each script is isolated SQL statements so they are simple to execute.
But in DBT, as per documentation, there can be one model file for each table and it has to single SELECT statement.
Some confusions/questions I have :-
- How can I accommodate script1, script 3 and script 5 in single SELECT statement considering there can be only one model file for Table A
- If I am able to do the above point, How will I maintain the order of execution in DBT for script 2, which needs to be executed before the UPDATE happening in Script 3