Inserting data into Table A with sequential transformations in dbt

Hi all,
I’m facing a unique challenge with data insertion and transformation using dbt. Here’s the situation:

Step 1: I have raw data that requires some transformations, and the result needs to be inserted into Table A.
Step 2: Once Step 1 is complete, I need to perform additional transformations using the data from Table A and then insert the output of Step 2 back into Table A.

I’m seeking some guidance on the best approach to handle this scenario in dbt. If any of you have encountered a similar situation or have insights to share, I’d greatly appreciate your input. Thanks for your time!

Hello, Ajinkya.

The way that I would approach this is using common table expressions (CTEs) within a single dbt model. Pseudocode is included beneath.

Please note that I am assuming here that when you say that you want to ‘insert’ data into Table A, this would be in the form of new columns, so that the user can see which data has or has not been transformed.

with original as  (SELECT * from {{ref(source)}},

first_transformation as (SELECT 
# This preserves the original data from Table A
*,
avg(sales) over (partition by region order by 1) as sales_regional_average
# Further transformed columns here
from original),

second_transformation as (select *,
# Here, the transformed data is further altered. I would usually do this all in one line, but separating it out here for example.
round(sales_regional_average, 1) as sales_regional_average_rounded)

select * from second_transformation

I also assume that you do not need to have first_transformation as its own model. If you do, split the steps above across different .sql files/models.

Thank you for the suggestion,
I would like to mention that ‘insert’ data into Table A, this would be in the form of new rows not in the form of columns.

below is the detail scenario:

Table A is responsible for maintaining historical data, currently consisting of approximately 750 million records.

Step 1: In this step, we perform an insertion of 100 new rows into TABLE A.

Step 2: After completing Step 1, we utilize the updated TABLE A, which now includes the 100 newly inserted records from Step 1. Using this updated data, we recalculate and identify any missing data (gaps) within TABLE A. Subsequently, we insert the recalculated missing data into TABLE A.

In summary, the process involves two steps: first inserting 100 new records, and then using these new records to recalculate and fill in any gaps in the historical data within TABLE A.