Understanding idempotent data transformations

josh · August 18, 2019, 1:33am

Idempotence seems like a very abstract mathematical concept until you spend a while dealing with the pain that non-idempotent ETL creates. If you are never sure whether running your code and refreshing the data from scratch would yield the same results as the incremental load you have written, then you are dealing with the pain that non-idempotence produces. Another scenario I have run into multiple times is:

Having a database with a bunch of tables/views/stored procedures/etc. in it
Having a source code repository with a copy of the database objects in it
Having no idea whether the objects in the repository actually match the objects in the database and, even if they do now, having no system to redeploy or check in the future (a developer could make a change to the database at any time)

Having a tool like dbt, which does idempotent deploys of database objects (we like putting functions, stored procs, macros, anything we might need in our database), and running it directly from our source code repo via dbt Cloud entirely solves this problem.
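To make the rerun problem concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`orders_append`, `orders_upsert`), the sample batch, and the helper functions are all hypothetical and only for illustration; this is not how dbt implements incremental models. It loads the same batch twice, once with a plain append and once with a key-based replace.

```python
import sqlite3

# Hypothetical batch of source rows: (id, status). Illustrative data only.
BATCH = [(1, "shipped"), (2, "pending")]


def load_append(conn, batch):
    # Non-idempotent: every rerun inserts the same rows again.
    conn.executemany(
        "INSERT INTO orders_append (id, status) VALUES (?, ?)", batch
    )


def load_upsert(conn, batch):
    # Idempotent: rows are keyed on id, so a rerun replaces rather than duplicates.
    conn.executemany(
        "INSERT OR REPLACE INTO orders_upsert (id, status) VALUES (?, ?)", batch
    )


def row_counts_after_two_runs():
    conn = sqlite3.connect(":memory:")
    # No key on the append table, so duplicates are accepted silently.
    conn.execute("CREATE TABLE orders_append (id INTEGER, status TEXT)")
    conn.execute(
        "CREATE TABLE orders_upsert (id INTEGER PRIMARY KEY, status TEXT)"
    )
    for _ in range(2):  # simulate running the same load twice
        load_append(conn, BATCH)
        load_upsert(conn, BATCH)
    append_count = conn.execute(
        "SELECT COUNT(*) FROM orders_append"
    ).fetchone()[0]
    upsert_count = conn.execute(
        "SELECT COUNT(*) FROM orders_upsert"
    ).fetchone()[0]
    conn.close()
    return append_count, upsert_count


if __name__ == "__main__":
    print(row_counts_after_two_runs())  # (4, 2)
```

The append load ends up with four rows after two runs, while the keyed load still has two, which is what a from-scratch refresh of the same batch would give you.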
