Understanding idempotent data transformations

Idempotence seems like a very abstract mathematical concept until you spend a while dealing with all the pain that non-idempotent ETL creates. If you are never sure if running your code and refreshing the data from scratch would actually yield the same results as doing the additional incremental load that you have written, than you are dealing with the pain that non-idempotence produces. Also, a scenario that I have run into in multiple situations is

  1. Having a database with a bunch of tables/views/stored procedures/etc. in it
  2. Having a source code repository with a copy of the database objects in it
  3. Having no idea if the objects in the repository actually match the objects in the database and/or even if they do now having no system to redeploy or check in the future (a developer could make a change to the database at any time)

Having a tool like dbt which does idempotent deploys of database objects (we like putting functions, stored procs, macros, anything we might need in our database) and doing it directly from our source code repo via dbt cloud entirely solve this problem.

2 Likes