I’ve read the forum post on why dbt is built on idempotence, “Understanding idempotent data transformations”.
But I’m curious whether this is worth it in the long run.
The example in that forum post syncs rows from within a hard-coded last-24-hours window, but that doesn’t seem ideal to me: I would actually want a non-idempotent system to sync all rows starting from the last synced row, so that no rows are ever missed. So I feel like the example in that post is a bit unfair.
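To illustrate what I mean, here’s a rough sketch of the kind of non-idempotent sync I have in mind (the table and column names like `raw.events`, `analytics.events`, and `event_id` are just made up for illustration), picking up from the last synced row instead of a fixed time window:

```sql
-- Hypothetical sketch: append only the rows newer than what the target
-- already has, so nothing is missed even if a run is skipped.
-- Table/column names are made up for illustration.
insert into analytics.events
select *
from raw.events
where event_id > (select coalesce(max(event_id), 0) from analytics.events);
```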
My main question, though, is about the long-term aspects of using dbt: as my dataset grows larger over time, I’ll end up with millions of rows, so I would expect my dbt runs to get slower over time, right?
Instead, if dbt only processed new rows, then it wouldn’t get slower over time.
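If I understand correctly, dbt’s incremental materialization is the closest thing to this. A minimal sketch of what I mean (the source and column names here are hypothetical, not from the forum post):

```sql
-- Hypothetical dbt incremental model: on incremental runs, only process rows
-- newer than what's already in the target table. Names are made up.
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    event_time,
    payload
from {{ source('app', 'events') }}

{% if is_incremental() %}
  -- {{ this }} refers to this model's existing target table
  where event_time > (select max(event_time) from {{ this }})
{% endif %}
```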
How do you think about this trade-off? Is it okay because we expect processing time to increase only sub-linearly with the number of rows, since SQL transformations are efficient?