I’m using Airbyte to sync my Salesforce tables to a BigQuery destination, then using dbt snapshot to snapshot the resulting BigQuery table. I noticed that whenever I add new columns, I have to drop / reset / resync the Salesforce table, but that causes the dbt snapshot to treat all of these records as invalidated at the time of the resync. What are some best practices to avoid this issue?
When you say you’re dropping/resetting/resyncing the table, is that because Airbyte can’t make the necessary schema modifications? Or do you mean that dbt snapshot isn’t adding the new columns to its snapshot table?
It looks like Airbyte added support for changing schemas in 0.50; see the announcement “Announcing Airbyte 0.50: Checkpointing, Column Selection, and Schema Propagation” (Airbyte).
If your dbt snapshot’s definition uses something like the synced_at column when determining whether a record has changed or not, then dropping and re-syncing the table will indeed make it consider the records new. You would need to change your snapshot configuration, e.g. by using the check strategy instead of timestamp; see “Snapshot configurations” on the dbt Developer Hub.
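
For reference, here is a minimal sketch of what a snapshot using the check strategy might look like. The snapshot name, target schema, source name, unique key, and column list are all hypothetical placeholders, so swap in whatever matches your project. The idea is that dbt compares only the listed business columns, so a resync that changes nothing but the sync timestamps should not register as a change.

```sql
{% snapshot salesforce_account_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='id',
      strategy='check',
      check_cols=['name', 'industry', 'owner_id']
    )
}}

-- Hypothetical source; point this at the table Airbyte writes to BigQuery.
-- Only the columns named in check_cols are compared between runs, so the
-- sync metadata columns are ignored when deciding whether a row changed.
select * from {{ source('salesforce', 'account') }}

{% endsnapshot %}
```

One thing to watch: check_cols='all' would also compare any sync metadata columns in the table (Airbyte typically adds columns with an _airbyte_ prefix), which would likely bring the original problem back. Listing the columns explicitly, or selecting only the business columns in the query, avoids that.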