Hi guys,
I am completely new to dbt and there are a few things I can't get my head around.
In short: can dbt act as a substitute for a data pipeline built in Snowflake with streams/tasks, etc.?
Imagine I have a source system that extracts data to S3 each day. My goal is to build a dimensional star schema.
In Snowflake I have external tables built on top of S3. So, for example, the customer table will look like initial_load/delta_day1/delta_day2 and so on, growing each day.
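To make this concrete, here is roughly what my Snowflake setup looks like (the database, stage, table, and stream names below are just examples, not my real ones):

```sql
-- External table over the daily S3 dumps (partition folders like
-- initial_load/, delta_day1/, delta_day2/, ...)
CREATE EXTERNAL TABLE landing.customer_ext
  WITH LOCATION = @landing.s3_stage/customer/
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = TRUE;

-- Streams on external tables must be insert-only
CREATE STREAM landing.customer_stream
  ON EXTERNAL TABLE landing.customer_ext
  INSERT_ONLY = TRUE;
```

The stream then shows only the rows that arrived since it was last consumed, which is what I was planning to feed into the snapshots.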
What should my sources for the DIM tables be? I understand the idea of using snapshots. I also have streams built on top of these external tables, and I build snapshots from those streams. So I am using the streams as SOURCES in dbt (the streams feed the snapshots, and the snapshots feed the models further down the line). Is this the correct approach? Or should I use the external tables as sources? Or something different entirely?

A few follow-up worries:
- What if something goes wrong and the source system dumps new data before the snapshot is created? The stream would then contain two days of changes, because it wasn't consumed in between. How can I enforce the correct loading sequence into the snapshot so my history stays correct?
- What if I have to go back in time and regenerate the last three days?
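For reference, this is roughly how I have wired it up in dbt so far (again, names are illustrative; the stream is registered as an ordinary source table and the snapshot selects from it):

```yaml
# models/sources.yml
version: 2

sources:
  - name: landing
    database: MY_DB
    schema: LANDING
    tables:
      - name: customer_stream   # the Snowflake stream on the external table
```

```sql
-- snapshots/customer_snapshot.sql
{% snapshot customer_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='check',
      check_cols='all'
    )
}}

select * from {{ source('landing', 'customer_stream') }}

{% endsnapshot %}
```

My uncertainty is exactly about this pattern: whether selecting from the stream inside a snapshot is reliable, given that a stream's offset only advances when it is read by a DML statement, and what happens to the snapshot history if the stream accumulates more than one day of changes.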
Thank you