Hi, I’ve just started looking at dbt. It looks very exciting, but I’m stumbling at an early hurdle: dbt assumes you already have populated staging tables in the warehouse.

We run on AWS, and we have existing Glue ETL jobs writing datasets to S3 for our data lake. I’m torn between two approaches: modify those jobs to also write to staging tables in Redshift, or use dbt external sources so the data can be pulled into Redshift via Spectrum as required.

The latter option seems more elegant, as there are fewer moving parts and we’d be using dbt to manage that part of the DAG. I’m just wondering whether anyone in the community has tried the external-sources approach, what their experience was, and whether they hit any snags, e.g. around performance. Our datasets may grow by hundreds of thousands of rows a day, but not much more than that.
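For concreteness, here’s roughly what I’m picturing for the external-sources route, using the dbt-external-tables package. This is just a sketch: the source name, bucket, and columns are made up, and it assumes an external schema already exists in Redshift pointing at our Glue catalog:

```yaml
# models/staging/sources.yml (hypothetical layout)
version: 2

sources:
  - name: data_lake            # made-up source name
    schema: spectrum           # Redshift external schema backed by Glue/Spectrum
    loader: S3
    tables:
      - name: events
        description: "Raw events our Glue jobs write to S3"
        external:
          location: "s3://our-data-lake/events/"  # made-up bucket/prefix
          row_format: "serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'"
        columns:
          - name: event_id
            data_type: varchar(64)
          - name: loaded_at
            data_type: timestamp
          - name: payload
            data_type: varchar(max)
```

As I understand it, `dbt run-operation stage_external_sources` would then create/refresh the external tables, and a staging model would just select from them, e.g. `select * from {{ source('data_lake', 'events') }}`, with Spectrum doing the scan at query time. Is that how others have set it up?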