Hello Everyone!!
I’m working on a use case where I need to directly interact with S3 buckets to read files, such as CSVs, Parquet files, or other data formats stored in S3. Are there any best practices or recommended approaches for integrating S3 data into dbt models and transformations?
Any insights, experiences, or recommendations would be greatly appreciated! Thank you in advance for sharing your knowledge with the community.
DuckDB supports working directly with S3 and can be used with dbt as well:
https://duckdb.org/docs/guides/import/s3_import
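To illustrate, DuckDB can query S3 objects in place via its httpfs extension. A minimal sketch (the bucket, object paths, and credential values below are placeholders):

```sql
-- Load the httpfs extension, which provides s3:// support
INSTALL httpfs;
LOAD httpfs;

-- Region and credentials for the bucket (placeholder values)
SET s3_region = 'us-east-1';
SET s3_access_key_id = 'YOUR_KEY_ID';
SET s3_secret_access_key = 'YOUR_SECRET';

-- Query a Parquet file (or a glob of files) straight out of S3
SELECT *
FROM read_parquet('s3://my-bucket/path/data.parquet');

-- CSVs work the same way
SELECT *
FROM read_csv_auto('s3://my-bucket/path/data.csv');
```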
Note: @Gio originally posted this reply in Slack. It might not have transferred perfectly.
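For the dbt side, the community dbt-duckdb adapter lets a model select straight from S3. A rough sketch, assuming the adapter's `extensions` and `settings` profile keys; the project name, database path, and bucket are placeholders:

```yaml
# profiles.yml (sketch; names and credentials are placeholders)
my_project:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: local.duckdb          # local DuckDB database file
      extensions:
        - httpfs                  # enables s3:// paths
      settings:
        s3_region: us-east-1
        s3_access_key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
        s3_secret_access_key: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
```

A model can then read from S3 directly:

```sql
-- models/stg_events.sql (hypothetical model)
select *
from read_parquet('s3://my-bucket/events/*.parquet')
```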
Can you please provide an example of using DuckDB with AWS and dbt?
Also, is it supported by both dbt Core and dbt Cloud?
You can use the dbt-athena adapter (GitHub: dbt-athena/dbt-athena) for that; it currently supports both Hive and Iceberg tables.
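As a sketch, a dbt-athena profile mainly needs a query-results staging location, a region, and a target schema (all values below are placeholders):

```yaml
# profiles.yml (sketch; bucket, region, and schema are placeholders)
my_project:
  target: dev
  outputs:
    dev:
      type: athena
      s3_staging_dir: s3://my-bucket/athena-query-results/
      region_name: us-east-1
      database: awsdatacatalog
      schema: analytics
```

Individual models can then opt into Iceberg with a config like `{{ config(table_type='iceberg') }}`.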
Another option is dbt-glue (GitHub: aws-samples/dbt-glue); it may be more costly, but it offers better performance for huge data volumes.
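For reference, dbt-glue is configured with an IAM role and Glue interactive-session sizing rather than a warehouse connection. A rough sketch with placeholder values (role ARN, region, bucket, and worker sizing are all assumptions to adapt):

```yaml
# profiles.yml (sketch; role ARN, region, and bucket are placeholders)
my_project:
  target: dev
  outputs:
    dev:
      type: glue
      role_arn: arn:aws:iam::123456789012:role/dbt-glue-role
      region: us-east-1
      workers: 2
      worker_type: G.1X
      schema: analytics
      location: s3://my-bucket/dbt-glue/
```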
Ultimately, it depends on your current infrastructure and available toolkit, as other options may also be viable.