Hello, I am trying to build a dbt pipeline that uses Parquet files as a data source. Since there is no dbt-parquet package, I think the best adapter for this is dbt-duckdb, as DuckDB supports reading Parquet files. My end goal is to read the files from an S3 bucket, but getting it to run locally first would already help me a lot. However, whenever I run dbt debug I get the following error:
IO Error: The file ".../sources/energy.parquet" exists, but it is not a valid DuckDB database file!
My profiles.yml looks like this:
transform_dbt:
  outputs:
    dev:
      type: duckdb
      path: ./sources/energy.parquet
      extensions:
        - httpfs
        - parquet
      settings:
        s3_region: my-aws-region
        s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
        s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
  target: dev
version: 2
sources:
  - name: s3
    schema: energy
    tables:
      - name: energy
        identifier: s3://bucket/energy.parquet
The above is my sources.yml file.
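For comparison, here is a sketch of how I understand the dbt-duckdb README to describe file-based sources. It is an assumption on my part that the `external_location` key under `meta` is the right mechanism here (rather than `identifier`), and I have not confirmed that it works with my setup:

```yaml
version: 2
sources:
  - name: s3
    # Assumption: dbt-duckdb reads this key and queries the file directly
    # instead of looking for a table named after the identifier.
    meta:
      external_location: "s3://bucket/energy.parquet"
    tables:
      - name: energy
```

If that is correct, a model would then select from `{{ source('s3', 'energy') }}` as usual.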
# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: transform_dbt
version: '1.0.0'
config-version: 2
vars:
  db_name: energy.parquet
# This setting configures which "profile" dbt uses for this project.
profile: 'transform_dbt'
# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"
# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models
# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  transform_dbt:
    example:
      materialized: table
The above is my dbt_project.yml. I am new to dbt, so any help is greatly appreciated. As I said, my goal is to read from my S3 bucket, but help with reading from a local file would already be very useful.
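From the wording of the error, my guess is that `path` in the profile is meant to point at a DuckDB database file (which DuckDB would create if it does not exist) rather than at the Parquet file itself. A minimal sketch of that variant is below; the file name `transform_dbt.duckdb` is hypothetical, and I have not verified that this resolves the problem:

```yaml
transform_dbt:
  outputs:
    dev:
      type: duckdb
      # Hypothetical database file name; the Parquet file is no longer
      # referenced here and would be read as a source instead.
      path: ./transform_dbt.duckdb
      extensions:
        - httpfs
        - parquet
      settings:
        s3_region: my-aws-region
        s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
        s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
  target: dev
```

With this layout the Parquet file would have to be referenced from a source or model instead of from the profile.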