The problem I’m having
Hello! I don’t have much experience with dbt. I am trying to create an external table in Databricks using the dbt-external-tables package. I have it defined in raw/source.yaml as follows:
- name: table_1
  freshness:
    warn_after: {count: 12, period: hour}
    error_after: {count: 24, period: hour}
  external:
    location: "{{ 's3://datahub-' + env_var('environment') + '-raw/table_1/' }}"
    using: csv
    infer_schema: true
    partitions:
      - name: timestamp
        data_type: integer
  columns:
    - name: col_1
      data_type: string
    - name: col_2
      data_type: string
    - name: col_n
      data_type: string
The context of why I’m trying to do this
My problem is that my CSV contains more columns than I have defined in my YAML, and when I create the table using:
dbt run-operation stage_external_sources --args "{select: raw.table_1}" --vars "ext_full_refresh: true"
data from the columns that I don’t want ends up populating the columns that I do want.
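For context, if I understand the dbt-external-tables package correctly, the run-operation above should compile my source definition into a statement roughly like this on Databricks (my own approximation, with env_var('environment') resolved, not the exact macro output):

-- my rough guess at the DDL generated by stage_external_sources
create table raw.table_1 (
    col_1 string,
    col_2 string,
    col_n string,
    timestamp integer
)
using csv
partitioned by (timestamp)
location 's3://datahub-<environment>-raw/table_1/'

As far as I can tell, Spark then maps each CSV field to the declared columns by position, which would explain why values from the extra columns shift into the ones I defined.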
Could you please help me fix this issue? I have tried to force the schema with infer_schema (I don’t know whether that applies only to data_type), and I don’t know how to force my data to conform to the schema defined in the YAML.
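One workaround I was considering is to declare every column that actually exists in the CSV (including the ones I don’t need, so the positional mapping lines up) and then drop the extras in a staging model, something like the sketch below (the file name is just a placeholder). But I’d prefer to handle this in the source definition itself if that’s possible:

-- models/staging/stg_table_1.sql (hypothetical)
-- keeps only the columns I care about, assuming every CSV column,
-- including the unwanted ones, is declared in the source YAML
select
    col_1,
    col_2,
    col_n
from {{ source('raw', 'table_1') }}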