Problems with schema

The problem I’m having

Hello! I don’t have much experience with dbt. I am trying to create an external table in Databricks via the stage_external_sources macro. I have it defined in raw/source.yaml as follows:

- name: table_1
  freshness:
    warn_after: {count: 12, period: hour}
    error_after: {count: 24, period: hour}
  external:
    location: "{{'s3://datahub-' + env_var('environment') + '-raw/table_1/'}}"
    using: csv
    infer_schema: true
    partitions:
      - name: timestamp
        data_type: integer
  columns:
    - name: col_1
      data_type: string
    - name: col_2
      data_type: string
    - name: col_n
      data_type: string

The context of why I’m trying to do this

My problem is that my CSV contains more columns than I have defined in my YAML, and when I create the table using:

dbt run-operation stage_external_sources --args "{select: raw.table_1}" --vars "ext_full_refresh: true"

Data from columns that I don’t want is populating the columns that I do want.
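
To illustrate with made-up values: if the file actually contains

    col_1,unwanted_col,col_2
    a,x,b

then my table ends up with x in col_2 instead of b, as if the columns were matched by position rather than by name.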

Could you please help me fix this issue? I have tried to force the schema with infer_schema (I don’t know whether that applies only to data_type), and I don’t know how to force my data to conform to the schema defined in the YAML.

In the docs of this macro, it says:

    # Specify ALL column names + datatypes.
    # Column order must match for CSVs, column names must match for other formats.
    # Some databases support schema inference.
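
For a CSV, that means declaring every column that exists in the file, in the exact order it appears, including the columns you don’t care about (you can simply ignore them downstream). A sketch based on your example (unwanted_col is a placeholder for whatever extra columns your file actually has, in their real positions):

- name: table_1
  freshness:
    warn_after: {count: 12, period: hour}
    error_after: {count: 24, period: hour}
  external:
    location: "{{'s3://datahub-' + env_var('environment') + '-raw/table_1/'}}"
    using: csv
    partitions:
      - name: timestamp
        data_type: integer
  columns:
    # every column in the CSV, in the exact order it appears in the file
    - name: col_1
      data_type: string
    - name: unwanted_col    # placeholder: not needed downstream, but must still be declared
      data_type: string
    - name: col_2
      data_type: string
    - name: col_n
      data_type: string

I’ve left infer_schema out of the sketch on the assumption that an explicit, complete column list makes it unnecessary.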

Hello,

Thanks a lot for your answer! I hadn’t found that documentation in external | dbt Developer Hub,
but it looks like that’s the right answer, so I’ll mark this as solved.

Thanks again 🙂
