dbt-external-tables with Snowflake S3 stage - what will it do?

The problem I’m having

I’m looking for more detail of what this package actually does, specifically if I just define an external stage as a source.

Let’s say I define this source:

      - name: people_csv_unpartitioned
        external: &csv-people
          location: '@{{ target.schema }}.dbt_external_tables_testing/csv'
          file_format: '( type = csv skip_header = 1 )'
        columns: &cols-of-the-people
          - name: id
            data_type: int
          - name: first_name
            data_type: varchar(64)
          - name: last_name
            data_type: varchar(64)
          - name: email
            data_type: varchar(64)
        tests: &equal-to-the-people
          - dbt_utils.equality:
              compare_model: ref('people')
              compare_columns:
                - id
                - first_name
                - last_name
                - email

The readme says:

This package provides:

  • Macros to create/replace external tables and refresh their partitions, using the metadata provided in your .yml file source definitions
  • Snowflake-specific macros to create, backfill, and refresh snowpipes, using the same metadata

So I think this means that the package will do nothing, unless I also want to set up external tables or snowpipes derived from the stage. Which is fine. And if I did want those then the package provides helpers to make that happen?

I just want to understand if say I just want to define a model that reads directly from the stage then I don’t need this package at all? Or if I did add it and specify some stage properties, without using the table/snowpipe macros, then it wouldn’t mess with the stage at all?

Because you have to execute the stage_external_sources macro to get any actual work done with the package, you are correct in it not doing anything when you define external tables in your yamls.

If you just want to query the stage directly:
• You don’t need this package at all
• You can define a model like select * from '@{{ target.schema }}.dbt_external_tables_testing/csv'
• The external properties in your source definition are just metadata
If you add the package but don’t run macros:
• The package won’t touch your stage at all
• The stage_external_sources macro only processes source nodes with external properties when run
• Your stage remains untouched until you run dbt run-operation stage_external_sources

Note: @Abigail Green (CHG Healthcare) originally posted this reply in Slack. It might not have transferred perfectly.

Thanks that is helpful to know

Finally… am I right in understanding I can define sources like:

version: 2
sources:
  - name: raw_data
    tables:
      - name: '@my_stage'

and use them from models via

select * from {{ source('raw_data', '@my_stage') }}/csv

?

I think that pattern resolves to something like this so it won’t work: "database"."schema"."@my_stage"/csv

If you want to define a proper source for it, I believe it should look like this. There’s a<https://github.com/dbt-labs/dbt-external-tables/blob/main/sample_sources/snowflake.yml| sample yaml >file in the github. But you’ll also need to run the macro as mentioned above.

sources:
  - name: raw_data
    tables:
      - name: people_data
        external:
          location: '@my_stage/csv'
          file_format: '(type = csv skip_header = 1)'
        columns:
          - name: id
            data_type: int```
If you want to reference the external table directly, you can do a direct reference to it in sql and not worry about the DBT syntax.
```-- models/staging/stg_my_data.sql
SELECT * FROM @my_stage/csv```
This <https://stackoverflow.com/questions/76155101/dbt-snowflake-reference-internal-stage-as-source|stackoverflow discussion> might also be viable. It's a bit different.

<sub>Note: `@Abigail Green (CHG Healthcare)` originally [posted this reply in Slack](https://getdbt.slack.com/archives/CBSQTAPLG/p1752866403499239?thread_ts=1752677535.107909&cid=CBSQTAPLG). It might not have transferred perfectly.</sub>

ah ok

so if I go all in with dbt-external-tables and define an actual external table from the stage, that external table then becomes a source I can reference in my models

otherwise I should not define the stage as a source and just refer to the source via its raw sql identifier in the model code