question on AWS Athena and S3: use dynamic external_location in models file

I am working on a demo project on dbt with AWS Athena, I can build dbt models and working with Athena well. There are two problems:

  1. The output location in AWS s3 requires a dynamic location. In our case, we need the storage path with date, i.e. s3://mybucket/dbt/2023/01/17/demo.
  2. The generated parquet file is all good except its file name. I got something like: “20230112_023258_00015_ddqb9_6b8d55cf-02bd-4f7b-80f4-f6dc991829b3”. I would like to customize its name like: “mydemo001.parquet”.

Note: I tried but cannot find much helpful info from internet. I also tried to define various ‘vars’ but seem not working in this case.

Here are the sample code:

{{ config(materialized=‘table’, file_format=‘parquet’, external_location=‘s3://mybucket/dbt/demo’) }}

with source_data as (
select distinct emplid from {{ source(‘demo_athena’, ‘demo_table’) }}
)
select * from source_data

Hi @yzhang, I asked a couple of people who use the Athena adapter and got these pieces of feedback:

Dynamic output locations are definitely possible–my project uses them for nearly every model. I use a macro to build the external location string and then call that macro from my model config.

Athena does not provide any mechanism for naming the files it creates

If you use a macro to build the external location string, remember to not nest your curlies

1 Like

Hi can you post some example code please? I’m also trying to dynamically create the s3 location and am getting in a muddle with whether I can do this in the model config file (within the .sql) or can it be done at a global setting by setting external_location in the dbt_project.yml file - but then I want to call a macro from there. I’m also struggling as I need to collect information from the model file path in order to create the s3 location.

I have rewritten generate_schema_name.sql macro to custom name the schema, and at this point the node metadata information, including the directory filepath appears available to the macro as the node variable. But this node variable doesn’t appear to be available at the model config level.

Thank you kindly for any pointers.