Productionised scheduling of DBT Athena - AWS

oli5679 · May 26, 2023, 10:34am

The problem I’m having

I’m trying to work out best way to deploy some dbt models I’m working on to production. I wrote them using dbt and the Athena community plugin. They transform several Athena tables that I generate using a Glue crawler. My colleague has productionised The EL part (calling a bunch of APIs, validating the data and generating AVROs and running a glue crawler) using ‘serverless’ framework. I was wondering what the best way to productionise this would be. I don’t really have a budget to use managed dbt, and it would be most ideal if I can also use ‘serverless’ and configure all AWS IAM stuff in the same repo.

The context of why I’m trying to do this

Productionising dbt model.

We have different AWS projects for dev and prod, and aim to generate all AWS resources using ‘serverless’.

What I’ve already tried

Tried (1) copying dbt model into same repo as src/handlers in serverless repo
(2) making a lambda function that runs dbt using ‘subprocess’
(3) passing in aws-profile as an environment variable

but it seems a bit hacky and I’m not sure it’s the best approach.

To be clear, I already made a model that runs locally just fine by making a profiles.yml that includes an AWS profile with all the right permissions.

TomC · May 26, 2023, 10:58am

Hi there,

Running dbt via a Lambda function will probably run into some issues because Lambdas have a 15 minute maximum runtime. (See the Lambda quotas here). In my experience many dbt projects will take over 15 minutes to run, especially as your number of models starts to grow.

You might consider ECS (Elastic Container Service) as an alternative if you’re required to use “serverless” features only. It’s also possible to schedule ECS jobs.

oli5679 · May 26, 2023, 6:50pm

Thanks for the advice. I got a bit confused about how to handle dev-prod permissions.

My colleagues have setup different AWS accounts for dev and prod, and when I was developing locally, I used a service account which had the correct Athena/S3 permissions in the dev project. I am just wondering how I use the same docker image in both dev and prod AWS projects, through some configurations of profiles.yml and setting up stuff in AWS.

This is what I did but I don’t really think it’s the correct way (passed AWS keys in the sls deploy command).

profiles.yml

project-name:
outputs:
dev:
database: awsdatacatalog
region_name: eu-west-2
s3_data_dir: s3://…
s3_staging_dir: s3://…
schema: db-name
threads: 1
type: athena
aws_profile:
aws_access_key_id: “{{ env_var(‘AWS_ACCESS_KEY_ID’) }}”
aws_secret_access_key: “{{ env_var(‘AWS_SECRET_ACCESS_KEY’) }}”
target: dev

serverless.yml

resources:
Resources:
[other resources already defined]
DbtRunTaskDefinition:
Type: AWS::ECS::TaskDefinition
Properties:
[other properties]
Environment:
- Name: dbt_TARGET
Value: dev
- Name: AWS_ACCESS_KEY_ID
Value: ${env:AWS_ACCESS_KEY_ID}
- Name: AWS_SECRET_ACCESS_KEY
Value: ${env:AWS_SECRET_ACCESS_KEY}

Is there some way to make it so that there is some ARN role that means the docker image has the correct permissions, without directly passing around the keys, or some other solution to what I’m trying to achieve at a high level?

Topic		Replies	Views
dbt with AWS Athena and Python models. Help python-models	6	3393	February 6, 2023
AWS Athena in DBT Cloud In-Depth Discussions	0	1555	June 23, 2023
Dbt Athena via Eks Help dbt-core	0	591	January 9, 2024
dbt-glue with AWS Fargate Help	0	48	January 22, 2025
Model run scheduling patterns Archive	2	4889	March 5, 2020

Productionised scheduling of DBT Athena - AWS

The problem I’m having

The context of why I’m trying to do this

What I’ve already tried

Related topics