Productionised scheduling of DBT Athena - AWS

The problem I’m having

I’m trying to work out best way to deploy some dbt models I’m working on to production. I wrote them using dbt and the Athena community plugin. They transform several Athena tables that I generate using a Glue crawler. My colleague has productionised The EL part (calling a bunch of APIs, validating the data and generating AVROs and running a glue crawler) using ‘serverless’ framework. I was wondering what the best way to productionise this would be. I don’t really have a budget to use managed dbt, and it would be most ideal if I can also use ‘serverless’ and configure all AWS IAM stuff in the same repo.

The context of why I’m trying to do this

Productionising dbt model.

We have different AWS projects for dev and prod, and aim to generate all AWS resources using ‘serverless’.

What I’ve already tried

Tried (1) copying dbt model into same repo as src/handlers in serverless repo
(2) making a lambda function that runs dbt using ‘subprocess’
(3) passing in aws-profile as an environment variable

but it seems a bit hacky and I’m not sure it’s the best approach.

To be clear, I already made a model that runs locally just fine by making a profiles.yml that includes an AWS profile with all the right permissions.

Hi there,

Running dbt via a Lambda function will probably run into some issues because Lambdas have a 15 minute maximum runtime. (See the Lambda quotas here). In my experience many dbt projects will take over 15 minutes to run, especially as your number of models starts to grow.

You might consider ECS (Elastic Container Service) as an alternative if you’re required to use “serverless” features only. It’s also possible to schedule ECS jobs.

Thanks for the advice. I got a bit confused about how to handle dev-prod permissions.

My colleagues have setup different AWS accounts for dev and prod, and when I was developing locally, I used a service account which had the correct Athena/S3 permissions in the dev project. I am just wondering how I use the same docker image in both dev and prod AWS projects, through some configurations of profiles.yml and setting up stuff in AWS.

This is what I did but I don’t really think it’s the correct way (passed AWS keys in the sls deploy command).

profiles.yml

project-name:
outputs:
dev:
database: awsdatacatalog
region_name: eu-west-2
s3_data_dir: s3://…
s3_staging_dir: s3://…
schema: db-name
threads: 1
type: athena
aws_profile:
aws_access_key_id: “{{ env_var(‘AWS_ACCESS_KEY_ID’) }}”
aws_secret_access_key: “{{ env_var(‘AWS_SECRET_ACCESS_KEY’) }}”
target: dev

serverless.yml

resources:
Resources:
[other resources already defined]
DbtRunTaskDefinition:
Type: AWS::ECS::TaskDefinition
Properties:
[other properties]
Environment:
- Name: dbt_TARGET
Value: dev
- Name: AWS_ACCESS_KEY_ID
Value: ${env:AWS_ACCESS_KEY_ID}
- Name: AWS_SECRET_ACCESS_KEY
Value: ${env:AWS_SECRET_ACCESS_KEY}

Is there some way to make it so that there is some ARN role that means the docker image has the correct permissions, without directly passing around the keys, or some other solution to what I’m trying to achieve at a high level?