The problem I’m having
I run dbt with the Glue adapter as a scheduled task in AWS Fargate. I have a test project that loads a small amount of data from seeds and runs a couple of models.
I am aware that seeds are not meant to be used as a data source, but this is just for testing.
The container runs:
- `dbt list` → the project is found
- `dbt debug` → no problems found, the connection is OK
- `dbt build` → fails with a very cryptic error message:
```
Found 8 models, 2 seeds, 0 sources, 0 exposures, 0 metrics, 480 macros, 0 groups, 0 semantic models
Concurrency: 1 threads (target='default')
1 of 10 START seed file smartanalytics-jkral-warehouse_seeds.oot_store_sample .. [RUN]
Glue adapter: Glue returned `error` for statement None for code
csv = [{"PK": "Oot#3STEST642789391", .... <long text here> }]
df = spark.createDataFrame(csv)
table_name = 'smartanalytics-jkral-warehouse_seeds.oot_store_sample'
if (spark.sql("show tables in smartanalytics-jkral-warehouse_seeds").where("tableName == 'oot_store_sample'").count() > 0):
    df.write .mode("overwrite") .format("parquet") .insertInto(table_name, overwrite=True)
else:
    df.write.option("path", "s3://smartanalytics-jkral-sawarehousebucketbd9ba2ed-ocjsa1nnfzif/smartanalytics-jkral-warehouse_seeds/oot_store_sample") .format("parquet") .saveAsTable(table_name)
SqlWrapper2.execute("""select * from smartanalytics-jkral-warehouse_seeds.oot_store_sample limit 1""")
, NameError: name 'null' is not defined
```
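The `csv = [...]` line in the generated code looks like JSON (double-quoted keys) pasted directly into Python source. If that is what the adapter does, any seed cell loaded as NULL would be rendered as the JSON literal `null`, which Python cannot resolve as a name. The standalone sketch below reproduces that failure with made-up rows (`rows`, `Oot#1`, `store` are hypothetical, not from the real seed) and shows that parsing the same text with `json.loads` maps `null` back to `None`. It only illustrates the suspected mechanism; it is not the adapter's actual code.

```python
import json

# Hypothetical seed rows: the second row has an empty cell, which a loader
# would typically represent as None (SQL NULL).
rows = [
    {"PK": "Oot#1", "store": "A"},
    {"PK": "Oot#2", "store": None},
]

# json.dumps renders None as the JSON literal `null`.
payload = json.dumps(rows)

# Pasting that text into Python source (like the generated `csv = [...]` line)
# fails at runtime, because `null` is not a defined Python name.
source = f"csv = {payload}"
namespace = {}
try:
    exec(source, namespace)
    error = None
except NameError as exc:
    error = str(exc)

print(error)  # name 'null' is not defined

# Parsing the JSON text instead of executing it turns `null` into None.
parsed = json.loads(payload)
print(parsed[1]["store"])  # None
```

If this is the cause, filling in or removing the empty cells in the seed CSV should let `dbt build` get past the seed step; that is an untested assumption, not a confirmed fix.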
The context of why I’m trying to do this
I have a container with dbt that runs as an AWS Fargate scheduled task. I want to use Glue for processing, the Iceberg table format, and an S3 bucket for data storage. This is my `profiles.yml`:
```yaml
myproject:
  outputs:
    default:
      type: glue
      glue_version: "3.0"
      query-comment: DBT model
      role_arn: "{{ env_var('DBT_GLUE_ROLE') }}"
      region: "{{ env_var('AWS_REGION') }}"
      location: "s3://{{ env_var('DBT_BUCKET_NAME') }}"
      schema: "{{ env_var('DBT_SCHEMA') }}"
      database: "{{ env_var('DBT_SCHEMA') }}"
      session_provisioning_timeout_in_seconds: 120
      workers: 2
      worker_type: G.1X
      idle_timeout: 5
      datalake_formats: iceberg
      tags: "{{ env_var('DBT_JOB_TAGS') }}"
  target: default
```
What I’ve already tried
- Checked that the role has access to S3 and the Glue Data Catalog. I also tried explicitly adding `glue:CreateTable` to the policy, which had no effect:
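```typescript
// Assuming AWS CDK v2; `this` and `id` come from the enclosing construct.
import { Effect, Policy, PolicyStatement } from 'aws-cdk-lib/aws-iam';

const glueJobPolicy = new Policy(this, `${id}-PassRolePolicy`, {
  statements: [
    new PolicyStatement({
      effect: Effect.ALLOW,
      actions: ['iam:PassRole', 'lakeformation:BatchGrantPermissions'],
      resources: ['*']
    }),
    new PolicyStatement({
      // effect defaults to ALLOW; glue:* already covers glue:CreateTable
      actions: ['glue:*', 's3:*', 'glue:CreateTable'],
      resources: ['*']
    })
  ]
});
```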
- Checked the names of the S3 bucket and Glue database → all are correct
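A check not covered by the list above is the seed data itself: whether any cell would be loaded as NULL. The sketch below scans seed CSVs for such cells. It assumes the seeds live under `seeds/` and that empty cells and the literal text `null` are what become NULL; both the path and `NULL_TOKENS` are assumptions to adjust, and `find_null_cells` / `scan_seed_dir` are hypothetical helpers, not part of dbt.

```python
import csv
import io
from pathlib import Path

# Assumption: cell values treated as NULL when the seed is loaded
# (compared case-insensitively after stripping whitespace).
NULL_TOKENS = ("", "null")


def find_null_cells(csv_text, null_tokens=NULL_TOKENS):
    """Return (row_number, column) pairs for cells that would load as NULL.

    Row numbers are 1-based and count data rows only (the header is skipped).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            # DictReader fills missing trailing fields with None.
            if value is None:
                hits.append((row_number, column))
            # Extra fields beyond the header arrive as a list; skip those.
            elif isinstance(value, str) and value.strip().lower() in null_tokens:
                hits.append((row_number, column))
    return hits


def scan_seed_dir(seed_dir="seeds"):
    """Print every NULL-like cell in the CSV files under seed_dir (path assumed)."""
    root = Path(seed_dir)
    if not root.is_dir():
        print(f"{root} is not a directory")
        return
    for path in sorted(root.rglob("*.csv")):
        text = path.read_text(encoding="utf-8")
        for row_number, column in find_null_cells(text):
            print(f"{path}: data row {row_number}, column {column!r}")


if __name__ == "__main__":
    scan_seed_dir()
```

For example, `find_null_cells("PK,store\nOot#1,A\nOot#2,\n")` returns `[(2, 'store')]`, pointing at the second data row.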