Hello everyone. I’m using the community athena version to connect to my athena deployment. I’m just getting started with dbt. I’m trying to add a python model to my project, while I already have SQL models. When dbt run gets to my python model it runs into an error (pasted in the Logs section).
What I’ve already tried
I tried following the directions here(dbt with AWS Athena and Python models.), but I’m getting an error.
I have dbt-fal (v 1.4.7) and dbt-athena-community (v1.4.3) installed in my conda environment
Logs
...
e[0m20:18:41.563306 [debug] [Thread-2 (]: Began running node model.my_profile.py_model
e[0m20:18:41.564643 [info ] [Thread-2 (]: 1 of 1 START python table model athenadb.py_model .......................... [RUN]
e[0m20:18:41.566270 [debug] [Thread-2 (]: Acquiring new athena connection 'model.my_profile.py_model'
e[0m20:18:41.567528 [debug] [Thread-2 (]: Began compiling node model.my_profile.py_model
e[0m20:18:41.587738 [debug] [Thread-2 (]: Writing injected SQL for node "model.my_profile.py_model"
e[0m20:18:41.617119 [debug] [Thread-2 (]: Timing info for model.my_profile.py_model (compile): 2023-04-25 20:18:41.568754 => 2023-04-25 20:18:41.616997
e[0m20:18:41.618400 [debug] [Thread-2 (]: Began executing node model.my_profile.py_model
e[0m20:18:41.629555 [debug] [Thread-2 (]: Writing runtime python for node "model.my_profile.py_model"
e[0m20:18:41.731299 [debug] [Thread-2 (]: On "model.my_profile.py_model": cache miss for schema "awsdatacatalog.awsdatacatalog", this is inefficient
e[0m20:18:41.733655 [debug] [Thread-2 (]: Opening a new connection, currently in state closed
e[0m20:18:41.912887 [debug] [Thread-2 (]: Athena adapter: Schema 'awsdatacatalog' does not exist - Ignoring: An error occurred (EntityNotFoundException) when calling the GetTables operation: Database awsdatacatalog not found.
e[0m20:18:41.915321 [debug] [Thread-2 (]: with database=awsdatacatalog, schema=awsdatacatalog, relations=[]
e[0m20:18:43.546599 [debug] [Thread-2 (]: Timing info for model.my_profile.py_model (execute): 2023-04-25 20:18:41.619410 => 2023-04-25 20:18:43.546518
e[0m20:18:43.548160 [debug] [Thread-2 (]: On model.my_profile.py_model: Close
e[0m20:18:43.550029 [error] [Thread-2 (]: e[31mUnhandled error while executing target/run/my_profile/models/py_model.pye[0m
(pyathena.error.OperationalError) [ErrorCategory:USER_ERROR, ErrorCode:SYNTAX_ERROR], Detail:FAILED: SemanticException [Error 10072]: Database does not exist: awsdatacatalog
[SQL:
CREATE EXTERNAL TABLE awsdatacatalog.dbt_fal_temp_athenadb (
state_code STRING
)
LOCATION 's3://s3-pathawsdatacatalog/dbt_fal_temp_athenadb/'
]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
...
File "/home/{conda-env-path}/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
File "/home/{conda-env-path}/lib/python3.10/site-packages/pyathena/cursor.py", line 117, in execute
raise OperationalError(query_execution.state_change_reason)
sqlalchemy.exc.OperationalError: (pyathena.error.OperationalError) [ErrorCategory:USER_ERROR, ErrorCode:SYNTAX_ERROR], Detail:FAILED: SemanticException [Error 10072]: Database does not exist: awsdatacatalog
Sample Code
profiles.yml
my_profile:
target: dev_py
outputs:
dev_py:
type: fal
db_profile: dev_athena
dev_athena:
type: athena
s3_staging_dir: s3://s3-path
region_name: region
database: awsdatacatalog
schema: athenadb
aws_profile_name: something
py_model.py
import pandas as pd
def model(dbt, fal):
dbt.config(
materialized="table",
packages = ["pandas==1.5.3"]
)
data = {'state_code': ['NJ', 'NY', 'PA', 'NJ', 'NY']}
df = pd.DataFrame(data)
return df