Using 'boto3' with Python models on Snowflake

The problem I’m having

I have boto3 installed in my virtual env, but when I run dbt it cannot find it.

I am using dbt-core, installed with:

pip install dbt-core dbt-snowflake

When running pip freeze | grep dbt I get:

dbt-adapters==1.10.3
dbt-common==1.13.0
dbt-core==1.8.9
dbt-extractor==0.5.1
dbt-semantic-interfaces==0.5.1
dbt-snowflake==1.8.4

The context of why I’m trying to do this

I have a script in my dbt run which requires boto3. The Python script returns a df which needs to be written to a table in Snowflake.

What I’ve already tried

I have confirmed that boto3 exists in the ve.
I have confirmed that the dbt run uses the ve: I don’t have a system-wide dbt-core installed, and it only works when I am in the ve.

I can see the boto3 and dbt packages only when in the ve:

. ve/bin/activate
python
import boto3  # imports fine
exit()
dbt run

Both of these also confirm the ve is being used:

which python
which dbt

Everything else works fine; it just cannot find the other packages I installed specifically in this ve.

I added this to the dbt_project.yml as a test:

on-run-start:
  - "{{ log('Python Executable: ' ~ dbt.flags.PYTHON_EXECUTABLE, info=True) }}"
  - "{{ log('Boto3 available: ' ~ ('boto3' in modules), info=True) }}"

Python Executable prints nothing
Boto3 available prints false

If I try dbt run when not in the ve, I get command not found: dbt, as expected.
I have tried setting PYTHONPATH to the ve’s Python, but it behaves the same.

I have recreated the ve from scratch and it still doesn’t work.

Some example code or error messages

  100357 (P0000): Python Interpreter Error:
  Traceback (most recent call last):
    File "_udf_code.py", line 13, in <module>
      import boto3
  ModuleNotFoundError: No module named 'boto3'

Additional things I tried

I found some other documentation and I tried that as well

I added this to my Python file and also to models/config.yml:

def model(dbt, session):
    dbt.config(
        materialized="incremental",
        packages=["boto3==1.35.72", "botocore==1.35.72"]
    )

This produced a new error:

  100357 (P0000): Cannot create a Python function with the specified packages. Please check your packages specification and try again.
  compiled code at target/run/metrics/models/metrics_enquiries.py
01:08:59
01:08:59  Done. PASS=1 WARN=0 ERROR=2 SKIP=0 TOTAL=3

Hey @albert! Python models run on Snowflake’s Snowpark platform, not using your local python environment, which is why having boto3 installed locally doesn’t do anything.

Looking at the list of packages in Snowflake’s conda channel, boto3 and botocore are both available at version 1.34.154.

If you request those versions of the packages in your model’s configs, I expect they will install correctly.
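Something along these lines should work - a minimal sketch only, with a placeholder dataframe just there to prove the import resolves, and versions pinned to what the Anaconda channel listed above:

import pandas

def model(dbt, session):
    dbt.config(
        materialized="table",
        # pin to versions that appear in Snowflake's Anaconda channel
        packages=["boto3==1.34.154", "botocore==1.34.154"],
    )
    import boto3  # resolved on the Snowflake side, not from your local venv

    # placeholder output just to confirm the package imports inside Snowpark
    return session.create_dataframe(
        pandas.DataFrame([{"boto3_version": boto3.__version__}])
    )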

Hi @joellabes

Thanks for that, I didn’t realise that the version number had to match that page

Does this mean it downloads the package each time it runs?

I’ve now got past that issue, but I have a new one.
How do I pass my AWS creds? I am using boto3 to read an AWS secret, but even with my credentials in my terminal, I just get:

  Exception: Error retrieving secret: Unable to locate credentials

Another question: why can I not print or log anything out from these jobs? I have tried everything, including writing to a file, and I cannot see anything. Is this expected behavior?

Behind the scenes, dbt creates and executes an anonymous sproc. I’m not a Snowpark expert but I expect that the machines that execute those jobs already have all of those conda packages installed, so I doubt it has to go out to a package registry and wait for it to download every time.

I think you’ll want to use secrets and external access integrations. This is the sample code from the Python models doc:

import pandas
import snowflake.snowpark as snowpark

def model(dbt, session: snowpark.Session):
    dbt.config(
        materialized="table",
        secrets={"secret_variable_name": "test_secret"},
        external_access_integrations=["test_external_access_integration"],
    )
    import _snowflake
    return session.create_dataframe(
        pandas.DataFrame(
            [{"secret_value": _snowflake.get_generic_secret_string('secret_variable_name')}]
        )
    )
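To adapt that to boto3, here’s a rough sketch of the shape I’d expect it to take, assuming you store your AWS access key and secret key as Snowflake secrets. The secret and integration names, the region, and the SecretId below are all made-up placeholders - you’d swap in your own objects:

import pandas
import snowflake.snowpark as snowpark

def model(dbt, session: snowpark.Session):
    dbt.config(
        materialized="table",
        packages=["boto3==1.34.154", "botocore==1.34.154"],
        # hypothetical secret and integration names - create these in Snowflake first
        secrets={
            "aws_access_key_id": "aws_access_key_id_secret",
            "aws_secret_access_key": "aws_secret_access_key_secret",
        },
        external_access_integrations=["aws_secrets_manager_integration"],
    )
    import _snowflake
    import boto3

    # build a boto3 session from the Snowflake-managed secrets instead of
    # relying on credentials in your terminal (which never reach Snowpark)
    aws_session = boto3.session.Session(
        aws_access_key_id=_snowflake.get_generic_secret_string("aws_access_key_id"),
        aws_secret_access_key=_snowflake.get_generic_secret_string("aws_secret_access_key"),
        region_name="us-east-1",  # placeholder region
    )
    client = aws_session.client("secretsmanager")
    secret_value = client.get_secret_value(SecretId="my/app/secret")["SecretString"]  # placeholder SecretId

    # return something trivial so the model materializes; don't write the raw secret out
    return session.create_dataframe(pandas.DataFrame([{"secret_length": len(secret_value)}]))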

There is also a worked example on our developer blog.

This (unfortunately!) is a limitation in Snowpark. There used to be no logging support at all, but that’s improved a bit recently and there’s some example code for logging here: [ADAP-1046] [CT-3418] [Bug] dbt python model logging not working with snowflake · Issue #846 · dbt-labs/dbt-snowflake · GitHub
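The gist of that issue (a sketch only, assuming an event table is configured for your account) is that messages sent through Python’s standard logging module can be captured, whereas print() output isn’t:

import logging
import pandas

# log records from the standard logging module can be routed to a Snowflake
# event table (if one is set up); they won't appear in your terminal
logger = logging.getLogger("dbt_python_model")

def model(dbt, session):
    dbt.config(materialized="table")
    logger.info("python model started")
    df = pandas.DataFrame([{"id": 1}])
    logger.info("returning %s rows", len(df))
    return session.create_dataframe(df)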

I just wanted to comment on this as well, to make sure your mental model is correct. All Python execution is happening on a remote Snowflake server - you definitely will not see things being written to your local disk, because the jobs aren’t running on your local machine.

Thanks so much for all the assistance

It all makes sense now. I didn’t realise that the job is actually not happening on my local machine/ECS task.

I assumed that was how dbt-cloud worked, and that dbt-core was all localised.


When you’re running dbt core locally, it compiles commands to send to Snowflake (e.g. CREATE TABLE AS SELECT) based on the contents of your models directory, etc. You can see what dbt compiles in the target directory.

But then dbt core (and Cloud) sends the compiled commands off to Snowflake, which does all of the compute work.

dbt is responsible for making sure everything runs in the right order, but it is essentially a framework for making it easier to leverage your warehouse’s capabilities.
