Training, saving, and running machine learning workloads with dbt/Snowflake

TLDR: For those of you deploying data science models with dbt and Snowflake, which parts of the workflow do you run in dbt vs. Snowflake?


We’re working on a project that involves training a predictive machine learning model in Snowflake. We create our training dataset by transforming raw data with SQL in dbt, and then want to:

  1. train an ML model on this dataset and save the fitted model, e.g. as a pickle file
  2. separately, load the model and make predictions
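Independent of where the artifact ends up, the two steps above boil down to a serialize/deserialize round trip. Here is a minimal sketch in plain Python (no Snowflake involved), using a toy least-squares fit as a stand-in for a real model; all the function names are ours, not from any library:

```python
import pickle

import numpy as np


def train_model(X, y):
    # Fit ordinary least squares via lstsq (a stand-in for your real estimator).
    design = np.c_[np.ones(len(X)), X]  # add an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def save_model(model) -> bytes:
    # Step 1: serialize the fitted model to bytes, ready to upload somewhere.
    return pickle.dumps(model)


def load_and_predict(blob: bytes, X):
    # Step 2: deserialize the saved model and score new rows.
    coef = pickle.loads(blob)
    return np.c_[np.ones(len(X)), X] @ coef


X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

blob = save_model(train_model(X, y))
preds = load_and_predict(blob, X)
```

The `bytes` blob in the middle is the thing that needs a home, which is where the storage question below comes in.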

We know it’s possible to run Python models with dbt, and we’ve successfully trained the model that way. For storing the serialized model object, the options we can think of are:

  • Snowflake internal stages
  • External stages, such as S3
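For the internal-stage option, one pattern we’ve been considering is to pickle the model inside the dbt Python model and upload it with Snowpark’s `session.file.put_stream`. A hedged sketch, where the stage name `@ml_models`, the upstream model `training_dataset`, and the `train()` placeholder are all our own assumptions:

```python
import io
import pickle


def serialize_model(model) -> io.BytesIO:
    # Pickle a fitted model into an in-memory stream for put_stream.
    buf = io.BytesIO(pickle.dumps(model))
    buf.seek(0)
    return buf


def train(df):
    # Placeholder "model": per-column means. Swap in your real estimator.
    return df.mean(numeric_only=True).to_dict()


def model(dbt, session):
    # dbt Python model: train on the upstream dataset, save the pickle to a stage.
    dbt.config(materialized="table")
    df = dbt.ref("training_dataset").to_pandas()  # hypothetical upstream model

    fitted = train(df)

    # Upload the pickled model to an internal stage (stage path is an assumption).
    session.file.put_stream(
        serialize_model(fitted),
        "@ml_models/churn_model.pkl",
        auto_compress=False,
        overwrite=True,
    )
    return df  # dbt Python models must return a DataFrame
```

An external stage backed by S3 would look the same from the Python side; only the stage definition in Snowflake changes.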

We’d be eager to understand: are there any best practices for running combined SQL + Python workloads in Snowflake that involve saving and later loading machine learning models?
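For completeness, the inference side of what we have in mind would be a second dbt Python model that pulls the pickle back down with Snowpark’s `session.file.get_stream` and scores new rows. Again a sketch under our own assumptions (stage path, `scoring_dataset`, and the `predict()` placeholder are hypothetical):

```python
import pickle


def load_model(session, stage_path: str):
    # Download a pickled model from a stage and deserialize it.
    with session.file.get_stream(stage_path) as stream:
        return pickle.load(stream)


def predict(fitted, df):
    # Placeholder scoring: replace with your model's predict() call.
    return [0.0] * len(df)


def model(dbt, session):
    # Separate dbt Python model for inference: load the saved model, score rows.
    dbt.config(materialized="table")
    df = dbt.ref("scoring_dataset").to_pandas()  # hypothetical upstream model

    fitted = load_model(session, "@ml_models/churn_model.pkl")  # path is an assumption
    df["prediction"] = predict(fitted, df)
    return df
```

Splitting training and inference into two dbt models like this is part of what prompted the question: it works, but we’re unsure whether it’s the intended pattern.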