The problem I’m having
I am trying to run a Python model that uses Facebook's Prophet library to produce time series forecasts from one of my tables and write the predictions into a new table.
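For context, the model is essentially the standard Prophet fit/predict flow; a simplified sketch (the source table and column names here are placeholders):

    def model(dbt, session):
        from prophet import Prophet  # this import is what fails without the package installed

        dbt.config(materialized="table")

        # placeholder source table with a date column and a value column
        df = dbt.ref("daily_sales").toPandas()
        df = df.rename(columns={"date": "ds", "sales": "y"})

        m = Prophet()
        m.fit(df)
        forecast = m.predict(m.make_future_dataframe(periods=90))

        # dbt writes the returned DataFrame to the target table
        return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]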
The script itself is straightforward, but I cannot get Prophet installed when I run it via dbt Cloud on GCP Dataproc Serverless (batches).
The batch environment does not connect to the public internet, so it cannot install anything beyond the preinstalled packages such as numpy and pandas.
The context of why I’m trying to do this
I want to use forecasting methods that are not currently available in SQL/BigQuery and that go beyond sklearn, such as Facebook's Prophet or Uber's Orbit package.
What I’ve already tried
I have already tried creating a private Artifact Registry (PyPI) repository and pointing dbt at it explicitly, both in dbt_project.yml and in the model itself, but the setting seems to be ignored and dbt still tries to pull from public PyPI instead.
In dbt_project.yml:
models:
  forecasts:
    +schema: forecasts
    +python_version: "3.10"
    +pip_index_url: "https://europe-west1-python.pkg.dev/my-warehouse/pypi-ml/simple"
    +pip_packages:
      - pandas==2.2.2
      - "numpy<2.0"
      - holidays>=0.25
      - prophet==1.1.5
      - "cmdstanpy<1.2"
and in my_model.py:
def model(dbt, session):
    # ---- dbt config ----
    dbt.config(
        materialized="table",
        python_version="3.10",
        pip_index_url="https://europe-west1-python.pkg.dev/my-warehouse/pypi-ml/simple",
        pip_packages=[
            "keyring>=24",
            "keyrings.google-artifactregistry-auth>=1.1.2",
            "pandas==2.2.2",
            "numpy<2.0",
            "holidays>=0.25",
            "cmdstanpy<1.2",
            "prophet==1.1.5",
        ],
    )
ChatGPT suggests using a custom Docker image with the necessary Python packages preinstalled, but I cannot find a setting in dbt Cloud to point the Dataproc batch at such an image.
As I am using dbt Cloud, I also do not have a profiles.yml where I could specify it.
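From what I can tell from the dbt-bigquery docs, on dbt Core this is where a custom image would go, via the dataproc_batch block in profiles.yml; something like this sketch (project, bucket, and image names are placeholders, and I may be misreading the docs):

    my-bigquery-profile:
      target: dev
      outputs:
        dev:
          type: bigquery
          method: oauth
          project: my-warehouse
          dataset: forecasts
          gcs_bucket: my-dbt-staging-bucket
          dataproc_region: europe-west1
          submission_method: serverless
          dataproc_batch:
            runtime_config:
              container_image: europe-west1-docker.pkg.dev/my-warehouse/images/dbt-prophet:latest

But I see no equivalent of this in the dbt Cloud connection or environment settings.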
I am sure other people must have run into the same issues trying to use third-party Python packages in their models.