This discussion will be used for dbt-py model best practices. Contribute your opinions on best practices or ask us about them! We’re early on this journey. Some things you shouldn’t do:
use Python for hitting external APIs for EL tasks (caveat: light data enrichment may be okay)
…
dbt-py models, like dbt-sql models, are for transformation code: the equivalent of a select statement. You should configure your model as needed, reference upstream data with dbt.ref and dbt.source, write data transformation code, and return a data object to be persisted in the data platform at the end.
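The four steps above can be sketched roughly as follows. This is only an illustration: the model name `orders`, the column `amount`, and the `FakeFrame` / `FakeDbt` stand-ins are hypothetical and exist only so the function can be exercised locally without a warehouse connection. The sketch assumes the upstream object accepts a SQL string in `filter`, as a Snowpark DataFrame does.

```python
# Minimal sketch of a dbt Python model, plus hypothetical stand-ins for a local run.


def model(dbt, session):
    # 1. Configure the model.
    dbt.config(materialized="table")
    # 2. Reference upstream data ("orders" is a hypothetical model name).
    orders = dbt.ref("orders")
    # 3. Transform: keep rows with a positive amount ("amount" is a hypothetical column).
    result = orders.filter("amount > 0")
    # 4. Return the data object; dbt persists it in the data platform.
    return result


class FakeFrame:
    """Hypothetical stand-in that records filter expressions instead of running them."""

    def __init__(self, name, filters=()):
        self.name = name
        self.filters = tuple(filters)

    def filter(self, expr):
        # Return a new frame, mirroring the immutable style of DataFrame APIs.
        return FakeFrame(self.name, self.filters + (expr,))


class FakeDbt:
    """Hypothetical stand-in for the dbt object passed to model()."""

    def __init__(self):
        self.settings = {}

    def config(self, **kwargs):
        self.settings.update(kwargs)

    def ref(self, name):
        return FakeFrame(name)


fake_dbt = FakeDbt()
out = model(fake_dbt, session=None)
print(out.name, out.filters, fake_dbt.settings)
```

Note that nothing in `model` reaches outside the data platform: it only reads upstream relations and returns a transformed one, which is what keeps it comparable to a select statement.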
I haven’t been able to find any resources for typing Python models, which in general is good practice because linters/mypy can catch a lot of errors and you get autocomplete suggestions in your IDE of choice. Here’s a minimal example for Snowflake which does this for you:
from typing import Optional, Union

import pandas as pd
from snowflake.snowpark import DataFrame as SnowflakeDataFrame
from snowflake.snowpark.session import Session
from typing_extensions import Protocol

#: pylint: disable=invalid-name
ConfigValue = Union[str, bool, float, int]


class Config(Protocol):
    """Model configuration"""

    #: pylint: disable=too-few-public-methods
    @staticmethod
    def get(key: str, default: Optional[ConfigValue] = None) -> Optional[ConfigValue]:
        """Get the value of a key in the config dictionary"""


class This(Protocol):
    """Reference to this model's database table"""

    database: str
    schema: str
    identifier: str


class Dbt(Protocol):
    """DBT interface"""

    config: Config
    this: This
    is_incremental: bool

    # ref takes the model name, optionally preceded by a package name.
    def ref(self, *args: str) -> SnowflakeDataFrame:
        """References to other models"""

    def source(self, source_name: str, table_name: str) -> SnowflakeDataFrame:
        """References to sources"""


def model(dbt: Dbt, session: Session) -> pd.DataFrame:
    """Build the Python model

    Parameters
    ----------
    dbt : Dbt
        DBT object with configuration and references
    session : Session
        Snowpark session

    Returns
    -------
    pd.DataFrame
        pandas DataFrame
    """
    #: pylint: disable=unused-argument
    df = dbt.ref("another_model").to_pandas()
    return df
The model function can also return a Snowpark DataFrame (SnowflakeDataFrame in the imports above) if you don't call to_pandas in your code.
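A rough sketch of that variant, assuming the model needs no pandas-only operations. The `Dbt` protocol here is a trimmed-down copy with just `ref`, and `another_model` is the same placeholder name as above. Putting the Snowpark imports behind `TYPE_CHECKING` and quoting the annotations is a choice made for this sketch, not something dbt requires: it lets a type checker see the Snowpark types while the file can still be imported where Snowpark is not installed.

```python
from typing import TYPE_CHECKING, Protocol

if TYPE_CHECKING:
    # Only evaluated by type checkers, never at runtime.
    from snowflake.snowpark import DataFrame as SnowflakeDataFrame
    from snowflake.snowpark.session import Session


class Dbt(Protocol):
    """Trimmed-down Dbt protocol, just enough for this sketch."""

    def ref(self, *args: str) -> "SnowflakeDataFrame":
        """References to other models"""


def model(dbt: Dbt, session: "Session") -> "SnowflakeDataFrame":
    """Return the upstream data without converting it to pandas."""
    # No to_pandas() call, so the returned object is still a Snowpark DataFrame.
    df = dbt.ref("another_model")
    return df
```

Because `Dbt` is a Protocol, any object with a matching `ref` method satisfies it structurally, so a small stub can stand in for the real dbt object in unit tests.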