dbt Python model (dbt-py) best practices

This discussion will be used for dbt-py model best practices. Contribute your opinions on best practices or ask us about them! We’re early on this journey. Some things you shouldn’t do:

  • use Python for hitting external APIs for EL tasks (caveat: light data enrichment may be okay)

dbt-py models, like dbt-sql models, are for transformation code – the equivalent of a select statement. You should configure your model as needed, load upstream data with dbt.ref and dbt.source, write data transformation code, and return a data object to be persisted in the data platform at the end.
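To make that shape concrete, here is a rough sketch of the four steps (configure, load upstream data, transform, return) that runs without a warehouse. FakeDbt, the list-of-dict rows, and the names stg_orders, crm and customers are all hypothetical stand-ins I made up for illustration. In a real model dbt supplies the dbt and session arguments, and you would return a DataFrame rather than a list.

```python
from typing import Any, Dict, List

# Stand-in for a DataFrame so the sketch runs without a warehouse.
Rows = List[Dict[str, Any]]


class FakeDbt:
    """Hypothetical stand-in for the object dbt passes to model()."""

    def __init__(self, tables: Dict[str, Rows]) -> None:
        self._tables = tables
        self.settings: Dict[str, Any] = {}

    def config(self, **kwargs: Any) -> None:
        self.settings.update(kwargs)

    def ref(self, name: str) -> Rows:
        return self._tables[name]

    def source(self, source_name: str, table_name: str) -> Rows:
        return self._tables[f"{source_name}.{table_name}"]


def model(dbt, session):
    # 1. Configure the model.
    dbt.config(materialized="table")

    # 2. Load upstream data through dbt rather than querying tables directly.
    orders = dbt.ref("stg_orders")
    customers = dbt.source("crm", "customers")

    # 3. Transform: keep completed orders and attach the customer name.
    names = {c["customer_id"]: c["name"] for c in customers}
    result = [
        {**o, "customer_name": names.get(o["customer_id"])}
        for o in orders
        if o["status"] == "completed"
    ]

    # 4. Return the data object; in a real run dbt persists it as a table.
    return result


# Usage with made-up rows (no dbt or Snowflake involved).
fake_dbt = FakeDbt(
    {
        "stg_orders": [
            {"order_id": 1, "customer_id": 10, "status": "completed"},
            {"order_id": 2, "customer_id": 11, "status": "cancelled"},
        ],
        "crm.customers": [{"customer_id": 10, "name": "Ada"}],
    }
)
rows = model(fake_dbt, session=None)
```

Note there is no API call or file loading anywhere in model: everything it reads comes in through dbt.ref or dbt.source, which is what keeps it the equivalent of a select statement.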

I haven’t been able to find any resources on adding type hints to Python models. Typing is good practice in general because linters and mypy can catch a lot of errors, and you get autocomplete suggestions in your IDE of choice. Here’s a minimal example for Snowflake that types the dbt and session arguments:

from typing import Optional, Union

import pandas as pd

from snowflake.snowpark import DataFrame as SnowflakeDataFrame
from snowflake.snowpark.session import Session
from typing_extensions import Protocol

#: pylint: disable=invalid-name

ConfigValue = Union[str, bool, float, int]

class Config(Protocol):
    """Model configuration"""

    #: pylint: disable=too-few-public-methods

    def get(self, key: str, default: Optional[ConfigValue] = None) -> Optional[ConfigValue]:
        """Get the value of a key in the config dictionary"""

class This(Protocol):
    """Reference to this model's database table"""

    database: str
    schema: str
    identifier: str

class Dbt(Protocol):
    """DBT interface"""

    config: Config
    this: This
    is_incremental: bool

    def ref(self, *args: str) -> SnowflakeDataFrame:
        """Reference another model, e.g. ref("model_name")"""

    def source(self, *args: str) -> SnowflakeDataFrame:
        """Reference a source, e.g. source("source_name", "table_name")"""

def model(dbt: Dbt, session: Session) -> pd.DataFrame:
    """Build the Python model

    Parameters
    ----------
    dbt : Dbt
        DBT object with configuration and references
    session : Session
        Snowpark session

    Returns
    -------
    pd.DataFrame
        pandas DataFrame
    """

    #: pylint: disable=unused-argument

    df = dbt.ref("another_model").to_pandas()

    return df

The model function can also return a Snowpark DataFrame (SnowflakeDataFrame in the imports above) if you haven’t called to_pandas in your code; annotate the return type accordingly.
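For the no-to_pandas case, something like the following might work. This is only a sketch: the trimmed-down Dbt protocol here has just ref, and the TYPE_CHECKING guard with string annotations is my own addition so the snippet imports even where Snowpark isn't installed (for example in a lint job). Inside a real model a plain import works just as well.

```python
from typing import TYPE_CHECKING, Protocol

if TYPE_CHECKING:
    # Only evaluated by type checkers, never at runtime.
    from snowflake.snowpark import DataFrame as SnowflakeDataFrame
    from snowflake.snowpark.session import Session


class Dbt(Protocol):
    """Trimmed-down dbt interface: only what this sketch needs."""

    def ref(self, *args: str) -> "SnowflakeDataFrame":
        ...


def model(dbt: Dbt, session: "Session") -> "SnowflakeDataFrame":
    """Return the upstream data as a Snowpark DataFrame."""
    # No to_pandas() call, so nothing is converted to a pandas DataFrame
    # and the annotated return type is the Snowpark one.
    return dbt.ref("another_model")
```

With this annotation mypy should complain if you later add a to_pandas call and forget to update the return type, which is the kind of error the typing is there to catch.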
