I have managed to get a Python model working in my DAG with very simple operations (manually building a DataFrame in code and returning it, materializing it as a table).
Having proved the concept, I wanted to migrate a working Jupyter notebook script into the model, replacing my test script. When I did, I ran into this error for several different operations:
TypeError: Cannot convert a Column object into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' if you're building DataFrame filter expressions. For example, use df.filter((col1 > 1) & (col2 > 2)) instead of df.filter(col1 > 1 and col2 > 2).
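The operator substitution the error message asks for can be demonstrated with plain pandas, which complains in the same way when Python's `and` is applied to a whole column; this is a minimal sketch (the frame and column names `col1`/`col2` are made up for illustration, and the same `&`/`|`/`~` rule applies to Snowpark and PySpark Column expressions):

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 2, 3], "col2": [5, 1, 4]})

# Python's `and` calls bool() on each operand, and a Series/Column of many
# values cannot collapse to a single True/False -- hence the error.
# The bitwise operators are overloaded to work element-wise instead:
filtered = df[(df["col1"] > 1) & (df["col2"] > 2)]  # keep rows where both hold
```

Note the parentheses around each comparison: `&` binds more tightly than `>`, so `df["col1"] > 1 & df["col2"] > 2` would be parsed incorrectly without them.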
As far as I can tell, I am not combining types in an incompatible way at any of the places this error appears. For example, the first line at which it happens looks like this:
To clarify, dbt.ref() does return a dataframe, but it will be the dataframe type specific to your data warehouse — a Snowpark dataframe on Snowflake, a PySpark dataframe on Databricks or GCP (Dataproc). You're right that you can convert from those native "distributed" dataframes to a pandas dataframe, and then write transformations using the familiar pandas API. Performance is a consideration as your data volume scales. There's more about this in the docs: Python models | dbt Developer Hub
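As a sketch of the workflow described above, a dbt Python model on Snowflake might look like the following (the model name `my_upstream_model` and the column names are hypothetical; `to_pandas()` is Snowpark's conversion method, while PySpark uses `toPandas()`):

```python
import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Ordinary pandas logic migrated from the notebook; note `&` rather
    # than `and` when combining boolean masks.
    return df[(df["amount"] > 0) & (df["status"] == "active")]

def model(dbt, session):
    # dbt.ref() returns the warehouse-native dataframe (Snowpark here).
    # Converting to pandas pulls the data into the Python runtime's memory,
    # which is fine for small tables but a cost to weigh at scale.
    upstream = dbt.ref("my_upstream_model")   # hypothetical upstream model
    return transform(upstream.to_pandas())    # Snowpark -> pandas
```

Keeping the pandas logic in a plain function like `transform` also makes it easy to unit-test outside of dbt.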