I have managed to get a Python model working in my DAG with very simple operations (manually make a DataFrame in-code and return it, materializing it as a table).
Having proved the concept, I wanted to migrate a functioning Jupyter notebook script into the model, replacing my test script. When I did, I ran into this error for multiple different operations:
TypeError: Cannot convert a Column object into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' if you're building DataFrame filter expressions. For example, use df.filter((col1 > 1) & (col2 > 2)) instead of df.filter(col1 > 1 and col2 > 2).
As far as I can tell, I am not comparing different types in an incompatible way whenever this error appears. For example, the first line at which it happens looks like this:
data_dbt['is_completed'] = np.where(data_dbt['status'] == 'completed', 1, 0)
The status column contains strings, and the is_completed column doesn't exist before this line.
Has someone come across this error before?
We managed to solve it, so here’s the answer if anyone else has the same issue:
The problem was that dbt.ref() didn't seem to return a pandas DataFrame.
What worked for us was using to_pandas() to convert the data to a DataFrame that the rest of the code could recognise.
### Getting the data
data_dbt = dbt.ref("model")
# Convert to a pandas DataFrame
data = data_dbt.to_pandas()
# DataFrame operations now work
data['is_completed'] = np.where(data['status'] == 'completed', 1, 0)
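For anyone landing here later, this is roughly how the fix might sit inside a complete dbt Python model. It is only a sketch, assuming a Snowpark-backed project (where the object returned by dbt.ref() has a to_pandas() method), an upstream model literally named "model", and a column named status. The config values are illustrative, not taken from our project.

```python
import numpy as np


def model(dbt, session):
    # Assumed config: materialize as a table and request the packages used below.
    dbt.config(materialized="table", packages=["numpy", "pandas"])

    # dbt.ref() returns the warehouse-native DataFrame, not a pandas one.
    data_dbt = dbt.ref("model")

    # Convert to pandas so the notebook-style code works unchanged.
    data = data_dbt.to_pandas()

    # The comparison now yields a boolean Series that np.where can consume.
    data["is_completed"] = np.where(data["status"] == "completed", 1, 0)

    # Return the DataFrame for dbt to materialize.
    return data
```

Keep in mind that to_pandas() pulls the whole result into memory, so this suits smaller tables better than very large ones.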
Here is an explanation from @jerco (dbt Slack source):
dbt.ref() does return a dataframe, but it will be the dataframe specific to your data warehouse: a Snowpark dataframe on Snowpark, a PySpark dataframe on Databricks or GCP (Dataproc). You’re right that you can convert from those native “distributed” dataframes to a Pandas dataframe, and then write transformations using the familiar Pandas API. Performance is a consideration as your data volume scales. There’s more about this in the docs: Python models | dbt Developer Hub
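To make the error message itself less mysterious, here is a toy illustration of the mechanism. ToyColumn is a hypothetical stand-in written for this post, not the real Snowpark or PySpark class. The idea is that comparing a warehouse-native column with == builds a lazy expression rather than a boolean result, and anything that asks for its truth value (as np.where presumably ends up doing) gets a TypeError.

```python
class ToyColumn:
    """Hypothetical stand-in for a lazy, warehouse-native column expression."""

    def __init__(self, expr):
        self.expr = expr

    def __eq__(self, other):
        # Comparison does not evaluate anything; it builds a new expression.
        return ToyColumn(f"({self.expr} = {other!r})")

    def __and__(self, other):
        # '&' combines two expressions, which is why the error suggests it.
        return ToyColumn(f"({self.expr} AND {other.expr})")

    def __bool__(self):
        # There is no single True/False answer for a whole column.
        raise TypeError("Cannot convert a Column object into bool")


status = ToyColumn("status")
condition = status == "completed"  # an expression, not an array of booleans

try:
    if condition:  # anything that needs a plain bool triggers the error
        pass
except TypeError as exc:
    print(f"TypeError: {exc}")
```

A pandas Series behaves differently: the same comparison produces an element-wise boolean Series, which is what np.where expects.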
Great writeup, thank you for coming and closing the loop @Terroface!