Script that worked in a notebook causes a TypeError for several operations in a dbt .py model

I have managed to get a Python model working in my DAG with very simple operations (manually building a DataFrame in code and returning it, materialising it as a table).

Having proved the concept, I wanted to migrate a working Jupyter notebook script into the model, replacing my test script. When I did, I ran into this error for several different operations:

TypeError: Cannot convert a Column object into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' if you're building DataFrame filter expressions. For example, use df.filter((col1 > 1) & (col2 > 2)) instead of df.filter(col1 > 1 and col2 > 2).
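For context, the error message points at a general rule in dataframe APIs: a column has no single truth value, so Python's `and`/`or`/`not` cannot combine them, and element-wise `&`/`|`/`~` must be used instead. pandas enforces the same rule (raising ValueError rather than TypeError); a minimal illustration with toy data, not the poster's model:

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 2, 3], 'col2': [1, 3, 1]})

# Python's `and` forces bool(Series), which is ambiguous and raises.
try:
    df[(df['col1'] > 1) and (df['col2'] > 2)]
except ValueError as e:
    print(type(e).__name__)  # ValueError in pandas; Snowpark raises TypeError

# Element-wise `&` builds a boolean mask instead.
mask = (df['col1'] > 1) & (df['col2'] > 2)
print(df[mask])  # keeps only the row where both conditions hold
```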

As far as I can tell, I am not comparing different types in an incompatible way anywhere this error appears. For example, the first line at which it happens looks like this:

data_dbt['is_completed'] = np.where(data_dbt['status'] == 'completed', 1, 0)

Column status contains strings, and column is_completed doesn't exist before this line.
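For comparison, that exact line runs fine when the object really is a pandas DataFrame (toy data below, with assumed column values), which suggests the problem is the type of the object rather than the expression itself:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the notebook's data
data_dbt = pd.DataFrame({'status': ['completed', 'pending', 'completed']})

# Works as expected on a genuine pandas DataFrame
data_dbt['is_completed'] = np.where(data_dbt['status'] == 'completed', 1, 0)
print(data_dbt['is_completed'].tolist())  # [1, 0, 1]
```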

Has anyone come across this error before?

We managed to solve it, so here's the answer if anyone else has the same issue:

the cause is that dbt.ref() didn't seem to return a pandas DataFrame.

What worked for us was using to_pandas() to convert the data to a DataFrame that the rest of the code could recognise.

```python
import numpy as np

# Getting the data
data_dbt = dbt.ref("model")

# Convert to a pandas DataFrame
data = data_dbt.to_pandas()

# DataFrame operation now works
data['is_completed'] = np.where(data['status'] == 'completed', 1, 0)
```

Here is an explanation from @jerco (dbt Slack source):

To clarify, dbt.ref() does return a dataframe, but it will be the dataframe specific to your data warehouse: a Snowpark dataframe on Snowflake, a PySpark dataframe on Databricks or GCP (Dataproc). You're right that you can convert from those native "distributed" dataframes to a pandas dataframe, and then write transformations using the familiar pandas API. Performance is a consideration as your data volume scales. There's more about this in the docs: Python models | dbt Developer Hub
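Putting the thread together, a dbt Python model using this fix might look like the sketch below (the ref name and column names are illustrative; on Databricks/PySpark the conversion call is `toPandas()` rather than `to_pandas()`):

```python
import numpy as np

def model(dbt, session):
    # dbt.ref() returns the warehouse-native dataframe
    # (e.g. a Snowpark dataframe on Snowflake)
    data_dbt = dbt.ref("model")

    # Convert to pandas so the notebook-style code below works unchanged
    data = data_dbt.to_pandas()

    data['is_completed'] = np.where(data['status'] == 'completed', 1, 0)
    return data
```

Note that the conversion pulls the whole table into memory, so as jerco says, for large models it may be worth rewriting the logic against the native dataframe API instead of converting.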


Great writeup, thank you for coming back and closing the loop @Terroface!

