Script that worked in a notebook causes a TypeError for several operations in a dbt .py model

I have managed to get a Python model working in my DAG with very simple operations (manually building a DataFrame in code and returning it, materialising it as a table).

Having proved the concept, I wanted to migrate a working Jupyter notebook script into the model, replacing my test script. When I did, I ran into this error for several different operations:

TypeError: Cannot convert a Column object into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' if you're building DataFrame filter expressions. For example, use df.filter((col1 > 1) & (col2 > 2)) instead of df.filter(col1 > 1 and col2 > 2).
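For context, the error message points at a general rule in dataframe APIs: a column has no single truth value, so Python's `and`/`or`/`not` cannot combine them, and element-wise `&`/`|`/`~` must be used instead. pandas enforces the same rule (raising ValueError rather than TypeError); a minimal illustration with toy data, not the poster's model:

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 2, 3], 'col2': [1, 3, 1]})

# Python's `and` forces bool(Series), which is ambiguous and raises.
try:
    df[(df['col1'] > 1) and (df['col2'] > 2)]
except ValueError as e:
    print(type(e).__name__)  # ValueError in pandas; Snowpark raises TypeError

# Element-wise `&` builds a boolean mask instead.
mask = (df['col1'] > 1) & (df['col2'] > 2)
print(df[mask])  # keeps only the row where both conditions hold
```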

As far as I can tell, I am not comparing different types in an incompatible way anywhere this error appears. For example, the first line at which it happens looks like this:

data_dbt['is_completed'] = np.where(data_dbt['status'] == 'completed', 1, 0)

Column status contains strings, and column is_completed doesn't exist before this line.
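For comparison, that exact line runs fine when the object really is a pandas DataFrame (toy data below, with assumed column values), which suggests the problem is the type of the object rather than the expression itself:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the notebook's data
data_dbt = pd.DataFrame({'status': ['completed', 'pending', 'completed']})

# Works as expected on a genuine pandas DataFrame
data_dbt['is_completed'] = np.where(data_dbt['status'] == 'completed', 1, 0)
print(data_dbt['is_completed'].tolist())  # [1, 0, 1]
```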

Has anyone come across this error before?

We managed to solve it, so here's the answer if anyone else has the same issue:

the cause is that dbt.ref() didn't seem to return a pandas DataFrame.

What worked for us was using to_pandas() to convert the data to a DataFrame that the rest of the code could recognise.

```python
import numpy as np

# Getting the data
data_dbt = dbt.ref("model")

# Convert to a pandas DataFrame
data = data_dbt.to_pandas()

# DataFrame operation now works
data['is_completed'] = np.where(data['status'] == 'completed', 1, 0)
```

Here is an explanation from @jerco (dbt Slack source):

To clarify, dbt.ref() does return a dataframe, but it will be the dataframe specific to your data warehouse: a Snowpark dataframe on Snowflake, a PySpark dataframe on Databricks or GCP (Dataproc). You're right that you can convert from those native "distributed" dataframes to a pandas dataframe, and then write transformations using the familiar pandas API. Performance is a consideration as your data volume scales. There's more about this in the docs: Python models | dbt Developer Hub
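Putting the thread together, a dbt Python model using this fix might look like the sketch below (the ref name and column names are illustrative; on Databricks/PySpark the conversion call is `toPandas()` rather than `to_pandas()`):

```python
import numpy as np

def model(dbt, session):
    # dbt.ref() returns the warehouse-native dataframe
    # (e.g. a Snowpark dataframe on Snowflake)
    data_dbt = dbt.ref("model")

    # Convert to pandas so the notebook-style code below works unchanged
    data = data_dbt.to_pandas()

    data['is_completed'] = np.where(data['status'] == 'completed', 1, 0)
    return data
```

Note that the conversion pulls the whole table into memory, so as jerco says, for large models it may be worth rewriting the logic against the native dataframe API instead of converting.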


Great writeup, thank you for coming back and closing the loop @Terroface!

