dbt + PySpark on GCP: PySpark processes the DataFrame but can't write the table to BigQuery at the end

Hi,
I am using PySpark to run custom code in dbt models. My tables are in BigQuery and the Spark cluster runs on GCP (Dataproc).
When I run the model it fails. Looking at the Spark logs, the Spark processing works fine until it tries to write the table, at which point it fails. Roughly 1 run in 10 succeeds (I can't find a pattern for this).
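
For reference, the model itself is a plain dbt Python model that returns a Spark DataFrame; a minimal sketch of its shape (the upstream ref, cluster name, and config values are illustrative placeholders, not our real ones):

```python
# models/dummy_py_model.py -- minimal sketch of the dbt Python model.
# "my_upstream_model" and "my-cluster" are placeholder names.
def model(dbt, session):
    dbt.config(
        materialized="table",
        submission_method="cluster",         # run on an existing Dataproc cluster
        dataproc_cluster_name="my-cluster",
    )

    # dbt resolves the ref to a BigQuery table and returns a Spark DataFrame
    df = dbt.ref("my_upstream_model")

    # ...arbitrary PySpark transformations; this part completes fine...
    df = df.select("pat_id", "task_uuid")

    # dbt writes the returned DataFrame back to BigQuery via the
    # spark-bigquery connector; that final write is what fails.
    return df
```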

We are using:
dbt-bigquery 1.6.4
dbt-core 1.6.2
dbt-extractor 0.4.1

Spark log:

23/11/09 15:30:15 ERROR org.apache.spark.scheduler.TaskSetManager: task 0.0 in stage 3.0 (TID 3) had a not serializable result: com.google.cloud.spark.bigquery.repackaged.io.grpc.Status
Serialization stack:
	- object not serializable (class: com.google.cloud.spark.bigquery.repackaged.io.grpc.Status, value: Status{code=NOT_FOUND, description=Requested entity was not found. Entity: projects/X/datasets/ai_dw_mart_normal/tables/dummy_py_model/streams/CiQ3ZDg4ODc4Yy0wMDAwLTI4YTctYjg4Yi0yNDA1ODg3NzZiOTQ, cause=null})
	- writeObject data (class: java.lang.Throwable)
	- object (class com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException, com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: NOT_FOUND: Requested entity was not found. Entity: projects/X/datasets/ai_dw_mart_normal/tables/dummy_py_model/streams/CiQ3ZDg4ODc4Yy0wMDAwLTI4YTctYjg4Yi0yNDA1ODg3NzZiOTQ)
	- writeObject data (class: java.lang.Throwable)
	- object (class com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.NotFoundException, com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.NotFoundException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: NOT_FOUND: Requested entity was not found. Entity: projects/X/datasets/ai_dw_mart_normal/tables/dummy_py_model/streams/CiQ3ZDg4ODc4Yy0wMDAwLTI4YTctYjg4Yi0yNDA1ODg3NzZiOTQ)
	- writeObject data (class: java.lang.Throwable)
	- object (class java.util.concurrent.ExecutionException, java.util.concurrent.ExecutionException: com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.NotFoundException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: NOT_FOUND: Requested entity was not found. Entity: projects/X/datasets/ai_dw_mart_normal/tables/dummy_py_model/streams/CiQ3ZDg4ODc4Yy0wMDAwLTI4YTctYjg4Yi0yNDA1ODg3NzZiOTQ)
	- writeObject data (class: java.lang.Throwable)
	- object (class com.google.cloud.bigquery.connector.common.BigQueryConnectorException, com.google.cloud.bigquery.connector.common.BigQueryConnectorException: Could not retrieve AppendRowsResponse)
	- field (class: com.google.cloud.spark.bigquery.write.DataSourceWriterContextPartitionHandler$1, name: val$e, type: class java.lang.Exception)
	- object (class com.google.cloud.spark.bigquery.write.DataSourceWriterContextPartitionHandler$1, com.google.cloud.spark.bigquery.write.DataSourceWriterContextPartitionHandler$1@527b3afb)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 1); not retrying
23/11/09 15:30:15 ERROR org.apache.spark.scheduler.TaskSetManager: task 1.0 in stage 3.0 (TID 4) had a not serializable result: com.google.cloud.spark.bigquery.repackaged.io.grpc.Status
Serialization stack:
	- object not serializable (class: com.google.cloud.spark.bigquery.repackaged.io.grpc.Status, value: Status{code=NOT_FOUND, description=Requested entity was not found. Entity: projects/X/datasets/ai_dw_mart_normal/tables/dummy_py_model/streams/CiQ3ZDg4ODc3Zi0wMDAwLTI4YTctYjg4Yi0yNDA1ODg3NzZiOTQ, cause=null})
	- writeObject data (class: java.lang.Throwable)
	- object (class com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException, com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: NOT_FOUND: Requested entity was not found. Entity: projects/X/datasets/ai_dw_mart_normal/tables/dummy_py_model/streams/CiQ3ZDg4ODc3Zi0wMDAwLTI4YTctYjg4Yi0yNDA1ODg3NzZiOTQ)
	- writeObject data (class: java.lang.Throwable)
	- object (class com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.NotFoundException, com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.NotFoundException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: NOT_FOUND: Requested entity was not found. Entity: projects/X/datasets/ai_dw_mart_normal/tables/dummy_py_model/streams/CiQ3ZDg4ODc3Zi0wMDAwLTI4YTctYjg4Yi0yNDA1ODg3NzZiOTQ)
	- writeObject data (class: java.lang.Throwable)
	- object (class java.util.concurrent.ExecutionException, java.util.concurrent.ExecutionException: com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.NotFoundException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: NOT_FOUND: Requested entity was not found. Entity: projects/X/datasets/ai_dw_mart_normal/tables/dummy_py_model/streams/CiQ3ZDg4ODc3Zi0wMDAwLTI4YTctYjg4Yi0yNDA1ODg3NzZiOTQ)
	- writeObject data (class: java.lang.Throwable)
	- object (class com.google.cloud.bigquery.connector.common.BigQueryConnectorException, com.google.cloud.bigquery.connector.common.BigQueryConnectorException: Could not retrieve AppendRowsResponse)
	- field (class: com.google.cloud.spark.bigquery.write.DataSourceWriterContextPartitionHandler$1, name: val$e, type: class java.lang.Exception)
	- object (class com.google.cloud.spark.bigquery.write.DataSourceWriterContextPartitionHandler$1, com.google.cloud.spark.bigquery.write.DataSourceWriterContextPartitionHandler$1@4b29b286)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 1); not retrying
23/11/09 15:30:15 WARN com.google.cloud.spark.bigquery.write.context.BigQueryDirectDataSourceWriterContext: BigQuery Data Source writer 6cd8b42c-b849-41dc-a7cc-482ec82e7afe aborted
Traceback (most recent call last):
  File "/tmp/spark-af92d649-568e-4334-8535-abca0670a49b/dummy_py_model.py", line 196, in <module>
    df.write \
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1109, in save
  File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o79.save.
: com.google.cloud.bigquery.connector.common.BigQueryConnectorException: unexpected issue trying to save [pat_id: string, task_uuid: string ... 2 more fields]
	at com.google.cloud.spark.bigquery.write.BigQueryDataSourceWriterInsertableRelation.insert(BigQueryDataSourceWriterInsertableRelation.java:128)
	at com.google.cloud.spark.bigquery.write.CreatableRelationProviderHelper.createRelation(CreatableRelationProviderHelper.java:54)
	at com.google.cloud.spark.bigquery.v2.Spark31BigQueryTableProvider.createRelation(Spark31BigQueryTableProvider.java:65)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:362)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: task 0.0 in stage 3.0 (TID 3) had a not serializable result: com.google.cloud.spark.bigquery.repackaged.io.grpc.Status
Serialization stack:
	- object not serializable (class: com.google.cloud.spark.bigquery.repackaged.io.grpc.Status, value: Status{code=NOT_FOUND, description=Requested entity was not found. Entity: projects/X/datasets/ai_dw_mart_normal/tables/dummy_py_model/streams/CiQ3ZDg4ODc4Yy0wMDAwLTI4YTctYjg4Yi0yNDA1ODg3NzZiOTQ, cause=null})
	- writeObject data (class: java.lang.Throwable)
	- object (class com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException, com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: NOT_FOUND: Requested entity was not found. Entity: projects/X/datasets/ai_dw_mart_normal/tables/dummy_py_model/streams/CiQ3ZDg4ODc4Yy0wMDAwLTI4YTctYjg4Yi0yNDA1ODg3NzZiOTQ)
	- writeObject data (class: java.lang.Throwable)
	- object (class com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.NotFoundException, com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.NotFoundException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: NOT_FOUND: Requested entity was not found. Entity: projects/X/datasets/ai_dw_mart_normal/tables/dummy_py_model/streams/CiQ3ZDg4ODc4Yy0wMDAwLTI4YTctYjg4Yi0yNDA1ODg3NzZiOTQ)
	- writeObject data (class: java.lang.Throwable)
	- object (class java.util.concurrent.ExecutionException, java.util.concurrent.ExecutionException: com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.NotFoundException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: NOT_FOUND: Requested entity was not found. Entity: projects/X/datasets/ai_dw_mart_normal/tables/dummy_py_model/streams/CiQ3ZDg4ODc4Yy0wMDAwLTI4YTctYjg4Yi0yNDA1ODg3NzZiOTQ)
	- writeObject data (class: java.lang.Throwable)
	- object (class com.google.cloud.bigquery.connector.common.BigQueryConnectorException, com.google.cloud.bigquery.connector.common.BigQueryConnectorException: Could not retrieve AppendRowsResponse)
	- field (class: com.google.cloud.spark.bigquery.write.DataSourceWriterContextPartitionHandler$1, name: val$e, type: class java.lang.Exception)
	- object (class com.google.cloud.spark.bigquery.write.DataSourceWriterContextPartitionHandler$1, com.google.cloud.spark.bigquery.write.DataSourceWriterContextPartitionHandler$1@527b3afb)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 1)
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2304)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2253)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2252)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2252)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1124)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1124)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1124)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2491)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2433)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2422)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:902)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2204)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2225)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2244)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2269)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
	at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362)
	at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361)
	at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
	at com.google.cloud.spark.bigquery.write.BigQueryDataSourceWriterInsertableRelation.insert(BigQueryDataSourceWriterInsertableRelation.java:89)
	... 34 more
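
From the traceback, the failure is not in our transformation code: line 196 of the generated dummy_py_model.py is the write-back step that dbt appends after the model returns its DataFrame. As far as I can tell, it boils down to a spark-bigquery connector write along these lines (a sketch; the exact mode and options are whatever dbt-bigquery generates, not copied from its source):

```python
# Sketch of the write-back in the generated script; the mode and
# options here are assumptions, not taken from dbt-bigquery itself.
df.write \
    .mode("overwrite") \
    .format("bigquery") \
    .option("writeMethod", "direct") \
    .save("X.ai_dw_mart_normal.dummy_py_model")
```

The `writeMethod=direct` path goes through the BigQuery Storage Write API, which matches the `.../streams/...` entities in the NOT_FOUND errors above: the executor tasks are appending to write streams that the service reports as no longer existing.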

Hello @ricardobarata

Did you ever find the cause of the error?
I have the same issue.

Sylvain

@Tchinkatchuk @ricardobarata Hello, did you manage to solve this issue? I'm also facing the same errors.