The problem I’m having
When I update my Python model to add more fields, the PySpark job logs a warning that the number of fields does not match:
WARN BigQueryDataSourceWriterInsertableRelation: unexpected issue trying to save [col1: string, col2: timestamp … 12 more fields]
com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException: Inserted row has wrong column count; Has 14, expected 8 at [4:30]
The context of why I’m trying to do this
We have a Python model that writes to a BigQuery table.
The PySpark job is submitted to Dataproc Serverless.
The problem occurs when we update the model to add new fields.
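For reference, a hypothetical minimal model of the shape described above might look like the sketch below. The upstream model name `source_model` and the added column `new_col` are placeholders, not taken from the actual project. `submission_method="serverless"` is the dbt-bigquery setting that sends Python models to Dataproc Serverless.

```python
def model(dbt, session):
    # Materialize as a BigQuery table and run the PySpark job on Dataproc Serverless
    dbt.config(materialized="table", submission_method="serverless")

    # "source_model" is a hypothetical upstream model name
    df = dbt.ref("source_model")

    # Appending a column makes the output wider than the table built by the
    # previous run; this is the kind of change that preceded the error above
    return df.selectExpr("*", "CAST(NULL AS STRING) AS new_col")
```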
What I’ve already tried
- Added the allowFieldAddition property to profiles.yml:
    runtime_config:
      properties:
        allowFieldAddition: 'true'
- Set the Spark config in the Python model:
    global spark
    spark.conf.set("temporaryGcsBucket", "temp_bucket")
    spark.conf.set("allowFieldAddition", "true")
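Both attempts above use the bare option name `allowFieldAddition`. The spark-bigquery-connector README describes passing connector options from outside the code as Spark properties by prepending `spark.datasource.bigquery.` (for example `spark.datasource.bigquery.temporaryGcsBucket`). Assuming that convention applies to `runtime_config.properties`, a hypothetical helper that builds the prefixed names could look like this. Whether the connector honors `allowFieldAddition` for the write path used here is not confirmed by this sketch.

```python
# Prefix the connector README documents for setting options as Spark properties
CONNECTOR_PREFIX = "spark.datasource.bigquery."


def to_spark_properties(options):
    """Hypothetical helper: map bare connector options to prefixed Spark property names."""
    props = {}
    for key, value in options.items():
        name = key if key.startswith(CONNECTOR_PREFIX) else CONNECTOR_PREFIX + key
        # Spark properties are strings; render booleans the way the YAML above does
        props[name] = str(value).lower() if isinstance(value, bool) else str(value)
    return props


props = to_spark_properties({"allowFieldAddition": True, "temporaryGcsBucket": "temp_bucket"})
for key, value in props.items():
    print(f"{key}: '{value}'")
# spark.datasource.bigquery.allowFieldAddition: 'true'
# spark.datasource.bigquery.temporaryGcsBucket: 'temp_bucket'
```

The printed lines are in the form that would sit under `runtime_config:` / `properties:` in profiles.yml, if the prefixed names turn out to be what the connector reads.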
Some example code or error messages
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
GET https://www.googleapis.com/bigquery/v2/projects/*******/queries/*******************************?location=**************&maxResults=0&prettyPrint=false
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "location" : "q",
    "locationType" : "parameter",
    "message" : "Inserted row has wrong column count; Has 14, expected 8 at [4:30]",
    "reason" : "invalidQuery"
  } ],
  "message" : "Inserted row has wrong column count; Has 14, expected 8 at [4:30]",
  "status" : "INVALID_ARGUMENT"
}
    at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
    at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
    at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
    at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:439)
    at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
    at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:525)
    at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:466)
    at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:576)
    at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getQueryResults(HttpBigQueryRpc.java:692)
    ... 60 more
23/08/08 05:08:26 WARN BigQueryDirectDataSourceWriterContext: BigQuery Data Source writer c0f75ced-4543-4722-b974-0be9bceecc4a aborted
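`Has 14, expected 8` means the rows being inserted carry 14 columns while the existing table has 8. A small sketch for listing exactly which columns are new, assuming you can get the DataFrame's column names (`df.columns` in PySpark) and the table's current column names. `new_columns` is a hypothetical helper, and the `col1`…`col14` names below are placeholders.

```python
def new_columns(df_columns, table_columns):
    """Return columns present in the DataFrame but missing from the table, in DataFrame order."""
    # BigQuery column names are case-insensitive, so compare lowercased names
    existing = {name.lower() for name in table_columns}
    return [name for name in df_columns if name.lower() not in existing]


table_cols = [f"col{i}" for i in range(1, 9)]              # 8 columns, as in "expected 8"
df_cols = table_cols + [f"col{i}" for i in range(9, 15)]   # 14 columns, as in "Has 14"
extra = new_columns(df_cols, table_cols)
print(len(df_cols), len(table_cols), extra)
# 14 8 ['col9', 'col10', 'col11', 'col12', 'col13', 'col14']
```

On the table side, the google-cloud-bigquery client can supply the names, e.g. `[field.name for field in bigquery.Client().get_table("project.dataset.table").schema]`, where the table ID is a placeholder.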