Help With Intermediate tmp Tables Created by dbt-glue

The problem I’m having

The intermediate tmp table that is being written contains null values, which get carried over to the final Iceberg table.
I cannot yet determine what other misalignment issues result from this.

The context of why I’m trying to do this

I want to set up an ELT pipeline that builds an Iceberg lake, and I would like to enforce and verify data integrity along the way.

What I’ve already tried

I am using the dbt-glue adapter with the following versions:
dbt=1.7.4
dbt-glue=1.7.1

I am creating an Iceberg lake and have added the following Spark configs to my dbt profile:

"spark.sql.legacy.allowNonEmptyLocationInCTAS=true
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer
--conf spark.sql.warehouse=
--conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
--conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
--conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
--conf spark.sql.catalog.glue_catalog.lock-impl=org.apache.iceberg.aws.dynamodb.DynamoDbLockManager
--conf spark.sql.catalog.glue_catalog.lock.table=myGlueLockTable
--conf spark.sql.legacy.createTableColumnTypesInCatalog=true
--conf spark.sql.sources.default=org.apache.iceberg.spark.SparkCatalog
--conf spark.sql.sources.write.semantic=ORC
--conf spark.sql.legacy.createEmptyManagedTableByDefaul=true"
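
For context, here is a minimal sketch of where that conf string sits in a dbt-glue profile; the profile name, target, role ARN, bucket, and worker settings are placeholders, and the exact set of supported keys should be checked against the dbt-glue docs:

my_profile:
  target: dev
  outputs:
    dev:
      type: glue
      role_arn: arn:aws:iam::123456789012:role/my-glue-role  # placeholder
      region: us-east-1  # placeholder
      workers: 3
      worker_type: G.1X
      schema: iceberg_db
      location: s3://BucketName/Iceberg_db
      # the full flag string from above, space-separated, goes here
      conf: "spark.sql.legacy.allowNonEmptyLocationInCTAS=true --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf ..."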

I went through the dbt logs and isolated the issue to the writing of the tmp table:

spark.sql("CREATE TABLE test_table LOCATION 's3://BucketName/Iceberg_db/tmp_test_table' AS SELECT * FROM tmp_tmp_test_table")

When I inspect the table in Athena, the assigned input format, output format, and SerDe serialization lib are wrong, i.e.:
Input Format: org.apache.hadoop.mapred.TextInputFormat
Output Format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
SerDe serialization lib: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
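
For anyone reproducing this, the same storage descriptor can be checked programmatically against the Glue Data Catalog instead of through the Athena console; a minimal sketch using boto3, assuming the database is iceberg_db (region and names are placeholders):

import boto3

# Fetch the table definition from the Glue Data Catalog
glue = boto3.client("glue", region_name="us-east-1")  # placeholder region
table = glue.get_table(DatabaseName="iceberg_db", Name="tmp_test_table")["Table"]

# The formats Athena displays come from the table's storage descriptor
sd = table["StorageDescriptor"]
print("Input Format: ", sd.get("InputFormat"))
print("Output Format:", sd.get("OutputFormat"))
print("SerDe lib:    ", sd.get("SerdeInfo", {}).get("SerializationLibrary"))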

When I recreate the process in an interactive notebook and override the above statement with:

spark.sql("""
CREATE TABLE tmp_test_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3://bucket_name/iceberg_db/tmp_test_table'
AS SELECT * FROM tmp_tmp_test_table
""")

Athena now shows the correct values:
Input Format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
Output Format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
SerDe serialization lib: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
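
With the override in place, a quick null count on the tmp table is an easy way to confirm the fix; some_column below is a placeholder for whichever column was previously coming through as null:

spark.sql("""
    SELECT count(*) AS null_rows
    FROM tmp_test_table
    WHERE some_column IS NULL  -- placeholder column name
""").show()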

The null issue is resolved. I was wondering whether there is a way to override the intermediate tmp table creation, or any other setting that will help set the correct input format, output format, and SerDe serialization library?

After digging through several possible solutions, I found the fix on Stack Overflow:

To implement the flag in dbt, I added the SET statement as a pre-hook in dbt_project.yml.
Optionally, I also added the Hive flag in my profiles.yml under the conf tag (see the sketch after the statement below).

SET hive.default.fileformat=parquet
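
For reference, a minimal sketch of what the pre-hook looks like in dbt_project.yml; my_project is a placeholder for the project/model path:

models:
  my_project:
    # Forces Hive CTAS (the intermediate tmp table) to default to Parquet instead of text
    +pre-hook: "SET hive.default.fileformat=parquet"

and, optionally, the equivalent flag appended to the conf string in profiles.yml; whether the Glue session actually forwards this spelling to Hive is an assumption worth testing:

--conf spark.hadoop.hive.default.fileformat=parquet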