The problem I’m having
Running dbt run with --full-refresh does not rebuild the table from the full source data. Instead, it replaces the existing table with data for the given partition only, so all previous partitions are removed.
The context of why I’m trying to do this
I need to rebuild the table because two columns have been removed from the model. I can't use on_schema_change: sync_all_columns
because Databricks Delta Lake does not support dropping columns, so in this particular case I have to rebuild the table.
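For context, the rebuild I need is essentially "recreate the table without the two removed columns". A rough sketch of what that looks like done by hand (the table name and partitioning match the model below; `dropped_col_1` and `dropped_col_2` are placeholders for the two removed columns, and `* except (...)` is Databricks SQL syntax):

-- Hypothetical manual rebuild: recreate the table without the two
-- dropped columns, keeping the Delta format and partitioning.
create or replace table `falcon_int`.`bronze`.`base_exp_valid_sessions`
using delta
partitioned by (user_id, report_date)
as
select * except (dropped_col_1, dropped_col_2)
from `falcon_int`.`bronze`.`base_exp_valid_sessions`;

I'd rather have dbt do this rebuild for me via --full-refresh, which is what the rest of this post is about.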
config in dbt_project.yml
Below is the relevant config from the dbt_project.yml file.
metrics_store:
  +materialized: incremental
  +incremental_strategy: insert_overwrite
  +on_schema_change: append_new_columns
  +partition_by:
    - user_id
    - report_date
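For reference, the equivalent model-level config block inside the model's SQL file would look like this (a sketch; the values mirror the project config above):

{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    on_schema_change='append_new_columns',
    partition_by=['user_id', 'report_date']
) }}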
dbt run command:
dbt run --select base_exp_valid_sessions --full-refresh --vars '{"report_date": "2024-07-18", "user_id": "123"}'
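For comparison, without --full-refresh the same run behaves as expected: with the insert_overwrite strategy, only the partitions present in the run's output (here user_id = '123', report_date = '2024-07-18') are replaced, and all other partitions are kept:

dbt run --select base_exp_valid_sessions --vars '{"report_date": "2024-07-18", "user_id": "123"}'

The surprise is that adding --full-refresh widens the scope from "these partitions" to "the whole table" while still only loading the given partition's data.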
Logs
I would expect --full-refresh to create a temporary table and then restore the data. In the logs, however, dbt runs a create or replace table
statement, which doesn't seem correct.
16:14:13 Databricks adapter: Cursor(session-id=d0918ca8-bd43-4894-818c-88f91a934ff6, command-id=Unknown) - Created cursor
16:14:13 SQL status: OK in 0.260 seconds
16:14:13 Databricks adapter: Cursor(session-id=d0918ca8-bd43-4894-818c-88f91a934ff6, command-id=37ff0877-ebc6-45e9-a1db-0b2ef32a0813) - Closing cursor
16:14:13 Writing runtime sql for node "model.metrics_store.base_exp_valid_sessions"
16:14:13 Databricks adapter: DatabricksDBTConnection(id=4933596368, session-id=d0918ca8-bd43-4894-818c-88f91a934ff6, name=model.metrics_store.base_exp_valid_sessions, idle-time=1.022150993347168s, acquire-count=1, language=sql, thread-identifier=(60701, 123145385476096), compute-name=) - Checking idleness
16:14:13 Databricks adapter: DatabricksDBTConnection(id=4933596368, session-id=d0918ca8-bd43-4894-818c-88f91a934ff6, name=model.metrics_store.base_exp_valid_sessions, idle-time=1.0227110385894775s, acquire-count=1, language=sql, thread-identifier=(60701, 123145385476096), compute-name=) - Retrieving connection
16:14:13 Using databricks connection "model.metrics_store.base_exp_valid_sessions"
16:14:13 On model.metrics_store.base_exp_valid_sessions: /* {"app": "dbt", "dbt_version": "1.8.4", "dbt_databricks_version": "1.8.4", "databricks_sql_connector_version": "3.1.2", "profile_name": "metrics_store", "target_name": "int", "node_id": "model.metrics_store.base_exp_valid_sessions"} */
create or replace table `falcon_int`.`bronze`.`base_exp_valid_sessions`
using delta
partitioned by (user_id,report_date)
as
with filtered as (
    select
        *,
        userid as user_id,
        date('2024-07-18') as report_date
    from `falcon_prd`.`gold`.`session_lookup`
    where (
        (userid = '123')
        and (
            date(session_start) <= date('2024-07-18')
        )
    )
)
select *
from filtered
where
    user_id = '123'
    and report_date = date('2024-07-18')
16:14:13 Databricks adapter: Cursor(session-id=d0918ca8-bd43-4894-818c-88f91a934ff6, command-id=Unknown) - Created cursor