I figured out why this is going wrong on recent versions of dbt. (For local testing I’ve been using v0.18.0
.)
Let’s talk about the differences between the two runs, both dbt run -m incremental_model --full-refresh
. All code below is based on the reproduction example you provided above.
First run
dbt runs the following SQL statements:
-- step 1: create
create table "jerco"."dbt_jcohen"."incremental_model"
as (
SELECT
id,
survey_type_id,
pulse_id
FROM "jerco"."dbt_jcohen"."survey"
);
-- step 2: post hooks
CREATE UNIQUE INDEX IF NOT EXISTS UI_incremental_modelid ON "jerco"."dbt_jcohen"."incremental_model" ("id");
CREATE INDEX IF NOT EXISTS IDXincremental_modelsurvey_type_id ON "jerco"."dbt_jcohen"."incremental_model" ("survey_type_id");
CREATE INDEX IF NOT EXISTS IDXincremental_model_pulse_id ON "jerco"."dbt_jcohen"."incremental_model" ("pulse_id");
Let’s check to make sure those indexes indeed exist:
select * from pg_indexes
where schemaname = 'dbt_jcohen'
and tablename = 'incremental_model'
schemaname |
tablename |
indexname |
tablespace |
indexdef |
dbt_jcohen |
incremental_model |
ui_incremental_modelid |
|
CREATE UNIQUE INDEX ui_incremental_modelid ON dbt_jcohen.incremental_model__dbt_backup USING btree (id) |
dbt_jcohen |
incremental_model |
idxincremental_modelsurvey_type_id |
|
CREATE INDEX idxincremental_modelsurvey_type_id ON dbt_jcohen.incremental_model__dbt_backup USING btree (survey_type_id) |
dbt_jcohen |
incremental_model |
idxincremental_model_pulse_id |
|
CREATE INDEX idxincremental_model_pulse_id ON dbt_jcohen.incremental_model__dbt_backup USING btree (pulse_id) |
All good!
Second run
This time, dbt_jcohen.incremental_model
already exists as a table in the database, so dbt will run a subtly different set of queries.
-- step 1: rename
alter table "jerco"."dbt_jcohen"."incremental_model" rename to "incremental_model__dbt_backup";
-- step 2: create
create table "jerco"."dbt_jcohen"."incremental_model"
as (
SELECT
id,
survey_type_id,
pulse_id
FROM "jerco"."dbt_jcohen"."survey"
);
-- step 3: post hooks
CREATE UNIQUE INDEX IF NOT EXISTS UI_incremental_modelid ON "jerco"."dbt_jcohen"."incremental_model" ("id");
CREATE INDEX IF NOT EXISTS IDXincremental_modelsurvey_type_id ON "jerco"."dbt_jcohen"."incremental_model" ("survey_type_id");
CREATE INDEX IF NOT EXISTS IDXincremental_model_pulse_id ON "jerco"."dbt_jcohen"."incremental_model" ("pulse_id");
-- step 4: cleanup
drop table if exists "jerco"."dbt_jcohen"."incremental_model__dbt_backup" cascade;
Now, if we run the same query as before (select * from pg_indexes ...
), we don’t get any results.
What happened?
Let’s think through this step by step:
- Via
alter table...
, dbt renames the “old” version of incremental_model
to avoid a namespace collision. If we “pause” here and check pg_indexes
, we’ll see that the three indices now reference incremental_model__dbt_backup
instead of incremental_model
.
- dbt creates the new version of
incremental_model
.
- dbt runs
create index if not exists
. Indices by those names do already exist, though they now reference incremental_model__dbt_backup
.
- dbt cascade-drops
incremental_model__dbt_backup
, which also drops the indices.
The upshot here is that, on the third run, dbt would recreate the indexes on the right table, since they no longer exist.
Solutions
- Instead of
create index if not exists
, you could run two statements: drop index if exists ...
+ create index
. A normal drop index
statement can be run inside a transaction (docs), so this is a pretty low-risk approach.
- You could give each index a name that will be unique for each run/invocation of dbt, i.e. by including a dbt-Jinja context variable such as
{{invocation_id}}
or {{run_started_at}}
in the index name construction. That way, the post-hooks should not find a same-named index already existing from a previous run.
I’m not sure why this “worked” on dbt v0.14.0. In any case, either of the approaches above should enable you to run this on a more recent version.
FYI
In the future, the best places for these bug reports are:
- stackoverflow: ask a question, tag it
dbt
- dbt repo: open an issue if you believe the cause to be error in the dbt codebase
If you put it in one of those places, you’re likelier to get an answer much sooner 