Handling BigQuery Incremental __dbt_tmp tables

I recently started using dbt and am loving it so far; however, I noticed something a bit odd with the incremental models.

I am using incremental models to build the “intermediate” layer of my data. Since the raw data is JSON based, I want to extract the keys I need and populate a relational table.
To avoid scanning the entire dataset on every run, we decided to use incremental models with the merge strategy and on_schema_change='append_new_columns'.
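
For context, here is a stripped-down sketch of what one of these intermediate models looks like; the source, column names and unique key below are placeholders rather than our actual code:

{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='event_id',
        on_schema_change='append_new_columns'
    )
}}

select
    json_value(payload, '$.event_id') as event_id,
    json_value(payload, '$.user_id') as user_id,
    timestamp(json_value(payload, '$.created_at')) as created_at
from {{ source('raw', 'events') }}

{% if is_incremental() %}
-- only pick up rows that arrived since the last successful run
where created_at > (select max(created_at) from {{ this }})
{% endif %}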

Once we started running the models, we noticed that dbt creates “temporary” tables for the data that needs to be merged, with a lifetime of 12 hours. Since we run our transformations roughly every 30 minutes, these tables are always present in our BigQuery project.

We do want to keep using the incremental strategy and appending new columns, but according to this GitHub issue, it seems we might be able to resolve it by changing on_schema_change to a different value.
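
(For reference, on_schema_change accepts 'ignore' (the default), 'fail', 'append_new_columns' and 'sync_all_columns'; switching back to the default would look roughly like this in the model config, with the unique key again being a placeholder:)

{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='event_id',
        on_schema_change='ignore'
    )
}}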

Since the data is mostly filtered down and not too large, we don’t mind the storage cost, but we do mind having twice as many tables in our project: for every incremental model we made, we now have both the output table X and the temporary table X__dbt_tmp.

Is there a way to get dbt to clean up those temporary tables on its own? It seems like something it should do automatically, but it doesn’t.

Thanks!

For now, I managed to solve this by adding a post-hook macro call to all the models I want to include:

The macro:

{% macro incremental_tmp_table_dropper(bigQueryRelationObject) %}
    {# Build the fully qualified name of the temporary table dbt creates for the merge,
       backtick-quoted so that hyphenated project IDs don't break the statement #}
    {% set tmpTableName %}
        `{{ bigQueryRelationObject.database }}.{{ bigQueryRelationObject.schema }}.{{ bigQueryRelationObject.identifier }}__dbt_tmp`
    {% endset %}

    {# Return the DROP statement as text so the post-hook can execute it #}
    {% set query %}
        drop table if exists {{ tmpTableName }};
    {% endset %}

    {{ return(query) }}
{% endmacro %}

Adding it in the dbt_project.yml config file:

        +post-hook:
          - "{{incremental_tmp_table_dropper(this)}}"

My current question:

Is there any risk in using this? Can it somehow affect the incremental models? I tested and searched the documentation as much as I could, and I can’t see any reason not to use this implementation.
