Handling BigQuery Incremental __dbt_tmp tables

I recently started using dbt and am loving it so far; however, I noticed something a bit odd with the incremental models.

I am using incremental models to build the “intermediate” layer of my data. Since the raw data is JSON based, I want to extract the keys I need and populate a relational table.
To avoid scanning the entire dataset on every run, we decided to use incremental models with the merge strategy and on_schema_change='append_new_columns'.
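
For context, here is a stripped-down sketch of what one of these intermediate models looks like; the source, column names and unique key below are placeholders rather than our actual code:

{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='event_id',
        on_schema_change='append_new_columns'
    )
}}

select
    json_value(payload, '$.event_id') as event_id,
    json_value(payload, '$.user_id') as user_id,
    timestamp(json_value(payload, '$.created_at')) as created_at
from {{ source('raw', 'events') }}

{% if is_incremental() %}
-- only pick up rows that arrived since the last successful run
where created_at > (select max(created_at) from {{ this }})
{% endif %}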

Once we started running the models, we noticed that dbt creates “temporary” tables for the data that needs to be merged, with a lifetime of 12 hours. Since we run our transformations roughly every 30 minutes, these tables are always present in our BigQuery project.

We do want to keep using the incremental strategy and appending new columns, but according to this GitHub issue, it seems we might be able to resolve it by changing on_schema_change to a different value.
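
(For reference, on_schema_change accepts 'ignore' (the default), 'fail', 'append_new_columns' and 'sync_all_columns'; switching back to the default would look roughly like this in the model config, with the unique key again being a placeholder:)

{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='event_id',
        on_schema_change='ignore'
    )
}}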

Since the data is mostly filtered down and not too large, we don’t mind the storage cost, but we do mind having twice as many tables in our project: for every incremental model we made, we now have both the output table X and the temporary table X__dbt_tmp.

Is there a way to get dbt to clean up those temporary tables on its own? It seems like something it should do automatically, but it doesn’t.

Thanks!

For now, I managed to solve this by adding a post-hook macro call to all the models I want to include:

The macro:

{% macro incremental_tmp_table_dropper(bigQueryRelationObject) %}
    {# Build the fully qualified name of the temporary table dbt creates for the merge,
       backtick-quoted so that hyphenated project IDs don't break the statement #}
    {% set tmpTableName %}
        `{{ bigQueryRelationObject.database }}.{{ bigQueryRelationObject.schema }}.{{ bigQueryRelationObject.identifier }}__dbt_tmp`
    {% endset %}

    {# Return the DROP statement as text so the post-hook can execute it #}
    {% set query %}
        drop table if exists {{ tmpTableName }};
    {% endset %}

    {{ return(query) }}
{% endmacro %}

Adding it in the dbt_project.yml config file:

        +post-hook:
          - "{{incremental_tmp_table_dropper(this)}}"

My current question:

Is there any risk in using this? Can it somehow affect the incremental models? I tested and searched the documentation as much as I could, and I can’t see any reason not to use this implementation.
