Full refresh incremental model on subset

Hi everyone !

I was wondering if there’s a way to perform a full refresh on a subset of an incremental model. For example based on a date column.

Delete from my_table where date_column > "yyyy-mm-dd" would be the logical choice here, but it is a restricted environnement, and I can only run dbt commands from the cluster’s CLI.

I also thought of using a pre_hook but that means I have to modify my model and open a PR which isn’t very convenient.

Do you guys have any idea how I could solve this issue or a workaround to suggest ?

Thank you in advance for your time !

It’s very normal to want to run these types of DML statements on the data warehouse from time to time for maintenance, schema evolution etc. Like 10 years ago this was a huge part of the day-to-day work of a data engineer.

So the logical solution is to get your data/analytics engineers the ability to run DML statements against the data warehouse (probably through JIT privilege escalation) and have a process to make sure they don’t screw anything up too much when they do :slightly_smiling_face:

Note: @Mike Stanley originally posted this reply in Slack. It might not have transferred perfectly.

But dbt doesn’t really have tools to help you with this unfortunately

Note: @Mike Stanley originally posted this reply in Slack. It might not have transferred perfectly.

I mangaged to find a workaround

  1. First create a macro :
{% macro delete_from_table(table_to_delete_from,date_column_name,delete_date_start) %}
{% set table_ref = ref(table_to_delete_from) %}
{% set sql %}


Delete from {{ table_ref }} where {{date_column_name}} > '{{delete_date_start}}'

{% endset %}

{% do run_query(sql) %}

{% endmacro %}
  1. Run the macro :
    dbt run-operation delete_from_table --args '{table_to_delete_from: my_table, date_column_name: updated_at, delete_date_start: 2024-05-29}'

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.