Creating Custom Materialization

Developing a Custom Materialization Strategy

I’ve observed that, as of now, there are no existing materialization strategies that employ the TRUNCATE INSERT method with tables. Due to this gap, I am motivated to develop my own materialization strategy. Unfortunately, I’ve found that resources to guide me in this endeavour are somewhat scarce.

Code

{% materialization truncate_insert, adapter="default" %}

{% set target_table = this.schema ~ '.' ~ this.name %}

{{ run_hooks(pre_hooks, inside_transaction=False) }}
-- `BEGIN` happens here:
{{ run_hooks(pre_hooks, inside_transaction=True) }}

-- Truncate the target table
{% set truncate_sql %}
    TRUNCATE TABLE {{ target_table }}
{% endset %}

-- Insert the data into the target table
{% set insert_sql %}
    INSERT INTO {{ target_table }}
    {{ sql }}
{% endset %}

{% call statement('main') %}
  {{ truncate_sql }};
  {{ insert_sql }}
{% endcall %}

{{ run_hooks(post_hooks, inside_transaction=False) }}

-- `COMMIT` happens here
{{ adapter.commit() }}

{{return({'relations':[this]})}}

-- `BEGIN` happens here:
{{ run_hooks(post_hooks, inside_transaction=True) }}

{% endmaterialization %}

Challenges and Seeking Guidance

Although the code is operational and meets the intended objectives, it lacks certain elements, such as the tracking of the number of rows manipulated or performing adapter-specific actions. I’m eager to elevate my custom materialization to align with the “dbt” standard. Would anyone be able to recommend additional resources or guides to aid in bridging this gap?

Sample Output

15:35:14  Running with dbt=1.7.10
15:35:14  Registered adapter: postgres=1.7.10
15:35:14  Found 7 models, 2 tests, 1 source, 0 exposures, 0 metrics, 516 macros, 0 groups, 0 semantic models
15:35:14  
15:35:15  Concurrency: 1 threads (target='dev')
15:35:15  
15:35:15  1 of 7 START sql view model public.average_ride_per_month_raw .................. [RUN]
15:35:15  1 of 7 OK created sql view model public.average_ride_per_month_raw ............. [CREATE VIEW in 0.05s]
15:35:15  2 of 7 START sql view model public.filter_month_t .............................. [RUN]
15:35:15  2 of 7 OK created sql view model public.filter_month_t ......................... [CREATE VIEW in 0.03s]
15:35:15  3 of 7 START sql view model public.unfilter_months_t ........................... [RUN]
15:35:15  3 of 7 OK created sql view model public.unfilter_months_t ...................... [CREATE VIEW in 0.03s]
15:35:15  4 of 7 START sql truncate_insert model public.airport .......................... [RUN]
15:35:15  4 of 7 OK created sql truncate_insert model public.airport ..................... [INSERT 0 0 in 0.02s]
15:35:15  5 of 7 START sql view model public.average_ride_per_month ...................... [RUN]
15:35:15  5 of 7 OK created sql view model public.average_ride_per_month ................. [CREATE VIEW in 0.02s]
15:35:15  6 of 7 START sql truncate_insert model public.airport_morning .................. [RUN]
15:35:15  6 of 7 OK created sql truncate_insert model public.airport_morning ............. [INSERT 0 0 in 0.02s]
15:35:15  7 of 7 START sql truncate_insert model public.airport_unfiltered ............... [RUN]
15:35:15  7 of 7 OK created sql truncate_insert model public.airport_unfiltered .......... [INSERT 0 0 in 0.02s]
15:35:15  
15:35:15  Finished running 4 view models, 3 truncate_insert models in 0 hours 0 minutes and 0.27 seconds (0.27s).
15:35:15  
15:35:15  Completed successfully
15:35:15  
15:35:15  Done. PASS=7 WARN=0 ERROR=0 SKIP=0 TOTAL=7

References

Why not just use a pre-hook to truncate the table?

Like this:

{{ config(
    pre_hook="TRUNCATE TABLE " ~ this,
) }}

And if you are using Snowflake, something that is even easier is to use INSERT OVERWRITE (which truncates & inserts in one single command).

Would it overwrite materialisation strategy such as CREATE TABLE?

No, but you could use the incremental materialization to avoid re-creating the table.

It would “incrementally” insert into an empty table.