The problem I’m having
I created a model that unloads data from Snowflake to S3 and saves the unload times in an incremental model. It works just fine; however, I would like to take it to the next level and make it production-ready. I would be happy to do this using solely the Snowflake <> dbt ecosystem, but I'm not sure how, so I'm seeking some advice.
The context of why I’m trying to do this
As mentioned above, the objective is to store incremental records of successful unloads only.
I already have the model up and running; however, I would like it to behave as an "atomic unit": if any part of the process fails, the entire procedure should halt without leaving any trace behind.
Here is what I have so far:
-- Incremental model that records the high-water timestamp of each successful unload.
-- The pre-hook performs the actual unload from Snowflake to S3 before the model builds.
{{ config(
    materialized='incremental',
    pre_hook="copy into @SOME_STAGE/path/ from (select * from {{ ref('my_source_model') }})"
) }}

select max(ts) as unload_ts
from {{ ref('my_source_model') }}
where 1 = 1
{% if is_incremental() %}
  -- only look at rows newer than the latest timestamp already recorded
  and ts > (select max(ts) from {{ this }})
{% endif %}
(ts is the timestamp column I'm basing the incremental logic on, and it's the only column I want to store in the new model)
As you can see, the model unloads data to an S3 bucket using a pre-hook that runs a Snowflake COPY INTO command.
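For reference, once dbt renders the hook, the statement Snowflake actually runs looks roughly like the following (the stage name and path are the placeholders from above; the fully-qualified table name and the file format are assumptions about my setup):

copy into @SOME_STAGE/path/
from (select * from analytics.my_schema.my_source_model)  -- rendered {{ ref('my_source_model') }}
file_format = (type = parquet);  -- assumption: parquet unload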
The basic scenarios I would like to resolve:
- Suppose a Snowflake copy into s3://bucket/path query is initiated using a pre-hook, but it fails. How can I prevent the incremental model build from continuing?
- Suppose the pre-hook runs a Snowflake copy into s3://bucket/path query successfully, but the actual incremental model build fails. How can I prevent the file from being stored in S3? (One idea I've been toying with is sketched right after this list.)
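The idea for the second scenario is to move the COPY INTO from a pre-hook to a post-hook, so the unload only fires after the incremental insert has succeeded. I'm not at all sure this is the right pattern (and it doesn't help with the first scenario), but for completeness this is the sketch:

{{ config(
    materialized='incremental',
    -- same COPY INTO statement as before, but attached as a post-hook so it only
    -- runs once the incremental build itself has completed successfully
    post_hook="copy into @SOME_STAGE/path/ from (select * from {{ ref('my_source_model') }})"
) }}

My open question with this variant is whether a post-hook failure would leave the freshly inserted rows in the incremental model, which is just the mirror image of scenario 2.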
What I’ve already tried
I found this section, but I'm not really sure how to apply it to my needs; for example, in case 2 above I would need to actually clean up the file(s) that the pre-hook wrote to S3.
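For the cleanup itself, the only Snowflake-side mechanism I know of is the REMOVE command, so my naive sketch for case 2 looks something like the following (the stage, path, and pattern are placeholders, and I'd still need a reliable way to target only the files written by the failed run):

-- drop the files the failed run left behind on the external stage
remove @SOME_STAGE/path/ pattern = '.*';

I could presumably fire this from a run-operation macro or an on-run-end hook, but I don't see a clean way to tie it to the failure of this specific model.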
To sum up
I would genuinely appreciate ideas for achieving the objective mentioned above and solving the scenarios I presented. Thank you in advance for your time and expertise.