Unable to load Snowflake internal stage files into a table using dbt Cloud

The problem I’m having

I am trying to load data from a Snowflake internal stage into a table that already exists in the Snowflake data warehouse.

The context of why I’m trying to do this

To load the data from the most recently updated stage file into a table.

What I’ve already tried

I created a macro to retrieve the latest file from the files available in the stage:

{% macro get_latest_file() %}
    {# Query the stage metadata to find the most recently modified file #}
    {% set query %}
        WITH stage_files AS (
            SELECT
                METADATA$FILENAME AS file_name,
                METADATA$FILE_LAST_MODIFIED AS last_modified,
                ROW_NUMBER() OVER (ORDER BY METADATA$FILE_LAST_MODIFIED DESC) AS row_num
            FROM @anbtx.anbtx_stage.stage_files
        )
        SELECT
            file_name
        FROM stage_files
        WHERE row_num = 1
    {% endset %}

    {% set result = run_query(query) %}
    {# Guard against an empty result set before indexing into it #}
    {% if result and result.rows | length > 0 %}
        {% set latest_file = result.columns[0].values()[0] %}
        {% do return(latest_file) %}
    {% else %}
        {% do return(None) %}
    {% endif %}
{% endmacro %}
Then I call the macro in the model SQL file to copy the data from that file:

{% set latest_file = get_latest_file() %}

{% if latest_file %}
    {% set copy_into_command %}
        COPY INTO load_activeteller_accounttypes
        FROM @anbtx.anbtx_stage.stage_files/{{ latest_file }}
        FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE';
    {% endset %}

    {{ run_query(copy_into_command) }}
{% else %}
    {% do log("No files found in the stage", info=True) %}
{% endif %}

Some example code or error messages

11:56:19 Snowflake adapter: Snowflake query id: 01b6254c-0001-28f2-0004-902600017752

11:56:19 Snowflake adapter: Snowflake error: 100038 (22018): Numeric value 'ID' is not recognized

  File 'ACTIVE_TELLER_ACCOUNTTYPES_08_02_2024.csv', line 1, character 2

  Row 1, column "LOAD_ACTIVETELLER_ACCOUNTTYPES"["ID":1]

  If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.

11:56:19 Timing info for model.dbt_bitbucket.load_latest_file (compile): 11:56:17.326945 => 11:56:19.326565

11:56:19 On model.dbt_bitbucket.load_latest_file: Close

11:56:19 Database Error in model load_latest_file (models/load_latest_file.sql)

  100038 (22018): Numeric value 'ID' is not recognized

    File 'ACTIVE_TELLER_ACCOUNTTYPES_08_02_2024.csv', line 1, character 2

    Row 1, column "LOAD_ACTIVETELLER_ACCOUNTTYPES"["ID":1]

    If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.

11:56:19 1 of 1 ERROR creating sql view model edw__dev.LOAD.load_latest_file ............ [ERROR in 2.41s]

11:56:19 Finished running node model.dbt_bitbucket.load_latest_file

Hi @Mounika982 ,

Looking at the error message, I see two things that pop out:
11:56:19 Snowflake adapter: Snowflake error: 100038 (22018): Numeric value 'ID' is not recognized
and
File 'ACTIVE_TELLER_ACCOUNTTYPES_08_02_2024.csv', line 1, character 2 Row 1, column "LOAD_ACTIVETELLER_ACCOUNTTYPES"["ID":1]

It looks like the dbt macro is finding the right file (assuming you wanted the 08_02_2024 file!), and the macro is attempting the COPY INTO command.
But you have a column of type number, and the file contains text. (CSV data is always 100% text; you need to cast it in order to load it into a number column.)

Instead of a macro, I suggest a custom materialization. Then instead of "calling a macro", you create a staging model that sits right on top of the stage data to cast the types. Then your existing model with the pre-hook becomes model 2 (if it is still needed).
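For illustration, a minimal sketch of such a staging model could look like this (the stage path comes from your earlier post; the column positions, casts, and file format name are assumptions you would adjust to your actual file):

-- models/staging/stg_active_teller_accounttypes.sql
-- Minimal sketch: read straight from the internal stage and cast the raw
-- text columns to the intended types. The column list is an assumption.
select
    $1::number          as id,
    $2::varchar         as account_type,
    metadata$filename   as source_file_name
from @anbtx.anbtx_stage.stage_files
(file_format => 'my_csv_format')  -- assumes a named CSV file format exists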

Example materialization: dbt-demonstration/macros/copy_into.sql at cd337d4d526fa4cbfaffdcf3be69631624f6f0cd · jeff-skoldberg-gmds/dbt-demonstration · GitHub

Example usage: dbt-demonstration/macros/copy_into.sql at cd337d4d526fa4cbfaffdcf3be69631624f6f0cd · jeff-skoldberg-gmds/dbt-demonstration · GitHub

Hope that helps.

Hi @jeffsk,
I have tried the approach from the link you provided, but my client is very specific about using only the COPY INTO command, the same way we use it in Snowflake.

{% macro copy_files(stage_name, target_schema, table_name, pattern) %}
    {% call statement('main', auto_begin=false) %}
        {% set adapter = 'snowflake' %}
        COPY INTO {{ target_schema }}.{{ table_name }}
        FROM @{{ stage_name }}
        PATTERN = '{{ pattern }}'
        FILE_FORMAT = (TYPE = 'CSV', SKIP_HEADER = 1, ERROR_ON_COLUMN_COUNT_MISMATCH = false)
        FORCE = false;
    {% endcall %}
{% endmacro %}

I'm calling this macro in the SQL model:
{{
    config(
        materialized = 'table',
        target_table = var('table_name_accounttypes'),
        pre_hook = [
            "{{ copy_files(
                env_var('dbt_ENV_SRC_DB') ~ '.' ~ env_var('dbt_ENV_STAGE_SCH') ~ '.' ~ var('stage_name_active_teller'),
                env_var('dbt_ENV_TGT_DB') ~ '.' ~ env_var('dbt_ENV_LOAD'),
                var('table_name_accounttypes'),
                '.*ACTIVE_TELLER_ACCOUNTTYPES_.*.csv'
            ) }}"
        ]
    )
}}

select * from {{ this }}

I think there's some confusion. It is impossible to have dbt execute something that you couldn't execute in the Snowsight UI. All dbt does is send valid SQL to the database to execute. So, my code IS using 100% vanilla, standard Snowflake functionality.

Copy into from select is covered here:
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table#syntax
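For reference, a rough sketch of that pattern with placeholder names (the column positions and casts are assumptions, not your actual layout):

COPY INTO load_activeteller_accounttypes
FROM (
    SELECT
        $1::number,    -- cast the raw text to the target column types
        $2::varchar
    FROM @anbtx.anbtx_stage.stage_files
)
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
ON_ERROR = 'CONTINUE';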

Using the materialization I gave (different model, but the same materialization), here is what it looks like in the Snowflake query history after it executes successfully:
[screenshot of the Snowflake query history]
(hard to get the whole thing in 1 screenshot, sorry)

So this should meet your requirement of using the copy into command the same way you use it in Snowflake.

Troubleshooting steps

I would suggest the following troubleshooting steps:

  1. Create a file format object instead of defining it on the fly using the TYPE syntax.
     CREATE FILE FORMAT | Snowflake Documentation

  2. Get the COPY command working in the Snowsight UI without using dbt (see the sketch after this list).

  3. Only after it works in the Snowsight UI, try to get the same code working in dbt.

  4. For any errors you get, use the Snowflake query history to see how the dbt-generated SQL differs from what worked in step 2.
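For steps 1 and 2, a minimal sketch of what you could run directly in Snowsight (the file format name is hypothetical; the stage, table, and pattern come from the earlier posts and should be adjusted as needed):

-- Step 1: a named file format instead of defining it inline (name is hypothetical)
CREATE OR REPLACE FILE FORMAT anbtx.anbtx_stage.csv_with_header
    TYPE = 'CSV'
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    SKIP_HEADER = 1;

-- Step 2: run the COPY in Snowsight, with dbt out of the picture
COPY INTO load_activeteller_accounttypes
FROM @anbtx.anbtx_stage.stage_files
PATTERN = '.*ACTIVE_TELLER_ACCOUNTTYPES_.*\.csv'
FILE_FORMAT = (FORMAT_NAME = 'anbtx.anbtx_stage.csv_with_header')
ON_ERROR = 'ABORT_STATEMENT';  -- fail loudly while debugging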

In summary: you are not having a dbt problem. The problem is that the SQL being sent to the database is not working, because it is trying to copy the text 'ID' into a numeric field.

Since you set SKIP_HEADER = 1 and it is still trying to copy the text 'ID' into the numeric field (which I assume is actually the header), it is possible the CSV is not in the shape you expect. This is why it is important to get the COPY command working in Snowsight first, to eliminate the extra debugging of dbt Jinja, etc.

Hi @jeffsk,
I have implemented it the way it is shown in the Snowflake documentation.
The macro contains the COPY INTO command exactly as it is executed in the Snowflake UI, and it copies the files from the internal stage. The problem is this: on the initial run it processes the files available in the internal stage, but after that, whenever a new file is loaded into the stage, it copies all of the already-processed files again along with the new file.

Secondly, when I add PURGE = TRUE in the macro, it copies the files but then removes/deletes them from the stage.
Please help me out in solving this.