I have been trying to load data from a Snowflake internal stage into a table that already exists in the Snowflake data warehouse.
The context of why I’m trying to do this
To load the data from the newly updated stage file into a table.
What I’ve already tried
I tried to create a macro that retrieves the latest file from the files available in the stage:
{% macro get_latest_file() %}
    {% set query %}
        WITH stage_files AS (
            SELECT
                METADATA$FILENAME AS file_name,
                METADATA$FILE_LAST_MODIFIED AS last_modified,
                ROW_NUMBER() OVER (ORDER BY METADATA$FILE_LAST_MODIFIED DESC) AS row_num
            FROM @anbtx.anbtx_stage.stage_files
        )
        SELECT
            file_name
        FROM stage_files
        WHERE row_num = 1
    {% endset %}
    {% set result = run_query(query) %}
    {# run_query returns a table object even when it has zero rows, so check the row count too #}
    {% if result and result.rows | length > 0 %}
        {% set latest_file = result.columns[0].values()[0] %}
        {% do return(latest_file) %}
    {% else %}
        {% do return(None) %}
    {% endif %}
{% endmacro %}
and then called the macro in a SQL model file in order to copy the data from that file:
{% set latest_file = get_latest_file() %}
{% if latest_file %}
    {% set copy_into_command %}
        COPY INTO load_activeteller_accounttypes
        FROM @anbtx.anbtx_stage.stage_files/{{ latest_file }}
        FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE';
    {% endset %}
    {{ run_query(copy_into_command) }}
{% else %}
    {% do log("No files found in the stage", info=True) %}
{% endif %}
Some example code or error messages
11:56:19 Snowflake adapter: Snowflake query id: 01b6254c-0001-28f2-0004-902600017752
11:56:19 Snowflake adapter: Snowflake error: 100038 (22018): Numeric value 'ID' is not recognized
File 'ACTIVE_TELLER_ACCOUNTTYPES_08_02_2024.csv', line 1, character 2
Row 1, column "LOAD_ACTIVETELLER_ACCOUNTTYPES"["ID":1]
If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
11:56:19 Timing info for model.dbt_bitbucket.load_latest_file (compile): 11:56:17.326945 => 11:56:19.326565
11:56:19 On model.dbt_bitbucket.load_latest_file: Close
11:56:19 Database Error in model load_latest_file (models/load_latest_file.sql)
100038 (22018): Numeric value 'ID' is not recognized
File 'ACTIVE_TELLER_ACCOUNTTYPES_08_02_2024.csv', line 1, character 2
Row 1, column "LOAD_ACTIVETELLER_ACCOUNTTYPES"["ID":1]
If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
11:56:19 1 of 1 ERROR creating sql view model edw__dev.LOAD.load_latest_file ............ [ERROR in 2.41s]
11:56:19 Finished running node model.dbt_bitbucket.load_latest_file
Looking at the error message, two things pop out:
11:56:19 Snowflake adapter: Snowflake error: 100038 (22018): Numeric value 'ID' is not recognized
and
File 'ACTIVE_TELLER_ACCOUNTTYPES_08_02_2024.csv', line 1, character 2
Row 1, column "LOAD_ACTIVETELLER_ACCOUNTTYPES"["ID":1]
It looks like the dbt macro is finding the right file (assuming you wanted the 08_02_2024 file!), and the macro is attempting the COPY INTO command.
But you have a column of type NUMBER, and the file contains text. (CSV data is always 100% text; you need to cast it in order to load it into a NUMBER column.)
Instead of a macro, I suggest a custom materialization. Then, instead of "calling a macro", you create a staging model that sits right on top of the staged data to cast the types. Your existing model with the pre-hook then becomes model 2 (if it is still needed).
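Here is a rough sketch of what that staging model could look like (the column names are invented, since I don't know your file's layout, and my_csv_format is a hypothetical named file format):

-- models/stg_accounttypes.sql (sketch; column names and format name are assumptions)
SELECT
    $1::NUMBER        AS id,            -- cast the raw text to NUMBER here
    $2::VARCHAR       AS account_type,
    METADATA$FILENAME AS source_file    -- handy for auditing which file each row came from
FROM @anbtx.anbtx_stage.stage_files
    (FILE_FORMAT => 'anbtx.anbtx_stage.my_csv_format')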
Hi @jeffsk,
I have tried the approach you linked to. But my client is very specific about using only the COPY INTO command, the same way we use it in Snowflake.
I think there’s some confusion. It is impossible to have dbt execute something that you couldn’t execute in the Snowsight UI; all dbt does is send valid SQL to the database to execute. So my code IS using 100% vanilla, standard Snowflake functionality.
Using the materialization I gave (a different model, but the same materialization), here is what it looks like in the Snowflake query history after it executes successfully:
[screenshot of the Snowflake query history; hard to get the whole thing in one screenshot, sorry]
So this should meet your requirement of using the copy into command the same way you use it in Snowflake.
Troubleshooting steps
I would suggest the following troubleshooting steps:
1. Create a file format object instead of defining it on the fly with the TYPE syntax (see CREATE FILE FORMAT | Snowflake Documentation, and the sketch after this list).
2. Get the COPY command working in the Snowsight UI without using dbt.
3. Only after it works in the Snowsight UI, try to get the same code working in dbt.
4. For any errors you get, use the Snowflake query history to see how the dbt-generated SQL differs from what worked in step 2.
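For example, steps 1 and 2 might look like this in a Snowsight worksheet (csv_format is a made-up name; the table, stage, and file names come from your posts):

-- Step 1: a named file format instead of inline options
CREATE OR REPLACE FILE FORMAT anbtx.anbtx_stage.csv_format
    TYPE = 'CSV'
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    SKIP_HEADER = 1;

-- Step 2: run the COPY by hand, before involving dbt at all
COPY INTO load_activeteller_accounttypes
FROM @anbtx.anbtx_stage.stage_files/ACTIVE_TELLER_ACCOUNTTYPES_08_02_2024.csv
FILE_FORMAT = (FORMAT_NAME = 'anbtx.anbtx_stage.csv_format')
ON_ERROR = 'ABORT_STATEMENT';  -- the default; fail loudly while debugging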
In summary: you are not having a dbt problem. The problem is that the SQL being sent to the database is not working, because it is trying to copy the text "ID" into a numeric field.
Since you set SKIP_HEADER = 1 and it is still trying to copy the text "ID" (which I assume is actually the header) into the numeric field, it is possible the CSV is not in the shape you expect. This is why it is important to get the COPY command working in Snowsight first, to eliminate the extra debugging of dbt, Jinja, etc.
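A quick way to check the file's actual shape is to query the stage directly. This sketch assumes a named file format like the one above but with SKIP_HEADER = 0 (again, a hypothetical name), so the header row stays visible:

-- Peek at the first few raw rows Snowflake sees, header included
SELECT $1, $2, $3
FROM @anbtx.anbtx_stage.stage_files/ACTIVE_TELLER_ACCOUNTTYPES_08_02_2024.csv
    (FILE_FORMAT => 'anbtx.anbtx_stage.csv_format_noskip')
LIMIT 5;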
Hi @jeffsk,
I have implemented this the way the Snowflake documentation itself shows.
The macro I created contains the COPY INTO command exactly as it is executed in the Snowflake UI. The problem I'm encountering is with copying files from the internal stage: on the initial run it processes the files available in the internal stage, but after that, whenever a new file is loaded into the stage, it copies all of the already-processed files again along with the new file.
Secondly, when I added PURGE = TRUE in the macro, it copies the files and then removes/deletes them from the stage.
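For reference, the options in question look roughly like this (a sketch, not my exact macro; the file format name is a placeholder):

-- Sketch of the COPY options being described (format name is a placeholder)
COPY INTO load_activeteller_accounttypes
FROM @anbtx.anbtx_stage.stage_files
FILE_FORMAT = (FORMAT_NAME = 'anbtx.anbtx_stage.csv_format')
FORCE = FALSE  -- default: Snowflake's load metadata skips files loaded in the last 64 days
PURGE = TRUE;  -- documented behavior: files are deleted from the stage after a successful load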
Please help me out in solving this.