Hi all,
I’m working on a project and need some ideas/suggestions. First, a caveat: I’m not a data engineer; my background is primarily in business intelligence analysis.
The project I’m working on involves a series of CSVs from a vendor, one file per table. The data engineers created a table for each file type in Snowflake and load the CSVs into a single VARIANT column. The vendor only generates a file when the corresponding source table changes on their side, so some days there are no files at all (Sundays, say), and some file types never have any data. Some of the tables in Snowflake are blank because we have (apparently) never used that portion of the vendor’s product.
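For context, my staging models parse fields out of the VARIANT column along these lines (the source name, the `raw_data` column, and the field names are all illustrative; the real ones come from the vendor docs):

```sql
-- models/staging/stg_vendor__orders.sql (illustrative names)
select
    raw_data:order_id::number      as order_id,
    raw_data:status::string        as status,
    raw_data:updated_at::timestamp as updated_at
from {{ source('vendor', 'orders') }}
```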
I found this: Writing packages when a source table may or may not exist, but my problem is slightly different: the table DOES exist, just with the wrong columns (everything is loaded into a VARIANT column) and no data.
I’d like to build dbt models for the data that is supposed to be there, based on the vendor documentation, in case data ever DOES get populated. As things stand, though, I get an invalid identifier error because, correctly, the columns don’t exist. I’m looking for ideas/best practices on how to handle this:

- Leave the models in place and accept that they’ll error on every run?
- Remove the models? But then how would I know if those tables DO get populated?
- Do something fancy with a macro that checks whether the table is populated? (Rough sketch of what I mean below.)
- Something else I’m not thinking of?
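To make the macro idea concrete, here is a rough, untested sketch of what I was imagining. It uses dbt’s `adapter.get_columns_in_relation` to check at run time whether the expected VARIANT column exists, and falls back to an empty, correctly-typed shell when it doesn’t. The relation, column, and field names are the same illustrative ones as above:

```sql
-- models/staging/stg_vendor__orders.sql (untested sketch, illustrative names)
{% set rel = source('vendor', 'orders') %}

{# Adapter calls only return real results at execution time, so guard with `execute`. #}
{% set col_names = [] %}
{% if execute %}
    {% set col_names = adapter.get_columns_in_relation(rel)
                       | map(attribute='name') | map('upper') | list %}
{% endif %}

{% if 'RAW_DATA' in col_names %}

select
    raw_data:order_id::number      as order_id,
    raw_data:status::string        as status,
    raw_data:updated_at::timestamp as updated_at
from {{ rel }}

{% else %}

-- Empty shell with the documented schema so downstream models still compile;
-- returns zero rows until the vendor actually starts sending this file type.
select
    null::number    as order_id,
    null::string    as status,
    null::timestamp as updated_at
limit 0

{% endif %}
```

The idea is that every model compiles cleanly on every run, and the moment the loader creates the expected column, the real branch takes over on the next run. I could presumably also log a message from the placeholder branch, or add a row-count test set to warn, so I’d notice when a dead table comes alive. Does anyone see problems with this approach, or is there a more standard pattern?

Thanks in advance for your suggestions.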