dbt model for currently blank source

Hi all,

I’m working on a project and need some ideas/suggestions. First, I’m not a data engineer, but my background is primarily more of a business intelligence analyst.

The model I’m working on has a series of CSVs from a vendor with one file for each table. The data engineers made tables for each file type in Snowflake and dump the CSVs into a variant data type column. The vendor files only get generated if there is a change in that source table on their side, so some days there are no files at all (say Sundays), and some potential file types have no data. Some of the tables in Snowflake are blank as we have never (apparently) used that portion of the vendor’s program.

I found this: Writing packages when a source table may or may not exist , but my problem is that the table DOES exist, but with the wrong columns (because they are loaded as variant data type) but no data.

I’d like to build a dbt model on the data that is supposed to be there based on the vendor documentation in case there ever IS data that gets populated. However, current state I get an invalid identifier error because, correctly, the columns don’t exist. I’m looking for ideas/best practices on how to handle this. Do I leave the models and know they are going to error each run? Remove the models. but then how do I know if those tables DO get populated? Do something fancy with a macro looking to see if the table is populated? Something else I’m not thinking of? Thanks in advance for your suggestions.

So I started going down the path of using Jinja to check if the row count is 0, if so then manually define the columns with nulls, if the row count is not 0 then run the variant flattening macro. Problem I’m having is evaluating that row count in an if statement - it’s always evaluating as false when that’s not the case. I’m doing something wrong in my syntax, but my jinja experience is about 0.

Here is what I have for a start:

{% set row_count_query %}
select
count(1)
from {{source(references here )}}
{% endset %}

{% set results = run_query(row_count_query) %}

{% if execute %}

{% if results == 0 %}
run query
{% elseif %}
run other query

{% endif %}

It compiles ok, but ALWAYS inserts the elseif query when it should be doing the first. Suggestions? Thanks.

Figured out my own question - I have to cast the results to an integer. So my statement reads:

{% if results|int == 0 %}

Then it operates as I would want