dbt model for currently blank source

ponika · July 19, 2023, 5:19pm

Hi all,

I’m working on a project and need some ideas/suggestions. First, I’m not a data engineer, but my background is primarily more of a business intelligence analyst.

The model I’m working on has a series of CSVs from a vendor with one file for each table. The data engineers made tables for each file type in Snowflake and dump the CSVs into a variant data type column. The vendor files only get generated if there is a change in that source table on their side, so some days there are no files at all (say Sundays), and some potential file types have no data. Some of the tables in Snowflake are blank as we have never (apparently) used that portion of the vendor’s program.

I found this: Writing packages when a source table may or may not exist , but my problem is that the table DOES exist, but with the wrong columns (because they are loaded as variant data type) but no data.

I’d like to build a dbt model on the data that is supposed to be there based on the vendor documentation in case there ever IS data that gets populated. However, current state I get an invalid identifier error because, correctly, the columns don’t exist. I’m looking for ideas/best practices on how to handle this. Do I leave the models and know they are going to error each run? Remove the models. but then how do I know if those tables DO get populated? Do something fancy with a macro looking to see if the table is populated? Something else I’m not thinking of? Thanks in advance for your suggestions.

ponika · July 21, 2023, 6:07pm

So I started going down the path of using Jinja to check if the row count is 0, if so then manually define the columns with nulls, if the row count is not 0 then run the variant flattening macro. Problem I’m having is evaluating that row count in an if statement - it’s always evaluating as false when that’s not the case. I’m doing something wrong in my syntax, but my jinja experience is about 0.

Here is what I have for a start:

{% set row_count_query %}
select
count(1)
from {{source(references here )}}
{% endset %}

{% set results = run_query(row_count_query) %}

{% if execute %}

{% if results == 0 %}
run query
{% elseif %}
run other query

{% endif %}

It compiles ok, but ALWAYS inserts the elseif query when it should be doing the first. Suggestions? Thanks.

ponika · July 21, 2023, 6:50pm

Figured out my own question - I have to cast the results to an integer. So my statement reads:

{% if results|int == 0 %}

Then it operates as I would want

Topic		Replies	Views
dbt jinja for column missing Help	1	203	July 15, 2024
dbt jinja conditional logic Help	0	615	July 4, 2024
Testing source data with user defined tests/macros Help testing	2	3894	February 7, 2025
Custom dbt test is failing Help testing , macros , dbt-core	1	743	June 28, 2024
dbt model cannot be refered in the same model Help	4	3627	April 26, 2023

dbt model for currently blank source

Related topics