I’m using dbt-databricks and want to ingest some source tables from parquet files in our data lake. The tables are uploaded every day into a new location on the data lake.
Example of a parquet folder:
raw/<sourcename>/<tablename>/<yyyy>/<mm>/<dd>/tablename_yyyy_mm_dd.parquet
Our yaml source definition would look something like this:
- name: sourcename
  schema: parquet
  tables:
    - name: tablename
      identifier: /mnt/raw/sourcename/tablename/????/??/??/*.parquet
      quoting:
        identifier: true
The dbt source should always point to the latest folder of a given source table. Depending on the time of day, that folder may not exist yet, in which case the source should fall back to the previous day’s file.
So basically I want to reference the source but somehow modify the identifier so that it points to the correct parquet folder. Is this possible? I’m thinking of writing a macro, but I’m unsure of the best approach.
I’m fairly sure I can write a macro that resolves the correct folder path, but I don’t know how to feed that path back into the source definition in the source.yml.
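For the path-resolution part, my rough idea (untested; the macro name `latest_parquet_path` and the `/mnt/raw` prefix are just placeholders for illustration) is to build the glob from dbt’s `run_started_at` context variable, something like:

```sql
{# Hypothetical sketch: render the parquet glob for the run date's folder.
   run_started_at and modules.datetime are standard dbt Jinja context members. #}
{% macro latest_parquet_path(source_name, table_name) %}
    {%- set today = run_started_at -%}
    {%- set yesterday = today - modules.datetime.timedelta(days=1) -%}
    {#- Open question: how to check whether today's folder exists yet,
        so the macro knows when to use `yesterday` instead. -#}
    {{- '/mnt/raw/' ~ source_name ~ '/' ~ table_name ~ '/'
        ~ today.strftime('%Y/%m/%d') ~ '/*.parquet' -}}
{% endmacro %}
```

The existence check (deciding between today and yesterday) is the part I haven’t figured out, and even with that solved I still don’t see how to get the rendered path into the source’s identifier.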
I was thinking of wrapping the normal source macro in a “source_parquet” before the normal source macro runs:
{% macro source_parquet(source_name, table_name) %}
    <Modify the identifier>
    {{ source(source_name, table_name) }}
{% endmacro %}
I would then reference my new macro for parquet sources:
with source_table as (
    select *
    from {{ source_parquet('sourcename', 'tablename') }}
)
Is that feasible or is there a better approach?
Thanks!