What documentation summarizes folder and naming conventions, to help decipher their usage in the dbt fundamentals course?

graeme · April 3, 2023, 9:26am

After completing the dbt fundamentals course and browsing dbt reference documentation, I’m still unable to decipher the basic working practice/meaning of dbt folder structure and file naming.

The course works through a number of scenarios that both imply and conflate how files should be named and where they should be located, but it fails to clearly specify how they should be named and located, how they are sequenced during processing, etc. The file-naming practices in points 3 to 5 below are demonstrated in the dbt fundamentals course segments, but their relevance is never specified; consequently, their significance is fuzzy and/or conflated.

Some related queries are:

1. What is the order of file processing in relation to file names and folder placement?
2. How do file naming and folder placement relate to processing sequence and layering? Or does the code inside the files, irrespective of file name and folder location, determine layering, processing order, etc.?
3. What (if any) meaning do the ‘stg_’ file name prefix and/or the models staging folder have? If none, how is staging code recognized and sequenced as staging code?
4. What (if any) meaning do the ‘fct_’ file name prefix and/or the models marts folder have? If none, how is fact code recognized and sequenced as fact code?
5. What (if any) meaning do the ‘dim_’ file name prefix and/or the models marts folder have? If none, how is dimension code recognized and sequenced as dimension code?

brunoszdl · April 3, 2023, 11:41am

I am not sure if I understood your question, so let me know if this is not what you are asking.

When you execute a command such as dbt run, the models are run in DAG order, and dbt knows this order because of the {{ ref() }} and {{ source() }} jinja functions. The names of the files and folders do not affect the order of processing; resources are related to each other through ref and source. (Tests and exposures have some other particularities.)

Naming and placement conventions are used to make your project more understandable and consistent. They also let you easily select a group of models in a command using selector syntax.
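As an illustration of selecting models by folder, here is a minimal `selectors.yml` sketch. The selector name `staging_models` is hypothetical, and it assumes your staging models live under `models/staging`:

```yaml
# selectors.yml (in the project root, next to dbt_project.yml)
selectors:
  - name: staging_models            # hypothetical selector name
    description: "Every model stored under models/staging"
    definition:
      method: path                  # select by file location, not by model name
      value: models/staging
```

`dbt run --selector staging_models` would then build only those models; the inline equivalent is `dbt run --select path:models/staging`. Selection only narrows which models run; their order still comes from `ref()` and `source()`.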

As such, it’s especially important to establish a deep and broad set of patterns to ensure as many people as possible are empowered to leverage their particular expertise in a positive way, and to ensure that the project remains approachable and maintainable as your organization scales.
How we structure our dbt projects | dbt Developer Hub

Folders. Folder structure is extremely important in dbt. Not only do we need a consistent structure to find our way around the codebase, as with any software project, but our folder structure is also one of the key interfaces for understanding the knowledge graph encoded in our project (alongside the DAG and the data output into our warehouse). It should reflect how the data flows, step-by-step, from a wide variety of source-conformed models into fewer, richer business-conformed models. Moreover, we can use our folder structure as a means of selection in dbt selector syntax.
Staging: Preparing our atomic building blocks | dbt Developer Hub
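To make the folder-structure point concrete, here is a sketch of a `dbt_project.yml` excerpt that configures whole folders at once. The project and profile name `my_project` and the `staging`/`marts` folders are assumptions chosen to mirror the layered layout the guide describes:

```yaml
# dbt_project.yml (excerpt)
name: my_project          # hypothetical project name
version: "1.0.0"
config-version: 2
profile: my_project       # hypothetical profile name

models:
  my_project:             # must match `name` above
    staging:              # applies to every model under models/staging/
      +materialized: view
    marts:                # applies to every model under models/marts/
      +materialized: table
```

These folder keys configure how each folder's models are built (view vs. table); they do not change when the models run, which is still decided by the DAG.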

File names. Creating a consistent pattern of file naming is crucial in dbt. File names must be unique and correspond to the name of the model when selected and created in the warehouse. We recommend putting as much clear information into the file name as possible, including a prefix for the layer the model exists in, important grouping information, and specific information about the entity or transformation in the model.
Staging: Preparing our atomic building blocks | dbt Developer Hub
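As a sketch of how a file name becomes a model name, assume a hypothetical file `models/staging/jaffle_shop/stg_jaffle_shop__customers.sql`. dbt names the model after the file (without `.sql`), so a properties file describing it must use exactly that name. The properties file name and column below are also hypothetical:

```yaml
# models/staging/jaffle_shop/_jaffle_shop__models.yml (hypothetical file name)
version: 2

models:
  # Must match the file name stg_jaffle_shop__customers.sql (without .sql)
  - name: stg_jaffle_shop__customers
    description: "Customers from the jaffle_shop source, lightly cleaned."
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
```

Downstream models would reference it as `{{ ref('stg_jaffle_shop__customers') }}`, and unless an alias is configured it is created in the warehouse under that same name. The `stg_` prefix itself is only a convention; dbt does not interpret it.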

You can find more information in the links in the citations.

graeme · April 4, 2023, 12:10pm

Thanks.

Your explicit reply regarding the order of processing with regard to the {{ ref() }} and {{ source() }} jinja functions, and regarding the impact of folder structuring and file naming, is helpful.

I agree that logical folder structuring and naming conventions are important for good project management. It is also important to distinguish how data/files are processed from how files are logically organized; otherwise, an important fundamental understanding of dbt is missing.

For further clarification, and to confirm your reply and citation links:

A.
i. What about the folder structure at the level of models and tests?
Do those folder naming conventions impact processing?
ii. If I created and used a folder named ‘design’ instead of ‘models’, would processing still be the same?
Is that determined by the 'model-paths: ["models"]' setting in dbt_project.yml?
iii. If I included staging files in the tests folder, would they still process properly for staging as long as the appropriate {{ ref() }} and {{ source() }} jinja functions were included?
Is that related to the 'test-paths: ["tests"]' setting in dbt_project.yml?

B.
i. Would renaming a folder in models from ‘staging’ to ‘stg’, and prefixing the files inside with ‘stage_’ instead of ‘stg_’, work the same as long as the {{ ref() }} and {{ source() }} jinja functions are appropriately the same?

C.
How should I decipher the reference to ‘dbt-tutorial’ in “dbt-tutorial.jaffle_shop.customers” from Quickstart for dbt Cloud and BigQuery | dbt Developer Hub?

Where is ‘dbt-tutorial’ specified as a data warehouse and/or source in dbt or BigQuery?

brunoszdl · April 4, 2023, 12:42pm

A.

dbt looks for each type of resource in its respective folders, according to the paths you specify in dbt_project.yml. If you do not specify anything, it searches the default folders.

Your models must be inside the models folder unless you specify another folder in model-paths, as you said. If you want them to be inside a design folder, you can specify that folder in model-paths.

I have never tried adding the tests folder to model-paths and putting a model inside it. It might work, but I am not sure. For learning purposes you can try it; just keep in mind that this is not how a project is supposed to be structured.

model-paths | dbt Developer Hub.
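For question A.ii, a minimal `dbt_project.yml` excerpt, assuming you want dbt to look for models in a folder named `design` instead of the default `models`:

```yaml
# dbt_project.yml (excerpt)
# Folders dbt searches for models; the default is ["models"].
model-paths: ["design"]

# Folders dbt searches for singular data tests; the default is ["tests"].
test-paths: ["tests"]
```

Both settings accept a list, so `model-paths: ["models", "design"]` would make dbt search both folders.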

B.

No problem doing that. You can name the subfolders inside your models folder, and the models inside them, however you want. It will work the same.
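The one place where a folder name does matter is folder-level configuration in `dbt_project.yml` (and path-based selection): those keys must match the folder names on disk. A sketch, assuming a hypothetical project named `my_project` whose staging folder has been renamed to `stg`:

```yaml
# dbt_project.yml (excerpt)
name: my_project        # hypothetical project name

models:
  my_project:
    stg:                # must match models/stg/ (was "staging:" before the rename)
      +materialized: view
```

Downstream models then reference the renamed files by their new names, e.g. `{{ ref('stage_customers') }}` for a hypothetical `stage_customers.sql`; the run order is unchanged because it still follows `ref()`.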

C.

dbt-tutorial is a public project in BigQuery. You can define it as a source in dbt like this:

sources.yml (inside the models folder):

```yaml
version: 2

sources:
  - name: jaffle_shop
    database: dbt-tutorial
    schema: jaffle_shop
    tables:
      - name: orders
      - name: customers
...
```

And call it from your model like this:

```sql
select *
from {{ source('jaffle_shop', 'orders') }}
```

I haven’t seen the tutorial, so I am not sure whether you need to do this, but you can.
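To point the same pattern at your own data instead of the public `dbt-tutorial` project, a minimal sketch: on BigQuery, a source's `database` is the GCP project ID and its `schema` is the BigQuery dataset. The project ID, dataset, source, and table names below are hypothetical placeholders, and the credentials in your dbt connection must be able to read that dataset:

```yaml
# models/staging/my_source/_my_source__sources.yml (hypothetical path)
version: 2

sources:
  - name: my_raw_data              # hypothetical name used in source('my_raw_data', ...)
    database: my-gcp-project-id    # hypothetical: your own GCP project ID
    schema: raw_dataset            # hypothetical: a BigQuery dataset in that project
    tables:
      - name: customers            # tables inside raw_dataset
      - name: orders
```

A model would then select from `{{ source('my_raw_data', 'customers') }}`, which dbt compiles to a fully qualified reference roughly of the form `my-gcp-project-id.raw_dataset.customers`. If `database` is omitted, dbt falls back to the project configured in your connection.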

graeme · April 4, 2023, 11:45pm

How do I reference my own data warehouse sources, rather than ‘dbt-tutorial’?

Referencing a public project, ‘dbt-tutorial’, as part of a dbt fundamentals course and/or dbt documentation that purportedly teaches the fundamentals of connecting to datasets, is inappropriate. It avoids teaching the learner how to reference data warehouse sources for any projects they might work on. Learners’ projects will not be public projects.

How do I follow/mimic the dbt-tutorial teaching example to connect to datasets in other projects?

Where in dbt or BigQuery is ‘dbt-tutorial’ defined, either as a standalone reference or as a referenceable link between dbt and BigQuery?
