How do I connect to and reference dbt sources, for example 'dbt-tutorial', when following the dbt BigQuery setup guidelines?


A major deficiency for dbt is managing data sources, starting with connecting to and referencing data sources.

To set up a BigQuery source for dbt, the guide Quickstart for dbt Cloud and BigQuery | dbt Developer Hub covers the following:


Create BigQuery datasets

  1. Verify that you can run SQL queries. Copy and paste these queries into the Query Editor:
select * from `dbt-tutorial.jaffle_shop.customers`;
select * from `dbt-tutorial.jaffle_shop.orders`;
select * from `dbt-tutorial.stripe.payment`;

However, searching BigQuery Explorer for ‘dbt-tutorial’ finds 0 results. Where should it be found?


The recommended dbt SQL code includes select * from `dbt-tutorial.jaffle_shop.customers`

However, where is the naming reference for ‘dbt-tutorial’ found?

Are you querying the US region’s public datasets? I think that that’s the only region where the sample data is available, as GCP doesn’t allow cross-region queries of public datasets.

I’m asking how to reference data warehouse sources in general, rather than how to reference a mysterious ‘dbt-tutorial’.

Where in dbt or BigQuery is dbt-tutorial defined, either as a standalone reference or as a referenceable link between dbt and BigQuery?

How can I follow the dbt-tutorial teaching example to connect to other projects’ datasets?

Referencing a public project, ‘dbt-tutorial’, in a dbt Fundamentals course and/or dbt documentation that purports to teach the fundamentals of connecting to datasets is inappropriate. It evades teaching the learner how to reference data warehouse sources for any projects they might work on. Learners’ projects will not be public projects.

The dbt fundamentals course specifies dbt-tutorial as the database reference as follows:

sources.yml (inside models)

version: 2

sources:
  - name: jaffle_shop
    database: dbt-tutorial
    schema: jaffle_shop
    tables:
      - name: orders
      - name: customers

Quickstart for dbt Cloud and BigQuery | dbt Developer Hub refers to dbt-tutorial a number of times without specifying where dbt-tutorial comes from or where it is specified as a dataset link between dbt Cloud and BigQuery.

This is inaccurate - the process described for generating BQ credentials and connecting dbt Cloud to BQ is consistent with the steps necessary to run dbt in production.

The value provided by the dbt-tutorial dataset is that you don’t need to spend time working out how to load data into your warehouse – a job for which dbt is not responsible, as the T in ELT – and can instead jump ahead to transforming the data. In other warehouses which don’t have the concept of public datasets, we instead need to guide new dbt users through uploading CSV files into their warehouse. That process is a less accurate representation of the analytics engineering workflow than using a public dataset, but is an effective workaround.

BigQuery abstracts away the difference between public datasets and data in your own BQ account, so when you progress from transforming sample data to transforming your company’s data, you will configure sources in the exact same way: define the database, schema and tables in a YAML file.
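To make that concrete, here is a minimal sketch of what a sources file might look like for your own data. The project ID `my-company-project`, the dataset `raw_sales`, and the table names are hypothetical placeholders, not anything defined by the course or the docs. On BigQuery, `database` corresponds to the GCP project ID and `schema` to the dataset name, which is why `dbt-tutorial` (a public project) can appear in the same slot as your own project.

```yaml
# models/sources.yml (illustrative only: the project, dataset, and table
# names below are hypothetical placeholders, not real resources)
version: 2

sources:
  - name: raw_sales                # name used in source('raw_sales', ...)
    database: my-company-project   # on BigQuery: the GCP project ID
    schema: raw_sales              # on BigQuery: the dataset name
    tables:
      - name: orders
      - name: customers

# In a model, {{ source('raw_sales', 'orders') }} should then compile to
# something like `my-company-project`.`raw_sales`.`orders`.
```

The only difference from the tutorial's file is the value of `database` (and whichever datasets and tables you list). The credentials you connected to dbt Cloud need read access to that project and dataset.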

In the introduction area, we specify that you will be accessing sample data in a public dataset, although we do not explicitly state that the dataset is called dbt-tutorial.

If you think that there are specific clarifications we can make in the documentation, you are welcome to suggest changes by clicking the Edit this Page button underneath the table of contents.

How to reference a BigQuery source:

  • as taught in the dbt Fundamentals course,
  • as instructed by dbt documentation,
  • that can be followed for referencing a source dataset that is not the public project ‘dbt-tutorial’ dataset?

On dbt Cloud, how do I find where ‘dbt-tutorial’ is specified and configured as a dbt source dataset?

Where is the dbt-tutorial dataset located on BigQuery?

Joel, unless you will reply constructively on how to reference a dbt BigQuery source dataset (or reply citing the dbt-preferred alternative(s) that your user-experience-sabotaging replies are implicitly promoting), please leave this post for someone more inclined to reply constructively regarding how to reference source datasets, rather than falsify and mislead to evade providing relevant support.

A dbt fundamentals course and documentation that purports to instruct on how to connect and reference source datasets, but instead:

  • instructs how to reference a public project ‘dbt-tutorial’,
  • without any visibility of how that reference applies,
  • without instructing how to reference any other source dataset

is incomplete and inadequate.

A dbt staffer denying that fact to evade providing support on how to connect to and reference source datasets indicates an unfortunate dbt culture, not one that supports stakeholders, continuous product improvement, and adoption.
