The problem I’m having
I’m looking for advice on the best way to manage sources / staging models that live in multiple BigQuery regions, while the primary analytics models are built in us-west1.
The context of why I’m trying to do this
I am currently running dbt Core in GitHub Actions to do daily/hourly builds in BigQuery.
Most raw data sources I work with are loaded into us-west1, and the core data models are all built in a dataset in us-west1 for BI & rETL. However, we are starting to get more sources in other BigQuery regions, mostly because of:
- Use of external tables & BigQuery Omni, which may connect to blob storage in other clouds/regions
- Access to BigQuery datasets that sit in a different region in another GCP project
So essentially we have one main dbt project for one analytical BigQuery instance in a single analytics-specific GCP project, but we are getting sources from other regions and from BigQuery datasets in other projects. I’m hoping to manage all of this in one place, ideally through a single dbt project where we can see all the data movement and lineage.
What I’ve already tried
Today I have targets in profiles.yml that look like the below.
prod:
  dataset: analytics
  location: us-west1
dev:
  dataset: dbt_dev
  location: us-west1
azure-eastus2:
  dataset: analytics_staging_us
  location: azure-eastus2
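For completeness, here is roughly how those targets would sit inside a full profiles.yml (a minimal sketch, not my actual file — the profile name, GCP project id, auth method, and thread count are assumptions; the datasets and locations are the real ones):

my_dbt_project:                          # hypothetical profile name
  target: dev
  outputs:
    prod:
      type: bigquery
      method: oauth                      # assumed auth method
      project: analytics-gcp-project     # hypothetical GCP project id
      dataset: analytics
      location: us-west1
      threads: 4
    dev:
      type: bigquery
      method: oauth
      project: analytics-gcp-project
      dataset: dbt_dev
      location: us-west1
      threads: 4
    azure-eastus2:
      type: bigquery
      method: oauth
      project: analytics-gcp-project
      dataset: analytics_staging_us
      location: azure-eastus2
      threads: 4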
For scheduled prod builds, the GitHub Action basically does the steps below (a sketch of the full workflow job is included after the list):
- Run the dbt-external-tables package to pick up any changes; this covers all external tables, both from standard GCS blob storage and from BigQuery Omni connections:
dbt run-operation stage_external_sources --target azure-eastus2
An example of the external table source spec:

version: 2
sources:
  - name: azure_eastus2
    schema: azure_eastus2
    tables:
      - name: logs
        external:
          location: azure://..../*
          options:
            format: CSV
            skip_leading_rows: 1
            connection_name: <omni_connection_name>
- Build models from each region into a single analytics_staging_us dataset (multi-region US), using the dbt target configured for that specific region. The models here basically do a few things incrementally: 1) read from the external table, 2) do some basic aggregation & parsing, 3) load the result into a physical table in the US region.
dbt build --target azure-eastus2 --select tag:azure-eastus2
The entire analytics_staging_us dataset is set up with secondary replication to us-west1 in order to be accessible by the core models, so we want to keep what lands there as aggregated as possible.
- Run the main analytical build in us-west1, which reads from the physical tables in analytics_staging_us as raw sources:
dbt build --target prod --select +tag:daily
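Putting those three steps together, the scheduled workflow job looks roughly like this (a minimal sketch, not the actual workflow — the file name, trigger, Python/dbt setup, and auth handling are assumptions; only the three dbt commands are the real ones):

name: dbt prod build                     # .github/workflows/dbt_prod_build.yml (hypothetical)
on:
  schedule:
    - cron: "0 * * * *"                  # hourly; actual schedule assumed

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-bigquery
      - run: dbt deps
      # BigQuery auth / profiles.yml provisioning omitted here
      # 1. refresh external tables (GCS + Omni) in their own region
      - run: dbt run-operation stage_external_sources --target azure-eastus2
      # 2. aggregate each region's external tables into analytics_staging_us
      - run: dbt build --target azure-eastus2 --select tag:azure-eastus2
      # 3. main analytical build in us-west1, reading the replicated staging tables
      - run: dbt build --target prod --select +tag:daily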
So for Omni the dataflow goes:
Azure:eastus2 (or any other US region) —[incremental materialization from ext. tbl]—> GCP:US —[dataset replication]—> GCP:us-west1
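To make the first arrow concrete, one of the incremental staging models reading from the external table looks roughly like this (a simplified sketch — the model name, column names, and aggregation are invented for illustration; only the incremental materialization, the region tag, and the source reference reflect the real pattern):

-- models/staging/azure_eastus2/stg_azure_eastus2__logs.sql (hypothetical name)
{{
  config(
    materialized='incremental',
    tags=['azure-eastus2'],
    partition_by={'field': 'log_date', 'data_type': 'date'}
  )
}}

-- read straight from the Omni external table and keep only a light aggregation,
-- so the table that gets replicated to us-west1 stays as small as possible
select
    date(event_timestamp) as log_date,
    event_type,
    count(*)              as event_count
from {{ source('azure_eastus2', 'logs') }}
{% if is_incremental() %}
  -- only scan new data on incremental runs
  where date(event_timestamp) > (select max(log_date) from {{ this }})
{% endif %}
group by 1, 2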
The problem with this approach:
- The dbt graphs are disconnected at the replication step: the tables materialized from each region into analytics_staging_us now appear as raw dbt sources in the main analytical models (instead of refs); an example of what this looks like is shown after this list.
- dbt docs generate will now error without --empty-catalog, because it can only use a single target when compiling and scanning the catalog, and we have some sources that are not reachable from any single target.
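For example, on the prod side the replicated table ends up being declared as a plain source rather than referenced with ref (illustrative names only):

# sources declared for the core models in us-west1 (hypothetical file)
version: 2
sources:
  - name: analytics_staging_us
    schema: analytics_staging_us
    tables:
      - name: stg_azure_eastus2__logs    # hypothetical replicated staging table

Downstream models then use {{ source('analytics_staging_us', 'stg_azure_eastus2__logs') }} instead of {{ ref('stg_azure_eastus2__logs') }}, so dbt has no idea the two sides of the replication are related.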
I think the fundamental limitation is that I can’t tell dbt to use multiple targets within a single invocation, or configure multiple session locations on one target, but I’m not sure if I’m missing anything in dbt that can handle this scenario more gracefully.