Best practice with multiple regions & external tables in BigQuery

The problem I’m having

I’m looking for advice on the best way to manage sources and staging models that live in multiple BigQuery regions, while the primary analytics models are built in us-west1.

The context of why I’m trying to do this

I am currently running dbt Core in GitHub Actions to do daily/hourly builds in BigQuery.

Most raw data sources I work with are loaded into us-west1, and the core data models are all built in a us-west1 dataset for BI & rETL. However, we have started to get more sources from other BigQuery regions, mostly due to:

  1. Use of external tables & BigQuery Omni, which may connect to blob storage in other clouds/regions
  2. Access to BigQuery datasets that sit in a different region in another GCP project

So essentially we have one main dbt project for one analytical BigQuery instance in a single analytics-specific GCP project, but we are now getting sources from other regions and from BigQuery datasets in other projects. I’m hoping to manage it all in one place, ideally through a single dbt project where we can see all the data movement and lineage.

What I’ve already tried

Today I have targets in profiles.yml that look like the below.

  prod:
    dataset: analytics
    location: us-west1
  dev:
    dataset: dbt_dev
    location: us-west1
  azure-eastus2:
    dataset: analytics_staging_us
    location: azure-eastus2
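
For completeness, those targets sit inside a single dbt-bigquery profile roughly like the sketch below; the profile name, GCP project id, auth method, and thread counts are placeholders rather than our real values, only dataset and location match the excerpt above.

  my_dbt_project:                        # placeholder profile name
    target: dev
    outputs:
      prod:
        type: bigquery
        method: oauth                    # assumed auth method
        project: analytics-gcp-project   # placeholder GCP project id
        dataset: analytics
        location: us-west1
        threads: 8
      dev:
        type: bigquery
        method: oauth
        project: analytics-gcp-project
        dataset: dbt_dev
        location: us-west1
        threads: 4
      azure-eastus2:
        type: bigquery
        method: oauth
        project: analytics-gcp-project
        dataset: analytics_staging_us
        location: azure-eastus2          # BigQuery Omni location
        threads: 4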

For scheduled prod builds, the GitHub Action basically runs the steps below (a workflow sketch follows the list):

  1. Run the dbt-external-tables package to pick up any changes; this covers all external tables, from both standard GCS blobs and BigQuery Omni connections
    dbt run-operation stage_external_sources --target azure-eastus2
    An example of the external table source spec:
    version: 2
    sources:
      - name: azure_eastus2
        schema: azure_eastus2
        tables:
          - name: logs
            external:
              location: azure://..../*
              options:
                format: CSV
                skip_leading_rows: 1
                connection_name: <omni_connection_name>
  2. Build models from each region into a single analytics_staging_us dataset (multi-region US) using the dbt target configured for that specific region. The models here run incrementally and basically do a few things: 1) read from the external table, 2) do some basic aggregation & parsing, and 3) load the result into a physical table in the US region.
    dbt build --target azure-eastus2 --select tag:azure-eastus2
    The entire analytics_staging_us dataset is set up with secondary replication to us-west1 so that it is accessible to the core models; we therefore want to keep the data here as aggregated as possible.
  3. Run the main analytical build in us-west1, which reads from the physical tables in analytics_staging_us as raw sources
    dbt build --target prod --select +tag:daily
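
To tie the three steps together, the scheduled workflow looks roughly like the sketch below. The workflow name, schedule, action versions, and install steps are illustrative assumptions (BigQuery auth is omitted); only the three dbt commands are taken from the actual setup.

  name: dbt-prod-build              # illustrative workflow name
  on:
    schedule:
      - cron: "0 6 * * *"           # assumed daily schedule
  jobs:
    dbt:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        - uses: actions/setup-python@v5
          with:
            python-version: "3.11"
        - run: pip install dbt-bigquery
        - run: dbt deps
        # 1. refresh external tables (GCS blobs + BigQuery Omni) in their own location
        - run: dbt run-operation stage_external_sources --target azure-eastus2
        # 2. build regional staging models into analytics_staging_us
        - run: dbt build --target azure-eastus2 --select tag:azure-eastus2
        # 3. main analytical build in us-west1
        - run: dbt build --target prod --select +tag:daily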

So for Omni the dataflow goes:
Azure:eastus2 (or any other US region) --[incremental materialization from ext. table]--> GCP:US --[dataset replication]--> GCP:us-west1

The problems with this approach:

  1. The dbt graph is disconnected at the replication step: the tables materialized from each region into analytics_staging_us show up as raw dbt sources in the main analytical models (instead of refs); a sketch of that source spec follows this list
  2. dbt docs generate now errors without --empty-catalog, because it can only use a single target when compiling and scanning the catalog, and we have some sources that are not reachable from any single target.
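
For point 1, the core models currently declare the replicated table as a plain source along these lines (the table name logs_agg is illustrative):

  version: 2
  sources:
    - name: analytics_staging_us          # replicated copy visible from us-west1
      schema: analytics_staging_us
      tables:
        - name: logs_agg                  # illustrative name; built by the azure-eastus2 target

In the prod models this is referenced as {{ source('analytics_staging_us', 'logs_agg') }} rather than a ref() back to the staging model that produced it, so the lineage breaks at that point.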

I think the fundamental limitation is that I can’t tell dbt to use multiple targets for a single invocation, or configure multiple session locations when defining a target, but I’m not sure whether I’m missing something in dbt that can handle this scenario more gracefully.