Modelling multiple BigQuery projects with dbt

Hey there, I’m pretty new to dbt but really excited to get started using it!

I’ve been playing around with a small dbt setup locally and have gotten a number of things up and running but am struggling on identifying the best solution for my use case.

For context, my company has multiple GCP projects, each of them having their own BigQuery instance with data and views that are sometimes custom and sometimes shared.

Two specific examples:

  • Table A in project 1 is used as a source for view A in project 2
  • View B is supposed to be created in project 1, project 2, and project 3 (where the view name and schema name is the same across all projects)

Ideally it’s one team that controls dbt for all these projects so with that in mind I’ve tried exploring a few potential solutions including:

  • Multiple local dbt projects (one git repo)
  • One dbt project using different databases (BigQuery projects)

Unfortunately with both of those solutions I’m running into issues. I’d really appreciate any guidance on thoughts for how to approach this.

Thanks!

Hi @azhard great question(s). Could you give a little more detail on the issues that you are running into with each of these options? And the reasoning (if known) behind having multiple GCP projects for your company in the first place? We have a couple of GCP projects, but basically one is our ‘dev’ project and one is our ‘prod’ project.

Hey @josh, happy to give some more detail. We have GCP projects for dev and prod as well for our team, however there’s one (legacy) quirk that we still need to maintain where some views in our prod project have to be created in other team’s prod projects as well and to make it worse, the dataset name is the same as well. So for example right now we would have these views:

  • my_teams_prod_project.dataset.view
  • team_1s_prod_project.dataset.view
  • team_2s_prod_project.dataset.view
  • etc.

And to make things even worse, there is a table in team_1s_prod_project that is a source for one of the views that gets copied across all the projects. Because of this, I can’t have each team as a separate dbt project as the my_team_dbt_project would require team_1s_dbt_project and vice versa leading to a circular dependency.

Attempting to use the single dbt project approach, there are two blockers:

  1. Issue using the ref function. ref is based on the file name (and also the package name if from an external package) but the we would theoretically have the same view file many times in our repo so I’d need to have custom names for each and then use alias to get the proper name in our repo, which leads to blocker 2…
  2. alias currently checks ambiguity based ONLY on schema and name, not the project. So even though the project would be different for each of those files, alias can’t be used. The good news however is that I just found a PR that is currently in review that fixes this!

So once that PR gets merged, it looks like I can solve my problem, albeit using unique file names for the views in each project. Looks like this will be the solution that I’ll be using unless you have any other suggestions.

Thanks again for the help!