How to run dbt on Cloud Composer and authenticate the service account

Hi,

I would like to run dbt through Cloud Composer (GCP’s managed service for Apache Airflow), and I am struggling to find a proper and secure way to authenticate dbt so that it can perform operations on Google Cloud BigQuery.

So here’s my profile in the profiles.yml file:

my-profile:
  outputs:
    dev:
      dataset: my-bigquery-dataset
      job_execution_timeout_seconds: 300
      job_retries: 1
      keyfile: /path/to/keyfile.json
      location: EU
      method: service-account
      priority: interactive
      project: my-gcp-project
      threads: 4
      type: bigquery
  target: dev

Where am I supposed to store the service account JSON key so that it is visible to dbt when running it through Cloud Composer?

One option is to place the JSON key file on Cloud Storage, under gs://<cloud-composer-bucket>/data, which is mounted into every Airflow worker at /home/airflow/gcs/data, and simply use /home/airflow/gcs/data/service-account-key.json for keyfile in dbt’s profile. However, I don’t think this approach is secure enough (i.e. storing the service account key on an object store such as GCS).

PS: Note that I am not using Secret Manager on GCP, but self-hosted HashiCorp Vault instead.

Thanks in advance.
Giorgos

Hi Giorgos,

We run dbt via “GKEStartPodOperator” on Cloud Composer. We can then leverage the workload identity of the service account running the node pool. This has the benefit of separating the scheduling done by Airflow from the execution of dbt, which happens on a separate node pool. It also means there is no longer any need to create a JSON key for the service account.

Best regards,

Charles

Thanks a lot for your reply Charles!
May I also ask which method/keyfile you specify in your profiles.yml file when using workload identity?

type: bigquery
method: oauth

dbt will then automatically use the credentials of the service account of the node pool.
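
Applied to the profile from the original question, this would look roughly like the sketch below. It assumes every other setting stays exactly as posted and that only the keyfile entry is dropped, since with the oauth method dbt picks up the credentials available in its runtime environment instead of reading a key file.

```yaml
# Sketch only: the profile from the question, switched from a JSON key to oauth.
# All values other than "method" are copied unchanged from the original post.
my-profile:
  outputs:
    dev:
      dataset: my-bigquery-dataset
      job_execution_timeout_seconds: 300
      job_retries: 1
      location: EU
      method: oauth        # no keyfile entry is needed with this method
      priority: interactive
      project: my-gcp-project
      threads: 4
      type: bigquery
  target: dev
```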

I have written a series of articles on how best to run dbt on Google Cloud. Here is the link to part 1 if you are interested:


