dbt extremely slow on Airflow (Cloud Composer)

Hi all,

I'm having a strange issue with dbt when running it on Cloud Composer (the managed Apache Airflow service on Google Cloud Platform).

Even when the selected model isn't defined, dbt takes approximately 3 minutes just to report that there's no such node (see the timestamps in the output below):

[2022-08-04, 08:33:05 UTC] {subprocess.py:74} INFO - Running command: ['bash', '-c', 'cd /home/airflow/gcs/dags/dbt/company && dbt --no-write-json run --profiles-dir ../profiles --target prod --select mymodel']
[2022-08-04, 08:33:05 UTC] {subprocess.py:85} INFO - Output:
[2022-08-04, 08:34:39 UTC] {subprocess.py:89} INFO - 08:34:39  Running with dbt=1.1.1
[2022-08-04, 08:34:40 UTC] {subprocess.py:89} INFO - 08:34:40  Unable to do partial parsing because profile has changed
[2022-08-04, 08:35:17 UTC] {subprocess.py:89} INFO - 08:35:17  Found 123 models, 3 tests, 1 snapshot, 0 analyses, 193 macros, 0 operations, 0 seed files, 12 sources, 0 exposures, 0 metrics
[2022-08-04, 08:35:17 UTC] {subprocess.py:89} INFO - 08:35:17  The selection criterion 'mymodel' does not match any nodes
[2022-08-04, 08:35:17 UTC] {subprocess.py:89} INFO - 08:35:17
[2022-08-04, 08:35:17 UTC] {subprocess.py:89} INFO - 08:35:17  [WARNING]: Nothing to do. Try checking your model configs and model specification args
[2022-08-04, 08:35:28 UTC] {subprocess.py:93} INFO - Command exited with return code 0

I am using an Airflow BashOperator to run the dbt commands, and the target is BigQuery. I've even tried increasing the number of threads (from 4 to 8) without any luck. If I run the same command from my local machine, it literally takes a second.
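For reference, the task is essentially a plain BashOperator like the sketch below (DAG id, schedule and start date are simplified for illustration; the bash command is the one from the log above):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Simplified sketch of the task that produces the log output above.
with DAG(
    dag_id="dbt_company",
    start_date=datetime(2022, 8, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run_mymodel",
        bash_command=(
            "cd /home/airflow/gcs/dags/dbt/company && "
            "dbt --no-write-json run --profiles-dir ../profiles "
            "--target prod --select mymodel"
        ),
    )
```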

I've also tried running dbt --version, and even that simple command takes a long time (~2 minutes). I've run other bash commands unrelated to dbt (such as cd) to check whether it's a general issue with the BashOperator in Airflow, but every command except dbt looks normal and executes in a reasonable amount of time.
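In case it helps anyone reproduce this, a diagnostic along these lines should show whether the time goes into starting Python, importing dbt, or parsing the project (just a sketch; 'dbt.main' is the CLI entry module of dbt-core 1.x):

```python
from airflow.operators.bash import BashOperator

# One-off diagnostic: compare plain Python start-up, importing dbt,
# and dbt --version on the same worker to see which step is slow.
dbt_timing_check = BashOperator(
    task_id="dbt_timing_check",
    bash_command=(
        "time python -c 'pass' && "
        "time python -c 'import dbt.main' && "
        "time dbt --version"
    ),
)
```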

Any suggestions?

Hello, try your luck using the dbt operators from the airflow-dbt package.
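Something along these lines (untested sketch; the operator and parameter names are taken from the airflow-dbt README as I remember it, so please double-check against the package docs):

```python
from airflow_dbt.operators.dbt_operator import DbtRunOperator

# Untested sketch: the airflow-dbt operators shell out to the dbt CLI,
# so the arguments mirror the original bash command.
dbt_run = DbtRunOperator(
    task_id="dbt_run_mymodel",
    dir="/home/airflow/gcs/dags/dbt/company",
    profiles_dir="/home/airflow/gcs/dags/dbt/profiles",
    target="prod",
    models="mymodel",
)
```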

Hello, I think the issue is that Google Cloud Composer needs to spin up an instance before it can run dbt, hence the 2 to 3 minute wait you see before any log output appears. If you want to improve the performance of your environment in GCP, I suggest following this guide: Optimize environment performance and costs | Cloud Composer | Google Cloud.
Remember that in GCP there is always a tradeoff between cost and speed. Hope this helps.

I think another issue when using Cloud Composer is that the directory holding the dbt files is a Cloud Storage bucket mounted on the Airflow workers (‘/home/airflow/gcs/…’). When dbt executes any command, it has to read these files to compile the project and decide what to do next. My guess is that because dbt has to hit a lot of files in the mounted bucket, this is the source of the slowdown, and it's the reason the same dbt commands are faster locally, where dbt only needs to read local files.
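One way to test that theory (a sketch I haven't benchmarked on Composer; paths taken from your original command) is to copy the project off the mounted bucket onto the worker's local disk and run the same dbt command from there:

```python
from airflow.operators.bash import BashOperator

# Hypothetical comparison: copy the project off the gcsfuse mount onto
# local disk, then run the same dbt command there. If this is much
# faster, the mounted bucket is the likely bottleneck.
dbt_run_local_copy = BashOperator(
    task_id="dbt_run_local_copy",
    bash_command=(
        "rm -rf /tmp/dbt && "
        "cp -r /home/airflow/gcs/dags/dbt /tmp/dbt && "
        "cd /tmp/dbt/company && "
        "dbt --no-write-json run --profiles-dir ../profiles "
        "--target prod --select mymodel"
    ),
)
```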

As @rgranados said, it's all about the tradeoffs! In my experience, the slower execution time was worth the user-friendly deployment of dbt on Cloud Composer.

BTW, I don't believe the airflow-dbt package will make any difference here, as it is just a wrapper around the BashOperator, which you are already using.
