dbt with Airflow

Hi, I’m trying to run dbt on Airflow using the BashOperator.
I installed the dbt CLI on the server and I can run the dbt run and dbt test commands from the command line.
But when I try to run the same commands from an Airflow DAG using the BashOperator, I run into the error below: (fatal: Not a dbt project (or any of the parent directories). Missing dbt_project.yml file)

I’m not sure whether I need any configs or have to install any Airflow/dbt packages (such as airflow-dbt) before running the DAG.

Any help is much appreciated. Thank you.

*** Reading local file: /home/gkerkar/airflow/logs/DBT_DAG/dbt_run/2021-05-25T07:00:00+00:00/4.log
[2021-05-26 15:37:13,286] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: DBT_DAG.dbt_run 2021-05-25T07:00:00+00:00 [queued]>
[2021-05-26 15:37:13,313] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: DBT_DAG.dbt_run 2021-05-25T07:00:00+00:00 [queued]>
[2021-05-26 15:37:13,313] {taskinstance.py:880} INFO - 
--------------------------------------------------------------------------------
[2021-05-26 15:37:13,313] {taskinstance.py:881} INFO - Starting attempt 4 of 4
[2021-05-26 15:37:13,313] {taskinstance.py:882} INFO - 
--------------------------------------------------------------------------------
[2021-05-26 15:37:13,334] {taskinstance.py:901} INFO - Executing <Task(BashOperator): dbt_run> on 2021-05-25T07:00:00+00:00
[2021-05-26 15:37:13,338] {standard_task_runner.py:54} INFO - Started process 4428 to run task
[2021-05-26 15:37:13,376] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'DBT_DAG', 'dbt_run', '2021-05-25T07:00:00+00:00', '--job_id', '12481', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/DBT_DAG.py', '--cfg_path', '/tmp/tmpicvwsxdq']
[2021-05-26 15:37:13,376] {standard_task_runner.py:78} INFO - Job 12481: Subtask dbt_run
[2021-05-26 15:37:13,427] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: DBT_DAG.dbt_run 2021-05-25T07:00:00+00:00 [running]> ip-10-131-129-91.us-west-1.compute.internal
[2021-05-26 15:37:13,460] {bash_operator.py:113} INFO - Tmp dir root location: 
 /tmp
[2021-05-26 15:37:13,461] {bash_operator.py:134} INFO - Temporary script location: /tmp/airflowtmpphoq5rc9/dbt_run9lohcjtk
[2021-05-26 15:37:13,461] {bash_operator.py:146} INFO - Running command: dbt run
[2021-05-26 15:37:13,471] {bash_operator.py:153} INFO - Output:
[2021-05-26 15:37:15,811] {bash_operator.py:157} INFO - Running with dbt=0.19.1
[2021-05-26 15:37:15,811] {bash_operator.py:157} INFO - Encountered an error:
[2021-05-26 15:37:15,811] {bash_operator.py:157} INFO - Runtime Error
[2021-05-26 15:37:15,811] {bash_operator.py:157} INFO -   fatal: Not a dbt project (or any of the parent directories). Missing dbt_project.yml file
[2021-05-26 15:37:15,984] {bash_operator.py:159} INFO - Command exited with return code 2
[2021-05-26 15:37:15,997] {taskinstance.py:1150} ERROR - Bash command failed

My 2 cents, without knowing your setup or restrictions!

TLDR: use the --project-dir flag

You’re able to use the dbt package, which is great, but you may need to examine your working directory during DAG execution. Assuming you have a dbt_project.yml defined at /dbtproject/dbt_project.yml:

where_am_i = BashOperator(
    task_id="pathtest",
    bash_command="pwd",
    ....
)

If this is not {yourpathto}/dbtproject, then one of the following may help:

  1. bash_command="dbt run --project-dir /yourpathto/dbtproject"
  2. bash_command="cd /yourpathto/dbtproject && dbt run"
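To see why both options work: the error message says dbt looks for dbt_project.yml in the working directory or any of its parent directories, and a BashOperator runs its command from a temporary directory under /tmp (visible in the log above), where that upward search can never succeed. Here is a minimal sketch of that lookup in plain stdlib Python (no Airflow needed; find_dbt_project is a name I made up for illustration, not a dbt API):

```python
import os
from typing import Optional

def find_dbt_project(start_dir: str) -> Optional[str]:
    """Walk up from start_dir looking for dbt_project.yml, mimicking
    the 'or any of the parent directories' part of dbt's error message."""
    path = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(path, "dbt_project.yml")):
            return path
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root: no project found
            return None
        path = parent
```

Starting this walk from a directory like /tmp/airflowtmpphoq5rc9 reaches / without ever finding dbt_project.yml, which is exactly the fatal error in the log. Option 1 tells dbt where to start explicitly; option 2 changes the working directory so the walk starts in the right place.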

Additional:

  • potentially related thread
  • may need to adjust WORKDIR if in container or your python path
  • dbt cloud is a great (and free) product which includes task orchestration and scheduling: I resolved some airflow issues by converting

Hi @bwheeler122,

Thank you for your response.
Below is the dbt_project.yml location on the server; I’ve also attached my Airflow DAG code.
Do I need to install the airflow-dbt or airflow-dbt-python package to make this work, or is it just a config or bash profile value that I need to set?
Please help; I’m probably missing something trivial.

dbt_project.yml file location:

gkerkar@-----:~/dbt/de-dbt$ ls -lt
total 48
drwxrwxr-x 2 gkerkar gkerkar 4096 May 22 21:09 tests
drwxrwxr-x 4 gkerkar gkerkar 4096 Apr  8 18:32 models
-rw-rw-r-- 1 gkerkar gkerkar 1348 Apr  8 18:32 dbt_project.yml
-rw-r--r-- 1 gkerkar gkerkar   10 Apr  7 14:47 README.md
drwxrwxr-x 4 gkerkar gkerkar 4096 Apr  7 08:54 target
drwxrwxr-x 2 gkerkar gkerkar 4096 Apr  7 08:54 dbt_modules
drwxrwxr-x 2 gkerkar gkerkar 4096 Apr  7 08:54 logs
drwxrwxr-x 2 gkerkar gkerkar 4096 Apr  6 15:21 analysis
drwxrwxr-x 2 gkerkar gkerkar 4096 Apr  6 15:21 data
drwxrwxr-x 2 gkerkar gkerkar 4096 Apr  6 15:21 macros
drwxrwxr-x 2 gkerkar gkerkar 4096 Apr  6 15:21 snapshots

gkerkar@----:~/dbt/de-dbt$ pwd
/home/gkerkar/dbt/de-dbt

Airflow code:

from datetime import timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import datetime
from airflow.utils.dates import timedelta

default_args = {
    'owner': 'etluser',
    'queue': 'de_queue',
    'depends_on_past': False,
    'start_date': datetime(2021, 4, 18),
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'DBT_DAG',
    template_searchpath="/home/gkerkar/dbt/de-dbt",
    default_args=default_args,
    description='An Airflow DAG to invoke simple dbt commands',
    schedule_interval=timedelta(days=1),
)

dbt_run = BashOperator(
    task_id='dbt_run',
    bash_command='dbt run',
    dag=dag
)

dbt_test = BashOperator(
    task_id='dbt_test',
    bash_command='dbt test',
    dag=dag
)

dbt_run >> dbt_test

You could potentially try using the airflow dbt_operator instead, which might give you more control.

@patkearns10, @bwheeler122:

I was able to fix the issue using the command below.

bash_command="dbt run --project-dir /yourpathto/dbtproject"
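In full, with the actual project directory from the pwd output earlier in the thread (only the bash_command values change; the DAG wiring stays as in my original post). Shown as plain strings so the fix is easy to verify outside Airflow:

```python
import os.path

# Project directory from the `pwd` output earlier in the thread.
PROJECT_DIR = "/home/gkerkar/dbt/de-dbt"

# BashOperator commands with the project dir made explicit, so they
# no longer depend on whatever working directory Airflow uses.
dbt_run_cmd = "dbt run --project-dir " + PROJECT_DIR
dbt_test_cmd = "dbt test --project-dir " + PROJECT_DIR

# dbt will look for this file inside PROJECT_DIR:
project_file = os.path.join(PROJECT_DIR, "dbt_project.yml")
```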

Thank you for your help.


I’ve come across a problem where Airflow cannot find my_first_project.
Not sure how I can fix it.

I can make it work on my CLI.

Appreciate it!

We use the KubernetesPodOperator in our composer environment.
Here’s the discourse article with the explanation: