Hi, I’m trying to run dbt on Airflow using the BashOperator.
I installed the dbt CLI on the server and I am able to run the dbt run and dbt test commands from the command line.
But when I run the same commands through an Airflow DAG using the BashOperator, I run into the error below: (fatal: Not a dbt project (or any of the parent directories). Missing dbt_project.yml file)
I’m not sure whether I need any extra configuration, or whether I need to install any Airflow/dbt integration packages (such as airflow-dbt) before running the DAG.
Any help is much appreciated. Thank you.
*** Reading local file: /home/gkerkar/airflow/logs/DBT_DAG/dbt_run/2021-05-25T07:00:00+00:00/4.log
[2021-05-26 15:37:13,286] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: DBT_DAG.dbt_run 2021-05-25T07:00:00+00:00 [queued]>
[2021-05-26 15:37:13,313] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: DBT_DAG.dbt_run 2021-05-25T07:00:00+00:00 [queued]>
[2021-05-26 15:37:13,313] {taskinstance.py:880} INFO -
--------------------------------------------------------------------------------
[2021-05-26 15:37:13,313] {taskinstance.py:881} INFO - Starting attempt 4 of 4
[2021-05-26 15:37:13,313] {taskinstance.py:882} INFO -
--------------------------------------------------------------------------------
[2021-05-26 15:37:13,334] {taskinstance.py:901} INFO - Executing <Task(BashOperator): dbt_run> on 2021-05-25T07:00:00+00:00
[2021-05-26 15:37:13,338] {standard_task_runner.py:54} INFO - Started process 4428 to run task
[2021-05-26 15:37:13,376] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'DBT_DAG', 'dbt_run', '2021-05-25T07:00:00+00:00', '--job_id', '12481', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/DBT_DAG.py', '--cfg_path', '/tmp/tmpicvwsxdq']
[2021-05-26 15:37:13,376] {standard_task_runner.py:78} INFO - Job 12481: Subtask dbt_run
[2021-05-26 15:37:13,427] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: DBT_DAG.dbt_run 2021-05-25T07:00:00+00:00 [running]> ip-10-131-129-91.us-west-1.compute.internal
[2021-05-26 15:37:13,460] {bash_operator.py:113} INFO - Tmp dir root location:
/tmp
[2021-05-26 15:37:13,461] {bash_operator.py:134} INFO - Temporary script location: /tmp/airflowtmpphoq5rc9/dbt_run9lohcjtk
[2021-05-26 15:37:13,461] {bash_operator.py:146} INFO - Running command: dbt run
[2021-05-26 15:37:13,471] {bash_operator.py:153} INFO - Output:
[2021-05-26 15:37:15,811] {bash_operator.py:157} INFO - Running with dbt=0.19.1
[2021-05-26 15:37:15,811] {bash_operator.py:157} INFO - Encountered an error:
[2021-05-26 15:37:15,811] {bash_operator.py:157} INFO - Runtime Error
[2021-05-26 15:37:15,811] {bash_operator.py:157} INFO - fatal: Not a dbt project (or any of the parent directories). Missing dbt_project.yml file
[2021-05-26 15:37:15,984] {bash_operator.py:159} INFO - Command exited with return code 2
[2021-05-26 15:37:15,997] {taskinstance.py:1150} ERROR - Bash command failed
You’re able to use the dbt package, which is great, but you may need to examine the working directory your command runs in during DAG execution. Assuming you have a dbt_project.yml defined at /dbtproject/dbt_project.yml:
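A minimal sketch of that idea (the /dbtproject path is the example from above; substitute your actual project directory). The BashOperator runs its command from a temporary directory, which is consistent with the log line "Temporary script location: /tmp/airflowtmpphoq5rc9/..." and explains why dbt cannot find dbt_project.yml:

```python
# Sketch only: run dbt from inside the project directory so it can
# find dbt_project.yml. /dbtproject is a placeholder path.
from airflow.operators.bash_operator import BashOperator

dbt_run = BashOperator(
    task_id="dbt_run",
    # cd into the project first; alternatively, dbt accepts an explicit
    # flag: bash_command="dbt run --project-dir /dbtproject"
    bash_command="cd /dbtproject && dbt run",
    dag=dag,
)
```

Either variant should work; cd is the simplest, while --project-dir avoids changing the shell’s working directory.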
Thank you for your response.
Below is the dbt_project.yml location on the server; I’ve also attached my Airflow DAG code.
Do I need to install the airflow-dbt or airflow-dbt-python package to make this work, or is it just a config or bash profile value that I need to set?
Please help; I’m probably missing something trivial.
dbt_project.yml file location:
gkerkar@-----:~/dbt/de-dbt$ ls -lt
total 48
drwxrwxr-x 2 gkerkar gkerkar 4096 May 22 21:09 tests
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'etluser',
    'queue': 'de_queue',
    'depends_on_past': False,
    'start_date': datetime(2021, 4, 18),
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'DBT_DAG',
    template_searchpath="/home/gkerkar/dbt/de-dbt",
    default_args=default_args,
    description='An Airflow DAG to invoke simple dbt commands',