I am experimenting with running dbt on Airflow. So far I have managed to set up both tools in Docker Compose, using Airflow's LocalExecutor and running models via "dbt run --models ...". To design the different DAGs I am using dbt tags to organise/filter the models. To build the models' dependencies and identify the tags, I parse the manifest.json file after it has been compiled.
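The tag filtering itself is just dbt's node selection syntax; the command each Airflow task ends up executing can be sketched like this (a minimal sketch — the helper name is mine, and the tags are examples):

```python
def dbt_run_command(tags):
    """Build the shell command an Airflow BashOperator would execute
    to run all dbt models carrying any of the given tags."""
    # dbt's selector syntax: "tag:<name>"; multiple selectors are
    # space-separated and combined with OR semantics
    selector = " ".join(f"tag:{t}" for t in tags)
    return f"dbt run --models {selector}"
```

For example, `dbt_run_command(["staging", "daily"])` produces `dbt run --models tag:staging tag:daily`, which can be passed straight to a BashOperator's `bash_command`.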
I am wondering:
Do people follow alternative approaches with Airflow?
Have people faced any issues with complicated models, and how do they run them in Airflow?
The major deficiency in my design so far is that it does not utilise the raw SQL from manifest.json. This will be my next step. Other than that, I would like some feedback on how this project could be improved. In a production Airflow environment, things would probably be slightly different.
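For reference, the manifest parsing described above looks roughly like this (a sketch, assuming the node layout of dbt's compiled manifest.json; the path is an example):

```python
import json

def parse_manifest(path="target/manifest.json"):
    """Extract model names, tags and model-to-model dependencies
    from a compiled dbt manifest.json."""
    with open(path) as f:
        manifest = json.load(f)

    models = {}
    for node_id, node in manifest["nodes"].items():
        # the manifest also contains tests, seeds, etc.
        if node["resource_type"] != "model":
            continue
        models[node_id] = {
            "name": node["name"],
            "tags": node.get("tags", []),
            # keep only upstream dependencies that are themselves models;
            # sources and macros are not Airflow tasks in this design
            "depends_on": [
                dep for dep in node.get("depends_on", {}).get("nodes", [])
                if dep.startswith("model.")
            ],
        }
    return models
```

The returned mapping is enough to both filter models by tag and wire up task dependencies in the Airflow DAG.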
Hi, I am trying to set up the same (dbt with Airflow) but I am getting this error:
[2020-05-23 20:18:10,283] {bash_operator.py:123} INFO - pkg_resources.DistributionNotFound: The 'dbt-core==0.16.1' distribution was not found and is required by the application
[2020-05-23 20:18:10,295] {bash_operator.py:127} INFO - Command exited with return code 1
[2020-05-23 20:18:10,304] {models.py:1788} ERROR - Bash command failed
Traceback (most recent call last):
File "/Users/mohapdeb/airflow/airlfowvenv/lib/python3.7/site-packages/airflow/models.py", line 1657, in _run_raw_task
result = task_copy.execute(context=context)
File "/Users/mohapdeb/airflow/airlfowvenv/lib/python3.7/site-packages/airflow/operators/bash_operator.py", line 131, in execute
raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed
This is odd since the Dockerfile explicitly specifies installing version 0.15 (pip install dbt==0.15).
Can you please run the steps below:
Navigate into the project directory
Remove the existing docker-compose containers with "docker-compose rm"
Re-build the images by running "docker-compose build"
Re-start the services with "docker-compose up"
Attach to the container that runs Airflow with "docker exec -it dbt-airflow-docker_airflow_1 /bin/bash". You now have a terminal session open within the container.
Execute "dbt --version". Does this run properly, and is the reported version 0.15?
If the above is working fine and you still get the issue, I would suggest trying the following:
I have tried the same but I am getting the same error. I have also tried with dbt==0.16.1 but no luck.
But replace this with "docker build -t dbt-airflow . && docker run -it --rm -p 8080:8080 dbt-airflow https://github.com/konosp/adobe-clickstream-dbt.git"
I think you are checking the wrong repository.
To run the "docker-compose" version, you need to follow the README steps. It is meant to run under docker-compose, not plain docker.
Did you follow the command below?
docker-compose build && docker-compose up
This will activate all the services and build the correct images.
Once the services are up and running, did you try to execute "docker exec -it dbt-airflow-docker_airflow_1 /bin/bash"?
@konos, sorry if I was not clear enough, as I was trying multiple things. I am using the blog post "Schedule dbt models with Apache Airflow" to run Airflow and dbt, not the Docker Compose setup.
Below is the link.
These are the changes I made:
In the Dockerfile:
RUN pip install dbt==0.16.1 \
    && pip install apache-airflow
Regarding my first question: when it started throwing errors in Docker, I tried my local Airflow setup, just placing the DAG and running Airflow; then I got the error below.
[2020-05-23 20:18:10,283] {bash_operator.py:123} INFO - pkg_resources.DistributionNotFound: The 'dbt-core==0.16.1' distribution was not found and is required by the application
[2020-05-23 20:18:10,295] {bash_operator.py:127} INFO - Command exited with return code 1
[2020-05-23 20:18:10,304] {models.py:1788} ERROR - Bash command failed
Traceback (most recent call last):
File "/Users/mohapdeb/airflow/airlfowvenv/lib/python3.7/site-packages/airflow/models.py", line 1657, in _run_raw_task
result = task_copy.execute(context=context)
File "/Users/mohapdeb/airflow/airlfowvenv/lib/python3.7/site-packages/airflow/operators/bash_operator.py", line 131, in execute
raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed
Thanks for the extra details. I finally found time to reproduce this. The root of the issue was that the code was not checking whether a node was a root node or an intermediate one (not sure how I missed this).
In any case, I updated the repository and managed to load the DAG using the same dbt repo.
The above, however, will just load the DAG. To actually run it, you need to configure the profile.yml file (https://github.com/konosp/dbt-on-airflow/blob/master/misc/profile-demo_sample.yml) to connect to a database. The data also needs to be loaded; otherwise the execution of the DAG will fail and you will get the errors above.
Unfortunately this is a very basic (and amateurish) Dockerfile, and the initial intention was to execute on BigQuery with the data already loaded. I was not aware of this small demo dbt project; otherwise I could have used it.
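For anyone hitting the same root-vs-intermediate problem: the check can be sketched like this (a hypothetical helper, assuming each manifest node carries dbt's depends_on/nodes list as in the compiled manifest.json):

```python
def split_roots(models):
    """Partition manifest model nodes into root models (no upstream
    model dependencies) and intermediate models (at least one).
    Root models become the first tasks in the Airflow DAG."""
    roots, intermediates = [], []
    for node_id, node in models.items():
        # sources and seeds don't count as parents here; only other
        # models create an ordering constraint between Airflow tasks
        upstream = [
            dep for dep in node.get("depends_on", {}).get("nodes", [])
            if dep.startswith("model.")
        ]
        (roots if not upstream else intermediates).append(node_id)
    return roots, intermediates
```

A node that depends only on sources lands in `roots`; everything else is wired downstream of its parent models.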