dbt on Airflow - LocalExecutor

Hi All,

I am experimenting with running dbt on Airflow. So far I have managed to set up both tools in Docker Compose, using Airflow's LocalExecutor and running models with "dbt run --models …". To design the different DAGs I am using dbt tags to organise/filter the models. To build the models' dependencies and identify the tags, I am parsing the manifest.json file after it has been compiled.
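As a sketch of that parsing step: the snippet below walks the manifest's nodes, keeps only model nodes, collects each model's upstream model dependencies, and optionally filters by tag. The manifest dict here is a made-up minimal example; real manifest.json files carry many more fields per node.

```python
import json  # in the real project: manifest = json.load(open("target/manifest.json"))

# Hypothetical minimal slice of a compiled dbt manifest.json.
manifest = {
    "nodes": {
        "model.my_project.stg_orders": {
            "tags": ["staging"],
            "depends_on": {"nodes": ["source.my_project.raw_orders"]},
        },
        "model.my_project.fct_orders": {
            "tags": ["marts"],
            "depends_on": {"nodes": ["model.my_project.stg_orders"]},
        },
    }
}

def model_dependencies(manifest, tag=None):
    """Map each model to its upstream *model* dependencies, optionally
    keeping only models carrying the given dbt tag."""
    deps = {}
    for node_id, node in manifest["nodes"].items():
        if not node_id.startswith("model."):
            continue  # skip tests, seeds, sources, etc.
        if tag is not None and tag not in node.get("tags", []):
            continue
        deps[node_id] = [
            parent
            for parent in node["depends_on"]["nodes"]
            if parent.startswith("model.")
        ]
    return deps

print(model_dependencies(manifest, tag="marts"))
# {'model.my_project.fct_orders': ['model.my_project.stg_orders']}
```

The resulting mapping is what gets turned into Airflow task dependencies, one task per model.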

I am wondering:

  • Do people follow alternative approaches with regard to Airflow?
  • Have people faced any issues with complicated models, and how do you run them in Airflow?

The major deficiency in my design so far is that it does not utilise the raw SQL from manifest.json. This will be my next step. But other than that, I would like some feedback on how this project could be improved. In a production Airflow environment, things would probably be slightly different.
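For what it's worth, pulling the SQL out of the manifest is a small extension of the same parsing. In manifests from that era of dbt (0.15/0.16), model nodes carry the model's source under a raw_sql key and, once compiled, the ref-resolved version under compiled_sql (field names worth double-checking against your own manifest). A rough sketch with a made-up node:

```python
# Hypothetical minimal manifest slice; field names assume a dbt 0.15/0.16-era
# manifest.json -- verify against your own compiled target/manifest.json.
manifest = {
    "nodes": {
        "model.my_project.fct_orders": {
            "raw_sql": "select * from {{ ref('stg_orders') }}",
            "compiled_sql": "select * from analytics.stg_orders",
        }
    }
}

def sql_for(manifest, node_id):
    """Return the compiled SQL for a model node, falling back to the raw
    (un-compiled) SQL if compilation has not run yet."""
    node = manifest["nodes"][node_id]
    return node.get("compiled_sql") or node["raw_sql"]

print(sql_for(manifest, "model.my_project.fct_orders"))
# select * from analytics.stg_orders
```

The idea being that each Airflow task could then run the compiled SQL directly against the warehouse instead of shelling out to "dbt run" per model.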

The repository is at https://github.com/konosp/dbt-airflow-docker-compose (execution of dbt models using Apache Airflow through Docker Compose), and my general post about it is "Apache Airflow and DBT on Docker Compose" on Analytics Mayhem.

You can clone and run it if you wish, using the Kaggle dataset mentioned in the instructions (not all the sample data can be uploaded to GitHub).

Any feedback is welcome!

Cheers,
Konstantinos


Hi, I am trying to set up the same (dbt with Airflow) but I am getting this error:

[2020-05-23 20:18:10,283] {bash_operator.py:123} INFO - pkg_resources.DistributionNotFound: The 'dbt-core==0.16.1' distribution was not found and is required by the application
[2020-05-23 20:18:10,295] {bash_operator.py:127} INFO - Command exited with return code 1
[2020-05-23 20:18:10,304] {models.py:1788} ERROR - Bash command failed
Traceback (most recent call last):
  File "/Users/mohapdeb/airflow/airlfowvenv/lib/python3.7/site-packages/airflow/models.py", line 1657, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/Users/mohapdeb/airflow/airlfowvenv/lib/python3.7/site-packages/airflow/operators/bash_operator.py", line 131, in execute
    raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed

Hi @debadatta.moha,

This is odd, since within the dockerfile it is explicitly specified to install version 0.15 (pip install dbt==0.15).
Can you please run the steps below:

  1. Navigate into the project directory.
  2. Remove all Docker Compose containers with "docker-compose rm".
  3. Re-build the images by running "docker-compose build".
  4. Re-run the services with "docker-compose up".
  5. Attach to the container that executes Airflow with "docker exec -it dbt-airflow-docker_airflow_1 /bin/bash". Now you have a terminal session open within the container.
  6. Execute "dbt --version". Does this run properly, and is the reported version 0.15?

If the above works fine and you still get the issue, I would suggest trying the following:

  1. Modify the dockerfile (https://github.com/konosp/dbt-airflow-docker-compose/blob/master/dockerfile) to require version 0.16.1.
  2. Re-run the steps described above.

I am not sure why this issue would appear.

Please try the above and let me know how it looks.

Cheers,
Konstantinos

Hi @konos,

I have tried the same but I am getting the same error. I have tried with dbt==0.16.1 but no luck.
I replaced the command with docker build -t dbt-airflow . && docker run -it --rm -p 8080:8080 dbt-airflow https://github.com/konosp/adobe-clickstream-dbt.git

I also tried this:

docker run -it --rm -p 8080:8080 dbt-airflow https://github.com/fishtown-analytics/jaffle_shop.git

but I am still facing the same issue. Can you try this and let me know?

Thanks
Deba

Hi @debadatta.moha,

I think you are checking the wrong repository.
In order to run the "docker-compose" version, you need to follow the README steps. It is meant to run under docker-compose, not plain docker.

Did you follow the command below?

  • docker-compose build && docker-compose up
    This will activate all the services and build the correct images.

Once the services are up and running, did you try to execute "docker exec -it dbt-airflow-docker_airflow_1 /bin/bash"?

Please try the steps above and let me know.

Cheers,
Konstantinos

@konos, sorry if I was not clear enough, as I was trying multiple things. I am not using the docker-compose setup; I am using a blog post about scheduling dbt models with Apache Airflow to run the two together.
Below is the link.

These are the changes I made.
In the Dockerfile:

RUN pip install dbt==0.16.1 \
    && pip install apache-airflow

Also, while running, rather than passing adobe-clickstream-dbt.git
I used https://github.com/fishtown-analytics/jaffle_shop.git,

so the docker run command looks something like this:
docker run -it -p 8808:8080 --rm dbt-airflow https://github.com/fishtown-analytics/jaffle_shop.git

I also executed the bash command inside the container, but I am still getting the error.

Below is the error from the Docker container, where Airflow keeps throwing it continuously.

Regarding my first question: when it started throwing errors in Docker, I tried my local Airflow setup, just placing the DAG and running Airflow, and then I got the error below:

[2020-05-23 20:18:10,283] {bash_operator.py:123} INFO - pkg_resources.DistributionNotFound: The 'dbt-core==0.16.1' distribution was not found and is required by the application
[2020-05-23 20:18:10,295] {bash_operator.py:127} INFO - Command exited with return code 1
[2020-05-23 20:18:10,304] {models.py:1788} ERROR - Bash command failed
Traceback (most recent call last):
  File "/Users/mohapdeb/airflow/airlfowvenv/lib/python3.7/site-packages/airflow/models.py", line 1657, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/Users/mohapdeb/airflow/airlfowvenv/lib/python3.7/site-packages/airflow/operators/bash_operator.py", line 131, in execute
    raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed

Hope this helps you diagnose the problem.

Hi @debadatta.moha,

Thanks for the extra details. I finally managed to find time and reproduce this. The root of the issue was that the code was not checking whether a node was a root node or an intermediate one (not sure how I missed this).

In any case, I updated the repository and managed to load the DAG using the same dbt repo.

The above, however, will just load the DAG. In order to run it, you need to configure the profile.yml file (https://github.com/konosp/dbt-on-airflow/blob/master/misc/profile-demo_sample.yml) to connect to a database, and the data needs to be loaded. Otherwise the execution of the DAG will fail and you will get the errors above.
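For reference, a dbt profile for a local Postgres target generally looks like the sketch below. All names and credentials here are placeholders, and the Postgres target type is an assumption; check the repo's profile-demo_sample.yml for the actual values it expects.

```yaml
# Hypothetical sketch of a dbt profile for a local Postgres target.
default:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: dbt_user        # placeholder
      password: dbt_password  # placeholder
      dbname: analytics
      schema: public
      threads: 4
```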

Unfortunately this is a very basic (and amateurish) dockerfile, and the initial intention was to execute on BigQuery with the data already loaded. I was not aware of this small demo dbt project, otherwise I would have used it.

Let me know if the above helps for now.

Cheers,
Konstantinos

Hi @konos

I get this error when I try to do 'docker-compose up':

airflow_1 | Running with dbt=0.15.0
airflow_1 | Found 10 models, 0 tests, 0 snapshots, 0 analyses, 120 macros, 0 operations, 0 seed files, 6 sources
airflow_1 |
airflow_1 | 18:32:18 | Concurrency: 4 threads (target='dev')
airflow_1 | 18:32:18 |
airflow_1 | 18:32:18 | Done.
airflow_1 | Traceback (most recent call last):
airflow_1 |   File "/usr/local/bin/airflow", line 26, in <module>
airflow_1 |     from airflow.bin.cli import CLIFactory
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/bin/cli.py", line 82, in <module>
airflow_1 |     from airflow.www.app import (cached_app, create_app)
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/www/app.py", line 42, in <module>
airflow_1 |     from airflow.www.blueprints import routes
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/www/blueprints.py", line 25, in <module>
airflow_1 |     from airflow.www import utils as wwwutils
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/www/utils.py", line 40, in <module>
airflow_1 |     import flask_admin.contrib.sqla.filters as sqlafilters
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/flask_admin/contrib/sqla/__init__.py", line 2, in <module>
airflow_1 |     from .view import ModelView
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/flask_admin/contrib/sqla/view.py", line 18, in <module>
airflow_1 |     from flask_admin.contrib.sqla.tools import is_relationship
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/flask_admin/contrib/sqla/tools.py", line 4, in <module>
airflow_1 |     from sqlalchemy.ext.declarative.clsregistry import _class_resolver
airflow_1 | ModuleNotFoundError: No module named 'sqlalchemy.ext.declarative.clsregistry'

I added these two statements to my Dockerfile and it is up:

RUN pip install SQLAlchemy==1.3.23
RUN pip install Flask-SQLAlchemy==2.4.4

Thanks @CloudTechGirl for flagging this. Not sure how this bug occurred. By the way, I managed to make it work just by adding:

RUN pip install SQLAlchemy==1.3.23

I will update the repository as well. Let me know if you find any other issues.

Thanks!
