dbt on Airflow - LocalExecutor

Hi All,

I am experimenting with running dbt on Airflow. So far I have managed to set up both tools in Docker Compose, using Airflow's LocalExecutor and running models with "dbt run --models …". To design the different DAGs I am using dbt tags to organise/filter the models. To build the models' dependencies and identify the tags, I am parsing the manifest.json file after it has been compiled.
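As a sketch of that parsing step: the snippet below walks the manifest's nodes, keeps only model nodes, collects each model's upstream model dependencies, and optionally filters by tag. The manifest dict here is a made-up minimal example; real manifest.json files carry many more fields per node.

```python
import json  # in the real project: manifest = json.load(open("target/manifest.json"))

# Hypothetical minimal slice of a compiled dbt manifest.json.
manifest = {
    "nodes": {
        "model.my_project.stg_orders": {
            "tags": ["staging"],
            "depends_on": {"nodes": ["source.my_project.raw_orders"]},
        },
        "model.my_project.fct_orders": {
            "tags": ["marts"],
            "depends_on": {"nodes": ["model.my_project.stg_orders"]},
        },
    }
}

def model_dependencies(manifest, tag=None):
    """Map each model to its upstream *model* dependencies, optionally
    keeping only models carrying the given dbt tag."""
    deps = {}
    for node_id, node in manifest["nodes"].items():
        if not node_id.startswith("model."):
            continue  # skip tests, seeds, sources, etc.
        if tag is not None and tag not in node.get("tags", []):
            continue
        deps[node_id] = [
            parent
            for parent in node["depends_on"]["nodes"]
            if parent.startswith("model.")
        ]
    return deps

print(model_dependencies(manifest, tag="marts"))
# {'model.my_project.fct_orders': ['model.my_project.stg_orders']}
```

The resulting mapping is what gets turned into Airflow task dependencies, one task per model.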

I am wondering:

  • Do people follow alternative approaches with regard to Airflow?
  • Have people faced any issues with complicated models, and how do you run them in Airflow?

The major deficiency in my design so far is that it does not utilise the raw SQL from manifest.json. This will be my next step. But other than that, I would like some feedback on how this project could be improved. In a production Airflow environment, things would probably be slightly different.
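For what it's worth, pulling the SQL out of the manifest is a small extension of the same parsing. In manifests from that era of dbt (0.15/0.16), model nodes carry the model's source under a raw_sql key and, once compiled, the ref-resolved version under compiled_sql (field names worth double-checking against your own manifest). A rough sketch with a made-up node:

```python
# Hypothetical minimal manifest slice; field names assume a dbt 0.15/0.16-era
# manifest.json -- verify against your own compiled target/manifest.json.
manifest = {
    "nodes": {
        "model.my_project.fct_orders": {
            "raw_sql": "select * from {{ ref('stg_orders') }}",
            "compiled_sql": "select * from analytics.stg_orders",
        }
    }
}

def sql_for(manifest, node_id):
    """Return the compiled SQL for a model node, falling back to the raw
    (un-compiled) SQL if compilation has not run yet."""
    node = manifest["nodes"][node_id]
    return node.get("compiled_sql") or node["raw_sql"]

print(sql_for(manifest, "model.my_project.fct_orders"))
# select * from analytics.stg_orders
```

The idea being that each Airflow task could then run the compiled SQL directly against the warehouse instead of shelling out to "dbt run" per model.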

The repository is at https://github.com/konosp/dbt-airflow-docker-compose (execution of dbt models using Apache Airflow through Docker Compose), and my general post about it is "Apache Airflow and DBT on Docker Compose" on Analytics Mayhem.

You can clone and run it if you wish, using the Kaggle dataset mentioned in the instructions (not all the sample data can be uploaded to GitHub).

Any feedback is welcome!

Cheers,
Konstantinos


Hi, I am trying to set up the same (dbt with Airflow) but I am getting this error:

[2020-05-23 20:18:10,283] {bash_operator.py:123} INFO - pkg_resources.DistributionNotFound: The 'dbt-core==0.16.1' distribution was not found and is required by the application
[2020-05-23 20:18:10,295] {bash_operator.py:127} INFO - Command exited with return code 1
[2020-05-23 20:18:10,304] {models.py:1788} ERROR - Bash command failed
Traceback (most recent call last):
  File "/Users/mohapdeb/airflow/airlfowvenv/lib/python3.7/site-packages/airflow/models.py", line 1657, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/Users/mohapdeb/airflow/airlfowvenv/lib/python3.7/site-packages/airflow/operators/bash_operator.py", line 131, in execute
    raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed

Hi @debadatta.moha,

This is odd, since within the dockerfile it is explicitly specified to install version 0.15 (pip install dbt==0.15).
Can you please run the steps below:

  1. Navigate into the project directory.
  2. Remove all Docker Compose containers with "docker-compose rm".
  3. Re-build the images by running "docker-compose build".
  4. Re-run the services with "docker-compose up".
  5. Attach to the container that executes Airflow with "docker exec -it dbt-airflow-docker_airflow_1 /bin/bash". Now you have a terminal session open within the container.
  6. Execute "dbt --version". Does this run properly, and is the reported version 0.15?

If the above works fine and you still get the issue, I would suggest trying the following:

  1. Modify the dockerfile (https://github.com/konosp/dbt-airflow-docker-compose/blob/master/dockerfile) to require version 0.16.1.
  2. Re-run the steps described above.

I am not sure why this issue would appear.

Please try the above and let me know how it looks.

Cheers,
Konstantinos

Hi @konos,

I have tried the same but I am getting the same error. I have tried with dbt==0.16.1 but no luck.
I replaced the command with docker build -t dbt-airflow . && docker run -it --rm -p 8080:8080 dbt-airflow https://github.com/konosp/adobe-clickstream-dbt.git

I also tried this:

docker run -it --rm -p 8080:8080 dbt-airflow https://github.com/fishtown-analytics/jaffle_shop.git

but I am still facing the same issue. Can you try this and let me know?

Thanks
Deba

Hi @debadatta.moha,

I think you are checking the wrong repository.
In order to run the "docker-compose" version, you need to follow the README steps. It is meant to run under docker-compose, not plain docker.

Did you follow the command below?

  • docker-compose build && docker-compose up
    This will activate all the services and build the correct images.

Once the services are up and running, did you try to execute "docker exec -it dbt-airflow-docker_airflow_1 /bin/bash"?

Please try the steps above and let me know.

Cheers,
Konstantinos

@konos, sorry if I was not clear enough, as I was trying multiple things. I am not using the docker-compose setup; I am using a blog post about scheduling dbt models with Apache Airflow to run the two together.
Below is the link.

These are the changes I made.
In the Dockerfile:

RUN pip install dbt==0.16.1 \
    && pip install apache-airflow

Also, while running, rather than passing adobe-clickstream-dbt.git
I used https://github.com/fishtown-analytics/jaffle_shop.git,

so the docker run command looks something like this:
docker run -it -p 8808:8080 --rm dbt-airflow https://github.com/fishtown-analytics/jaffle_shop.git

I also executed the bash command inside the container, but I am still getting the error.

Below is the error from the Docker container, where Airflow keeps throwing it continuously.

Regarding my first question: when it started throwing errors in Docker, I tried my local Airflow setup, just placing the DAG and running Airflow, and then I got the error below:

[2020-05-23 20:18:10,283] {bash_operator.py:123} INFO - pkg_resources.DistributionNotFound: The 'dbt-core==0.16.1' distribution was not found and is required by the application
[2020-05-23 20:18:10,295] {bash_operator.py:127} INFO - Command exited with return code 1
[2020-05-23 20:18:10,304] {models.py:1788} ERROR - Bash command failed
Traceback (most recent call last):
  File "/Users/mohapdeb/airflow/airlfowvenv/lib/python3.7/site-packages/airflow/models.py", line 1657, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/Users/mohapdeb/airflow/airlfowvenv/lib/python3.7/site-packages/airflow/operators/bash_operator.py", line 131, in execute
    raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed

Hope this helps you diagnose the problem.

Hi @debadatta.moha,

Thanks for the extra details. I finally managed to find time and reproduce this. The root of the issue was that the code was not checking whether a node was a root node or an intermediate one (not sure how I missed this).

In any case, I updated the repository and managed to load the DAG using the same dbt repo.

The above, however, will just load the DAG. In order to run it, you need to configure the profile.yml file (https://github.com/konosp/dbt-on-airflow/blob/master/misc/profile-demo_sample.yml) to connect to a database, and the data needs to be loaded. Otherwise the execution of the DAG will fail and you will get the errors above.
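For reference, a dbt profile for a local Postgres target generally looks like the sketch below. All names and credentials here are placeholders, and the Postgres target type is an assumption; check the repo's profile-demo_sample.yml for the actual values it expects.

```yaml
# Hypothetical sketch of a dbt profile for a local Postgres target.
default:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: dbt_user        # placeholder
      password: dbt_password  # placeholder
      dbname: analytics
      schema: public
      threads: 4
```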

Unfortunately this is a very basic (and amateurish) dockerfile, and the initial intention was to execute on BigQuery with the data already loaded. I was not aware of this small demo dbt project, otherwise I would have used it.

Let me know if the above helps for now.

Cheers,
Konstantinos

Hi @konos

I get this error when I try to do 'docker-compose up':

airflow_1 | Running with dbt=0.15.0
airflow_1 | Found 10 models, 0 tests, 0 snapshots, 0 analyses, 120 macros, 0 operations, 0 seed files, 6 sources
airflow_1 |
airflow_1 | 18:32:18 | Concurrency: 4 threads (target='dev')
airflow_1 | 18:32:18 |
airflow_1 | 18:32:18 | Done.
airflow_1 | Traceback (most recent call last):
airflow_1 |   File "/usr/local/bin/airflow", line 26, in <module>
airflow_1 |     from airflow.bin.cli import CLIFactory
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/bin/cli.py", line 82, in <module>
airflow_1 |     from airflow.www.app import (cached_app, create_app)
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/www/app.py", line 42, in <module>
airflow_1 |     from airflow.www.blueprints import routes
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/www/blueprints.py", line 25, in <module>
airflow_1 |     from airflow.www import utils as wwwutils
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/airflow/www/utils.py", line 40, in <module>
airflow_1 |     import flask_admin.contrib.sqla.filters as sqlafilters
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/flask_admin/contrib/sqla/__init__.py", line 2, in <module>
airflow_1 |     from .view import ModelView
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/flask_admin/contrib/sqla/view.py", line 18, in <module>
airflow_1 |     from flask_admin.contrib.sqla.tools import is_relationship
airflow_1 |   File "/usr/local/lib/python3.7/site-packages/flask_admin/contrib/sqla/tools.py", line 4, in <module>
airflow_1 |     from sqlalchemy.ext.declarative.clsregistry import _class_resolver
airflow_1 | ModuleNotFoundError: No module named 'sqlalchemy.ext.declarative.clsregistry'

I added these two statements to my Dockerfile and it is up:

RUN pip install SQLAlchemy==1.3.23
RUN pip install Flask-SQLAlchemy==2.4.4

Thanks @CloudTechGirl for flagging this. Not sure how this bug occurred. By the way, I managed to make it work just by adding:

RUN pip install SQLAlchemy==1.3.23

I will update the repository as well. Let me know if you find any other issues.

Thanks!
