Dbt Airflow on Google Cloud Composer -

I am testing a deployment of dbt within Cloud composer. On my local machine (Ubuntu 20.04) I have got success in running the dbt models with airflow. When running on Google Cloud composer I get the following error

{subprocess.py:74} INFO - Output:
{subprocess.py:78} INFO - Running with dbt=0.21.0
{subprocess.py:78} INFO - Encountered an error while reading the project:
{subprocess.py:78} INFO -   ERROR: Runtime Error
{subprocess.py:78} INFO -   at path ['name']: 'jaffle_shop' does not match '^[^\\d\\W]\\w*$'
{subprocess.py:78} INFO - 
{subprocess.py:78} INFO - Error encountered in /home/airflow/gcs/dags/dbt_project.yml
{subprocess.py:78} INFO - Encountered an error:
{subprocess.py:78} INFO - Runtime Error
{subprocess.py:78} INFO -   Could not run dbt
{subprocess.py:82} INFO - Command exited with return code 2
{taskinstance.py:1503} ERROR - Task failed with exception

We are using a BashOperator to run dbt models in Airflow.
Initially had some problems with dependencies but they were solved.
Using a standard dbt_project.yml file with a single model just to test how this works.
Another way is to use Docker but we need try if this works.

Editstrong text

Versions

dbt: 0.21.0

cloud-composer: 1.17.1

airflow: 2.1.2

Pypi Packages
airflow-dbt: 0.4.0

dbt: 0.21.0

jsonschema: 3.1 (Added this as Pypi was throwing an error about the version

I really appreciate any help.

Pete.

Hi there,
I’m not sure what the reason is for your error but this is my experience. Hope this helps in a way.
When we used the bash operator we ran into issues with dependencies issues between composer, airflow and dbt. We ended copying our dbt projects into a folder, in the airflow GCS bucket.
We had one folder for many dbt projects. The bash operator ran a dbt_run.sh, passing the needed parameters:

  • target
  • models
  • vars
  • full_refresh
  • repo_name (dbt project name)

The .sh used a Makefile to build the dbt run commands and builds a virtual environment to bypass the dependencies issues. DISCLAIMER: I wouldn’t suggest using running a virtual environment in composer since it’s not approved by Google. We only found out much later and are in the process of changing this. Since you’ve solved the dependencies issues this might still be useful.
Here’s the basic python:

call_dbt_run = BashOperator(
    task_id='call_dbt_run',
    bash_command=f"/home/airflow/gcs/data/dbt-projects/dbt_run.sh {dbt_target} {dbt_model} {dbt_vars} {full_refresh} {repo_name}",
	on_success_callback=log_dbt_success,
	on_failure_callback=log_dbt_failure,
    dag=dag,
)

Here is the .sh:

#!/bin/bash
dbt_target=$1
dbt_models=$2
dbt_vars=$3
full_refresh=$4
repo_name=$5
suffix=$(date +%s$RANDOM)

echo "Start creating virtualenv: dbt_env$suffix"
virtualenv /tmp/dbt_env$suffix --python=python3.6.10
echo "Finish creating virtualenv: dbt_env$suffix"

cd /home/airflow/gcs/data/dbt-projects/

# copy the project repo code to a local temp folder
mkdir -p /tmp/$suffix
cp -r $repo_name /tmp/$suffix
ls /tmp/$suffix/$repo_name

echo "Start activating virtualenv and installing packages: dbt_env$suffix"
source /tmp/dbt_env$suffix/bin/activate && /tmp/dbt_env$suffix/bin/pip install -r ./requirements.txt
echo "Finish activating virtualenv and installing packages: dbt_env$suffix"

if [ "$full_refresh" = "True" ]; then
  make run-full-refresh DBT_TARGET="$dbt_target" DBT_MODELS="$dbt_models" DBT_VARS="$dbt_vars" REPO_NAME="$repo_name" SUFFIX="$suffix"
else
  make run DBT_TARGET="$dbt_target" DBT_MODELS="$dbt_models" DBT_VARS="$dbt_vars" REPO_NAME="$repo_name" SUFFIX="$suffix"
fi

# capture the exit code from the dbt run command
# so that the final exit code form removing virtualenv cmd doesn't get used by BashOperator
exit_code=$?

echo "Start removing virtualenv: dbt_env$suffix"
rm -rf /tmp/dbt_env$suffix
echo "Finish removing virtualenv: dbt_env$suffix"

if [ $exit_code -ne 0 ]; then
    echo "exit code $exit_code"
fi

# remove local temp folder
echo "Start removing local dbt folder /tmp/$suffix"
rm -rf /tmp/$suffix
echo "Finish removing local dbt folder /tmp/$suffix"

# rethrowing the exit code to BashOperator
exit $exit_code

And here’s the Makefile:

.ONESHELL: 
# special target: .ONESHELL instructs make to invoke a single instance 
# of the shell and provide it with the entire recipe, 
# regardless of how many lines it contains.

# the /home/airflow/gcs/data/dbt-projects/$repo_name will have the models,
# and appropriate dbt files for a particular dbt job object.
# cd $repo_name

debug:
	cd /tmp/$(SUFFIX)/$(REPO_NAME)
	dbt debug --profiles-dir=./profiles_dir

debug-local:
	dbt debug --profiles-dir=$(local_path)/profiles_dir

run:
	cd /tmp/$(SUFFIX)/$(REPO_NAME)
	dbt run --target=$(DBT_TARGET) --models=$(DBT_MODELS) --vars '$(DBT_VARS)' --profiles-dir=./profiles_dir

run-local:
	dbt debug --profiles-dir=$(local_path)/profiles_dir

run-full-refresh:
	cd /tmp/$(SUFFIX)/$(REPO_NAME)
	dbt run --target=$(DBT_TARGET) --models=$(DBT_MODELS) --vars '$(DBT_VARS)' --profiles-dir=./profiles_dir --full-refresh

test:
	echo "$(DBT_TARGET)"

We also use the KubernetesPodOperator in our composer environment (just make sure whether Workload Identity will work in your environment).
Here’s the discourse article with the explanation: Running dbt in Composer using a KubernetesPodOperator

Hi meurant,

Thanks for the detailed response. I guess the the approach is isolating the environment to run dbt and avoid any conflicts between composer and dbt, whether using virtual environment or Docker with KPO.

As you mentioned that the virtual environment is not recommended approach, I suppose the KPO is the way forward. I did read the article on Kubernetes Pod Operator, and is well written to run dbt in Docker/KPO. Will try this approach and check how this works.

Regards
Pete

Hi there,
I forgot to add that we are looking into using Cloud Run to execute dbt jobs.
I’ll add more info when I can.
Cheers,
Meurant

1 Like

Hi ! Thank for all your time and your work. I read your article about Running dbt in Composer using a KubernetesPodOperator and this will help me a lot. So I was wondering which part Cloud Run would replace in your previous article. (I also have another question about managing your docker images in CI/CD but I don’t want to take up your time.)

Thank you for your time and have good day,

Best regards,
Emmanuel.