Hey all, I have a dbt project that I’d like to integrate with Airflow and was wondering what the thinking is in terms of best practices.
For context, my dbt project lives in GitLab repo A, and Airflow (a shared Cloud Composer instance on GCP) lives in GitLab repo B.
My current dbt setup has full CI/CD, with merges to master deploying models to our production environment. Ideally I'd like to use Airflow to run tests or build incremental models on varying schedules, e.g. hourly, daily, etc.
I’ve seen a few different potential patterns and was wondering which one is recommended. Potential solutions:
- Build a Docker image of the dbt repo on merge to master, then use Airflow’s KubernetesPodOperator with that image to run dbt commands in the pod (first sketch below)
- Import the dbt repo as a Git submodule in the Airflow repo and use the BashOperator (second sketch below)
- Deploy the dbt files to a GCS bucket on merge to master, pull those files into my Airflow DAGs, and use the BashOperator (third sketch below)
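For the first option, here’s a minimal sketch of what the DAG could look like, assuming the dbt repo’s CI builds and pushes an image (the `gcr.io/my-project/dbt:latest` tag, namespace, and `tag:hourly` selector are all placeholders, not from my actual setup):

```python
# Hypothetical sketch: run dbt from a pre-built image via KubernetesPodOperator.
# Assumes the image has the dbt project and profiles.yml baked in.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

with DAG(
    dag_id="dbt_hourly_incremental",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    dbt_run = KubernetesPodOperator(
        task_id="dbt_run_incremental",
        name="dbt-run-incremental",
        namespace="composer-user-workloads",   # assumed namespace, adjust for your cluster
        image="gcr.io/my-project/dbt:latest",  # image built by the dbt repo's CI on merge
        cmds=["dbt"],
        arguments=["run", "--select", "tag:hourly", "--target", "prod"],
        get_logs=True,
    )

    dbt_test = KubernetesPodOperator(
        task_id="dbt_test",
        name="dbt-test",
        namespace="composer-user-workloads",
        image="gcr.io/my-project/dbt:latest",
        cmds=["dbt"],
        arguments=["test", "--select", "tag:hourly", "--target", "prod"],
        get_logs=True,
    )

    dbt_run >> dbt_test
```

The nice part here is that dbt and its dependencies live in the image, not in the Composer environment itself.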
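For the submodule option, something like the following, assuming the submodule sits under the dags folder so Composer syncs it to the workers (the paths and the `tag:daily` selector are assumptions), and that dbt is installed as a PyPI package in the Composer environment:

```python
# Hypothetical sketch: dbt project vendored into the Airflow repo as a submodule
# under dags/dbt/, so the BashOperator can cd into it on the worker.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

DBT_PROJECT_DIR = "/home/airflow/gcs/dags/dbt"  # where Composer mounts the dags bucket

with DAG(
    dag_id="dbt_daily_build",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_build = BashOperator(
        task_id="dbt_build_daily",
        bash_command=(
            f"cd {DBT_PROJECT_DIR} && "
            "dbt build --select tag:daily --target prod --profiles-dir ."
        ),
    )
```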
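And for the GCS option, a rough sketch where the dbt repo’s CI uploads the project to a bucket on merge to master and the DAG copies it down before running (the bucket name and paths are placeholders, and this assumes gsutil and dbt are both available on the Composer workers):

```python
# Hypothetical sketch: pull the dbt project from a GCS bucket at runtime, then run dbt.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_from_gcs",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    pull_and_run = BashOperator(
        task_id="pull_project_and_run",
        bash_command=(
            "rm -rf /tmp/dbt && mkdir -p /tmp/dbt && "
            "gsutil -m rsync -r gs://my-dbt-artifacts/latest /tmp/dbt && "
            "cd /tmp/dbt && dbt run --select tag:hourly --target prod --profiles-dir ."
        ),
    )
```

Curious which of these people have had the best experience with, or if there’s a better pattern I’m missing.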