I am interested in analyzing complex dbt pipelines and integrating them with our in-house orchestrator. I noticed that dbt already builds a manifest.json file that summarizes the whole pipeline as an execution plan. This seems like a good way of integrating a dbt-built pipeline with other tools where running dbt run is not an option. If I can confirm that dbt doesn't do any additional operations at runtime, and that whatever is built after dbt compile is exactly what gets executed during a run, I can change the runner to be something other than dbt.
My question is: does dbt do any additional calculations / materialization / running macros after the manifest file is built?
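For context, the kind of integration I have in mind is roughly this: read the nodes and their depends_on edges out of manifest.json, build a DAG, and hand our orchestrator a valid execution order. A minimal sketch (assuming the standard manifest layout, where each entry in nodes carries a resource_type and a depends_on.nodes list; the path target/manifest.json is just the default output location):

```python
import json
from collections import defaultdict

def load_dbt_dag(manifest_path):
    """Build a model-level dependency graph from a dbt manifest.json.

    Returns {unique_id: [parent model unique_ids]} restricted to models.
    """
    with open(manifest_path) as f:
        manifest = json.load(f)
    graph = {}
    for unique_id, node in manifest["nodes"].items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        parents = [p for p in node["depends_on"]["nodes"]
                   if p.startswith("model.")]
        graph[unique_id] = parents
    return graph

def topo_order(graph):
    """Kahn's algorithm: return models in a valid execution order."""
    indegree = {n: 0 for n in graph}
    children = defaultdict(list)
    for node, parents in graph.items():
        for p in parents:
            if p in indegree:
                indegree[node] += 1
                children[p].append(node)
    ready = sorted(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for child in children[n]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order

# Typical usage: schedule topo_order(load_dbt_dag("target/manifest.json"))
```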
Yes, it runs a lot of macros after the manifest is built. To give a couple of examples: with the dbt-snowflake adapter, it runs macros such as dbt_snowflake_validate_get_incremental_strategy and dbt_snowflake_get_incremental_sql for incremental materializations.
Because model SQL may be dynamically templated based on the results of a previous model, there’s no way to pre-compile all the SQL and ship it off for execution elsewhere—dbt needs to be involved from beginning to end.
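To illustrate why, here is the kind of model that can't be fully pre-compiled. This is a hypothetical pivot model using dbt_utils.get_column_values, which issues a query against the warehouse while rendering the template, so the final SQL depends on data that only exists at run time:

```sql
-- models/payments_pivoted.sql (illustrative)
{% set methods = dbt_utils.get_column_values(
    table=ref('raw_payments'), column='payment_method') %}

select
    order_id
    {% for method in methods %}
    , sum(case when payment_method = '{{ method }}' then amount end)
        as {{ method }}_amount
    {% endfor %}
from {{ ref('raw_payments') }}
group by 1
```

If raw_payments gains a new payment_method value between compile and run, the generated column list changes, which is exactly why dbt has to stay in the loop.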
You mentioned you’re using an in-house orchestrator, so I’m not sure what a solution might look like for you. Perhaps if your orchestrator supports running a docker image, you could package your dbt project into an image (along with the dbt-core package), and build your models that way.
Understood, thanks a lot for the detailed answer. One of the options is to package it like you said and run the whole dbt pipeline there, but ideally I’d like to be able to mix and match things so that another asset can depend on an intermediate model dbt produces, etc.
Is there an example where I can see such dynamic dependencies?
The most common approach I've seen for that kind of mixing and matching is to split the dbt run into separate layers. E.g. if you have three layers, base, intermediate, and marts, you might run three separate dbt jobs:
dbt run -s tag:base
dbt run -s tag:intermediate
dbt run -s tag:marts
Depending on your orchestrator, you may also be able to use "sensors", similar to how Sensors work in Airflow. If that's possible, you could create a sensor that waits for a specific dbt model (i.e. table) to be updated.
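A minimal sketch of such a sensor, assuming your orchestrator can run arbitrary Python. The get_last_updated callback is hypothetical and up to you, e.g. a query against your warehouse's table metadata (on Snowflake, information_schema.tables.last_altered):

```python
import time

def wait_for_model_update(get_last_updated, previous_ts,
                          timeout_s=3600, poll_s=60):
    """Poll until the model's last-updated timestamp advances past previous_ts.

    get_last_updated: zero-arg callable returning the model's last-updated
    timestamp (or None if the table doesn't exist yet).
    Returns the new timestamp, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        ts = get_last_updated()
        if ts is not None and ts > previous_ts:
            return ts
        time.sleep(poll_s)
    raise TimeoutError("model was not refreshed within the timeout")
```

Once the sensor fires, the orchestrator can kick off whatever downstream asset depends on that intermediate model.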