I have a problem running my DAG.
I’m using dbt version 1.4.5 against a Postgres CloudSQL instance.
The problem I’m having
I have 2 models in my DAG, which run sequentially, after which other dependent models run.
So I have:
Model A → Model B → other models. Model B builds on Model A, joining it with other models.
If I run Model A in a separate command and then run the rest in a subsequent command, the chain runs just fine. So if I do this:
dbt run -s model_A
dbt run -s model_B+
everything runs fine. If I try to do the same in one command just like this:
dbt run -s model_A+
Model B takes ages to run, whereas in the first example it takes 20 seconds. Model A still gets built, but afterwards Model B takes a very long time to run.
The context of why I’m trying to do this
Before, I did not have problems with either model. I’ve added a column to Model A and, in Model B, a derived column that applies a coalesce function to this newly added column. However, I don’t understand how adding one column can have such a big impact on performance. I did not add a join or anything of the kind.
What I’ve already tried
I tried running the models separately, as explained above, and this does not cause the performance issue.
I checked the logs but can’t find why it would take so long.
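For reference, this is roughly how the single-command run can be repeated with debug-level logging, so that the timestamps around Model B are visible. It is only a sketch: it assumes the default log location logs/dbt.log inside the project directory, and model_A and model_B are the model names used above.

```shell
# Re-run the slow variant with debug logging.
# In dbt 1.4 the global --debug flag goes before the subcommand.
dbt --debug run -s model_A+

# List the log lines that mention model_B, with line numbers,
# to see where the time is spent.
# Assumes the default log path logs/dbt.log.
grep -n "model_B" logs/dbt.log
```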
If I run the compiled code of Model B directly against the database using EXPLAIN ANALYZE, it reports roughly 20 seconds, which is in line with past performance and with the performance when the DAG run starts at Model B.
I’m at a bit of a loss as to how to debug this any further.
Thanks in advance!