Dagster vs. Prefect

yusamidas · November 23, 2020, 2:18pm

We’ve been using dbt for quite a while now and loving it! However, as great as it is for working inside the data warehouse, there’s still a lot of stuff we need to do before the data gets into the warehouse and into the domain of dbt.

We’ve been benchmarking data orchestration tools, and we’re considering implementing either Dagster or Prefect. Both seem really great and are hugely popular in the community, and both now support dbt as well.

My initial thoughts:

They both seem to have the same standard functionality and great code usability, and they work very similarly. However, Dagster has a bit more versatility with integrations (the jupyter/papermill integration is appreciated).
Dagster seems to have better UI and tools for debugging data pipelines locally. This is hugely beneficial as data pipelines grow more complex.
Prefect has better cloud operations and less maintenance with the native Prefect Cloud service, which is appreciated. We’re happy to pay a premium for less maintenance work.

Does anybody have any hands-on experience and could give some thoughts? Or any direct recommendations? Or should we consider something else entirely?

robmarkcole · November 25, 2020, 3:26pm

Also worth considering Airflow, which has hosted versions on Astronomer, GCP & AWS.

kning · November 27, 2020, 6:07pm

Some of my teammates are pretty bullish about Argo if you’re on Kubernetes.

max-sixty · November 28, 2020, 3:36am

How finely grained are your pipelines? Are you running on K8s?

Argo works really well if they’re coarsely grained. Often that’ll be the case with dbt pipelines, given that dbt does much of the aggregation.
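To make the coarse vs. fine distinction concrete, here is a small sketch in plain Python (not Argo's or any orchestrator's API) using the standard library's graphlib. The step and model names are hypothetical. In the coarse-grained graph the orchestrator schedules only three steps and dbt resolves its own model-level dependencies inside "dbt_run"; in the fine-grained graph the orchestrator has to know about every individual model.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical coarse-grained pipeline: each node is one big step,
# and dbt handles its own model DAG inside the "dbt_run" node.
coarse = {
    "extract": set(),
    "load": {"extract"},
    "dbt_run": {"load"},
}

# Hypothetical fine-grained pipeline: the orchestrator sees individual models.
fine = {
    "extract": set(),
    "load": {"extract"},
    "stg_orders": {"load"},
    "stg_customers": {"load"},
    "fct_orders": {"stg_orders", "stg_customers"},
}


def execution_order(graph):
    """Return one valid order in which an orchestrator could run the steps."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(coarse))  # one node per big step
    print(execution_order(fine))    # one node per model
```

The coarse graph maps naturally onto one container per step, which is the situation described above; the fine graph shifts model-level scheduling out of dbt and into the orchestrator.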

acacia · November 28, 2020, 1:21pm

@robmarkcole - I can’t find any information on hosted Airflow on AWS. Can you send a link?

robmarkcole · November 28, 2020, 2:17pm

acacia · November 28, 2020, 6:23pm

Thank you. That’s hot off the press!

assaf · November 29, 2020, 3:23pm

I was at the same crossroads just weeks ago, and your thoughts are spot on. It was a tough call, but we ultimately went with Dagster, mostly due to superior tooling (Dagit), a flexible programming model, and the community. I honestly think you’ll make a good decision either way, but for us it just seemed like Dagster “thinks” holistically about the process and challenges of building data applications, whereas Prefect solves for developing and executing pipelines in a very ergonomic way but isn’t as complete.

I saw this post from Nick Schrock in the Dagster Slack community that I think gets at the core of the difference between them:

Dagster pipelines are more structured and constrained. This allows us to have a lot of additional features (a type system, a config management system, a system-managed context object that flows through the compute, among other things). By contrast, Prefect pipelines are more minimal and dynamic.
Another way of framing this difference is that Dagster is very interested in what the computations are doing rather than only how they are doing it. We consider ourselves the application layer for data applications (rich metadata, type system, structured events with semantic meaning, etc.), whereas Prefect frames their software in terms of “negative” and “positive” engineering. This negative/positive framing is more exclusively about the “how” of pipelines: retries, operational matters, etc.
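A toy illustration of the distinction in that quote, in plain Python. Neither wrapper is Dagster's or Prefect's actual API; structured_step, with_retries, make_flaky, and scale are hypothetical names. The first wrapper reasons about what a step computes (declared config and input/output types, checked by the framework before and after the step runs); the second only changes how the step runs (retrying on failure), which is the kind of "negative engineering" concern the quote describes.

```python
import time
from typing import Any, Callable, Dict, Type


def structured_step(config_schema: Dict[str, type], input_type: Type, output_type: Type):
    """Hypothetical 'structured' decorator: config keys and input/output types
    are declared up front and validated by the framework, not by the step."""
    def decorate(fn: Callable[[Dict[str, Any], Any], Any]):
        def run(config: Dict[str, Any], value: Any) -> Any:
            for key, expected in config_schema.items():
                # A missing key yields None, which also fails the check.
                if not isinstance(config.get(key), expected):
                    raise TypeError(f"config[{key!r}] must be {expected.__name__}")
            if not isinstance(value, input_type):
                raise TypeError(f"input must be {input_type.__name__}")
            result = fn(config, value)
            if not isinstance(result, output_type):
                raise TypeError(f"output must be {output_type.__name__}")
            return result
        return run
    return decorate


def with_retries(fn: Callable[..., Any], max_retries: int = 3, delay_s: float = 0.0):
    """Hypothetical 'negative engineering' wrapper: it changes only how the
    step runs (retry on any exception), not what the step computes."""
    def run(*args, **kwargs):
        for attempt in range(max_retries + 1):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries: surface the last error
                time.sleep(delay_s)
    return run


@structured_step(config_schema={"multiplier": int}, input_type=list, output_type=list)
def scale(config, rows):
    # The step body only expresses the computation itself.
    return [r * config["multiplier"] for r in rows]


def make_flaky(fail_times: int):
    """Build a hypothetical step that fails `fail_times` times, then succeeds."""
    calls = {"n": 0}

    def step():
        calls["n"] += 1
        if calls["n"] <= fail_times:
            raise ConnectionError("transient failure")
        return "ok"
    return step


if __name__ == "__main__":
    print(scale({"multiplier": 2}, [1, 2, 3]))
    print(with_retries(make_flaky(2), max_retries=3)())
```

The point of the sketch is only where the knowledge lives: in the first style the framework can inspect and validate the step's contract, while in the second the step stays a plain function and the framework wraps its execution.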

idomi · February 23, 2022, 1:20pm

It didn’t come up here, but Ploomber (ploomber/ploomber on GitHub) is also a major tool in this space. It integrates seamlessly with Airflow, Kubeflow, Argo, etc., so you only have to deal with the core coding part. It’s well integrated with Jupyter and papermill, so you can stay in the interactive environment. We recently added monitoring and alerting for pipelines and an easy way to deploy your experiments to the cloud.

On top of that, it offers a smooth transition to production, since some of the work behind the scenes is analyzed and cleaned into .py files.
