My company are looking to deploy dbt - currently just the CLI. I think dbt Cloud looks a lot better, but I need some solid evidence and reasoning to convince the business that it's the way to go.
Surprisingly, I have not been able to find any articles or discussions comparing the functionality of the two (if anyone has come across one, please share a link). Obviously I know that one is a command line tool and the other has a GUI and an IDE etc., but ease of use alone probably won’t be enough to get buy-in. Will we be limited in what we can deploy using dbt CLI? What is the impact of going with one or the other when it comes to running dbt in production?
Has anyone used both and can share their experience? If you favour dbt Cloud, why?
I think this is a great idea for a post, with some considerations though.
First, I would reframe the question: the decision is really about deploying dbt with or without dbt Cloud.
The dbt CLI is just another (awesome) feature; using dbt Cloud doesn’t prevent you from also running dbt on your own (on a local machine, or on a different compute instance like a VM or a cluster) through the CLI.
And yet the question is not easy. There are many, many considerations around deploying dbt on your own. Just to mention some of them:
the expertise of the team: i) is the team ready to deal with a virtual machine/cluster? ii) is there anyone on the team ready to manage this infrastructure? iii) is your team already familiar with a scheduler (Airflow, Jenkins, Argo, etc.), including its maintenance tasks? iv) is your team ready to manage/use one or multiple virtual Python environments? v) how are you going to deal with documentation? It requires some effort to find a way to distribute and manage documentation on your own.
budget: the expected amount to spend on IT resources depends on the stack you choose; be aware of that.
time: the learning curve is a bit steep; using dbt Cloud makes things much easier.
For me, if the budget allows, I would choose dbt Cloud over dbt CLI. Here are my reasons:
dbt Cloud integrates into the analytics workflow better. When using dbt CLI, we’ll need to switch back and forth between the terminal and the user interface of the data warehouse. This makes it difficult to maintain analytics engineering practices in the long run; everyone would end up just saving their queries in the warehouse.
We can schedule dbt commands such as dbt test or dbt snapshot in dbt Cloud, whereas for dbt CLI we’ll need an extra scheduling tool such as Airflow or Jenkins.
For the docs, we’ll need to set up a server to host the dbt docs ourselves if we choose dbt CLI whereas dbt Cloud can do it for us.
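To make the scheduling point concrete, here is a minimal sketch of the kind of wrapper script a CLI-only setup would need an external scheduler (cron, Airflow, Jenkins) to call. The function names, the "prod" target name and the order of the steps are my own assumptions for illustration; dbt snapshot, dbt run, dbt test and dbt docs generate are real dbt commands, but adapt the list to your project.

```python
import subprocess
from typing import Callable, List


def build_commands(target: str = "prod") -> List[List[str]]:
    """Return the dbt invocations for one scheduled run (assumed ordering)."""
    steps = ["snapshot", "run", "test"]
    commands = [["dbt", step, "--target", target] for step in steps]
    # Regenerate the docs artifacts last; hosting them is still up to you.
    commands.append(["dbt", "docs", "generate", "--target", target])
    return commands


def _run_in_subprocess(cmd: List[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_pipeline(
    commands: List[List[str]],
    runner: Callable[[List[str]], int] = _run_in_subprocess,
) -> int:
    """Run commands in order and stop at the first non-zero exit code."""
    for cmd in commands:
        code = runner(cmd)
        if code != 0:
            return code  # let the scheduler see the failure
    return 0
```

A scheduler job would then call run_pipeline(build_commands()) and treat a non-zero return value as a failed run. Retries, alerting and log retention are things dbt Cloud gives you out of the box and that you would have to add yourself here.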
I agree with the others, I think the right answer is “Both.”
Unless you have another compelling reason to run an orchestrator (e.g., Dagster, Airflow) and you have multiple dependencies between your dbt project and other data pipelines, dbt Cloud is absolutely the right answer for hosting your prod dbt deployment. I think the “Slim CI” functionality alone is worth the price. And if you just run a single seat for your prod deployment, it’s basically free.
The dbt Cloud IDE is okay for development, but if you have a large project or are making a big change you’ll want to develop locally using the CLI. That way, you can use your favorite text editor – VS Code has some great dbt extensions – and any other tool you want, like a linter (sqlfluff is a great choice) and formatter (I created sqlfmt.com for this purpose). In a few years I imagine the Cloud IDE will be so good (and have hooks into other tools) that there will be no reason to develop locally, but realistically in early 2022 it’s not there yet.
UPDATE: dbt launched some huge usability improvements to the cloud IDE in late 2022, and it’s quite good! I still develop locally (old dog, new tricks), but I think there is a compelling alternative now in the cloud IDE, especially for analysts who don’t have experience installing Python locally.
Thanks for your response. Can you elaborate on the “Slim CI” functionality you mention?
I understand you can connect to a repo on git and push/pull to a branch from both the CLI and the Cloud interface, but I’m not very clear on the difference in CI/CD support.
“CI” is “Continuous Integration.” It’s basically the practice of constantly merging in development code and testing the result before deploying to production. Often this takes the form of tests that run in response to a PR being opened in GitHub.
The “Slim” part means only building and testing what changed, rather than rebuilding the whole project to test it. If your project takes an hour to build, this is key to making CI practical.
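As a rough illustration of the “Slim” idea, the sketch below picks only the changed models plus everything downstream of them from a toy dependency graph. The model names and the select_modified_plus helper are hypothetical, and this is not how dbt implements it internally; in dbt itself, as far as I know, the equivalent is the state:modified+ selector, which compares your branch against the artifacts of a previous production run.

```python
from collections import deque
from typing import Dict, List, Set


def select_modified_plus(children: Dict[str, List[str]], modified: Set[str]) -> Set[str]:
    """Return the modified models plus all of their downstream dependents."""
    selected = set(modified)
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


# Toy project: each key maps a model to the models that depend on it.
dag = {
    "stg_orders": ["fct_orders"],
    "stg_customers": ["dim_customers"],
    "fct_orders": ["orders_report"],
    "dim_customers": ["orders_report"],
    "orders_report": [],
}

# Suppose a PR only touches stg_customers.
to_build = select_modified_plus(dag, {"stg_customers"})
print(sorted(to_build))  # ['dim_customers', 'orders_report', 'stg_customers']
```

Here stg_orders and fct_orders are skipped entirely, which is where the time saving comes from on a large project.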