@matt we have been incrementally switching over from Redshift and have thus far gotten by with a single small warehouse for everything. However, we are basically at our limit there now and will need to break it out soon. So my first-hand experience is so far limited but my understanding of warehouses is you want to consider a few things:
-
The biggest cost with Snowflake is warehouse usage, which is dependent on a combination of the size of the warehouse and how many minutes it runs.
-
The bigger the warehouse, the faster things run. So it’s very possible a big query can be cheaper to run on a large warehouse than a small, because although a large warehouse costs more per minute to run, the query takes less time to run than on a small warehouse.
It’s recommended to configure a warehouse to auto-suspend. This will pause the warehouse after a specified period of inactivity (thus stop the billing).
For us, because of how Stitch loads data it is running a small warehouse almost 24/7, but not using the full capacity of the warehouse. So we have so far been able to get by with our dbt modeling and BI tool on the same warehouse without too much issue (thus minimizing our billing costs). However, we moved over some of our data science jobs and are pretty close to separating things out now.
What’s “best” is likely different for a lot of organizations. For example, if you have some data science models that run once a week that use a lot of compute, but infrequently, you may assign them their own XL warehouse, which you only get billed for when they run. If you have a lot of BI users and want to ensure they never compete with ingestion or data science computing you could give the BI tool it’s own warehouse.
However, if you give Stitch it’s own warehouse, and Segment it’s own warehouse, and Looker it’s own warehouse, and dbt it’s own warehouse, and Airflow it’s own warehouse, and the most any of these ever use is 20% of the compute capacity then you’ve probably over-optimized and are paying more than you need to.
With that in mind, I think warehouse naming conventions are probably dependent on what you are assigning them to. I think what you have makes sense if you just want to make sure BI users aren’t impacted by ETL compute, and you can get by with a single warehouse for ETL compute. Warehouses are also pretty easy to create and delete and reassign so I think the consequences of warehouse naming conventions aren’t the same as with a database.