SaaS multi-schema best practices

I was hoping to get some thoughts on what you would consider best practices for a typical SaaS situation:

  • We are a B2B SaaS tool
  • We use PostgreSQL as our main operational database
  • Each one of our customers has their own schema in our database (all have the same data structure - we just have separate schemas to prevent leaks in case of bugs)
  • There are many of these schemas (as in thousands)
  • We also have a Redshift cluster with lots of other data from Salesforce, Stripe, etc., and we currently use dbt + dbt Cloud to model that data and prepare it for our BI tools

For the sake of argument, let's say we have a table "sale" in each of these PostgreSQL operational schemas. For analytics purposes, we'd like to be able to answer questions like how many sales there were on a given day across all of our customers, i.e. across all of the schemas.
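To make that concrete, the end state I'm picturing is some unified "sale" table keyed by customer, so the question above becomes a simple aggregate. The table and column names below are just placeholders, not something we have today:

```sql
-- Hypothetical end state: a single unified "sale" table with a customer_id
-- column. Table and column names are placeholders.
select
    date_trunc('day', sold_at)::date as sale_day,
    count(*)                         as total_sales
from analytics.sale
group by 1
order by 1;
```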

How would you set up the data pipeline and model this?

My initial thoughts are:

Option A - we could replicate all of the schemas from Postgres to Redshift using something like FlyData, then use dbt to create a view that unions the "sale" tables from all of these schemas in Redshift. I'm not sure this would even work.
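If Option A is workable, I assume the dbt model would be a Jinja loop that unions "sale" across the replicated schemas, something roughly like this (the schema-naming convention and the var are made up):

```sql
-- models/unified_sale.sql -- rough sketch only; assumes the replicated
-- schemas are named customer_<id> and the list is passed in as a dbt var.
{% set customer_schemas = var('customer_schemas', []) %}

{% for s in customer_schemas %}
select
    '{{ s }}' as customer_schema,  -- tag each row with the schema it came from
    *
from {{ s }}.sale
{% if not loop.last %}union all{% endif %}
{% endfor %}
```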

Option B - we use some kind of ETL tool to listen for changes on all the "sale" tables in all the schemas in Postgres, then write a "clean" version into a unified "sale" table in Redshift. I'm not sure what tools would be good for this either.
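For Option B, whatever the tool ends up being, I imagine it would land everything into one wide table keyed by tenant, roughly this shape (column names are illustrative):

```sql
-- Hypothetical unified target table in Redshift for an ETL/CDC pipeline.
-- Column names are illustrative; the important part is the customer_id key.
create table analytics.sale (
    customer_id  varchar(64)   not null,  -- which tenant/schema the row came from
    sale_id      bigint        not null,  -- id within that tenant's schema
    sold_at      timestamp     not null,
    amount       numeric(12,2),
    loaded_at    timestamp default getdate()  -- when the row landed in Redshift
);
```

With that shape, the daily-count query above works as-is, and (customer_id, sale_id) gives a natural de-duplication key.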

Any thoughts and guidance would be appreciated! Thanks.

I’ve got a similar (although not exactly the same) situation in both MSSQL and MongoDB (2 different SaaS products). In MSSQL, we’ve got 1 database per “State” and in Mongo, we’ve got 1 db per customer.

Right now, for MSSQL, I’m using Stitch to replicate each table from the source db into its own schema (BigQuery dataset in this case). I then use dbt to loop through a list of these schemas to create a monster view across all the source tables. So if we’ve got 12 databases (again 1 for each state), then I end up with one PurchaseOrders view that essentially unions all the individual db_{{statecode}}.PurchaseOrders tables together. Then I can easily query across all of them as one - or if performance is an issue, I materialize it as a table. With 1000+ of these, this seems like it might hit a wall, but perhaps not?
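For anyone curious, the looping model is basically just this pattern (simplified; the var and dataset naming here are specific to my setup):

```sql
-- models/purchase_orders.sql -- simplified union-all pattern, assuming one
-- BigQuery dataset per state named db_<statecode>.
{% set state_codes = var('state_codes', ['ca', 'ny', 'tx']) %}

{% for code in state_codes %}
select
    '{{ code }}' as state_code,  -- keep the source database as a column
    *
from db_{{ code }}.PurchaseOrders
{% if not loop.last %}union all{% endif %}
{% endfor %}
```

If the view gets slow, switching the model's materialization from view to table is a one-line config change.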

For Mongo, where my numbers are headed in your direction (we've got 100+ dbs today), I'm planning on rolling my own Singer tap configs and just adding the tenant/customer id as a new column for every collection.

Would definitely be interested in others’ solutions as well.