28 Jul 2019 - by 'Maurits van der Schee'
When you are building a SaaS software product you need to choose a way to store the data of your customers. You can store everything in one database or you can create multiple databases. There are roughly four main approaches. This article discusses the up- and downsides of each of these approaches.
First let's try to summarize the four identified approaches:

1. One shared database, without a tenancy field on the tables.
2. One shared database, with a tenancy field (e.g. CustomerID) on every table.
3. A separate database per customer, next to a shared (replicated) database.
4. A fully isolated application/database server per customer.
Now let's try to find some up- and downsides for each of them.
When you store all data in one database and do not take any special measures, then finding out to which customer a "transaction" belongs may take you several joins: ledger, general ledger, book-year, administration/company, customer. When a customer complains that the database is slow and you want to migrate that customer's data to another database server, you may not be able to respond swiftly, even when the full process is automated (as it should be).
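To make the join chain concrete, here is a minimal sketch using an in-memory SQLite database. The schema, table names, and column names below are assumptions made up for illustration, not taken from any real product:

```python
import sqlite3

# Hypothetical schema illustrating the join chain from the article;
# all names are made up for this example.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customer        (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE administration  (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE book_year       (id INTEGER PRIMARY KEY, administration_id INTEGER);
CREATE TABLE general_ledger  (id INTEGER PRIMARY KEY, book_year_id INTEGER);
CREATE TABLE ledger          (id INTEGER PRIMARY KEY, general_ledger_id INTEGER);
CREATE TABLE "transaction"   (id INTEGER PRIMARY KEY, ledger_id INTEGER);
INSERT INTO customer       VALUES (1, 'Acme');
INSERT INTO administration VALUES (10, 1);
INSERT INTO book_year      VALUES (20, 10);
INSERT INTO general_ledger VALUES (30, 20);
INSERT INTO ledger         VALUES (40, 30);
INSERT INTO "transaction"  VALUES (50, 40);
""")

# Five joins just to learn which customer owns transaction 50.
row = db.execute("""
SELECT c.name
FROM "transaction" t
JOIN ledger l          ON l.id = t.ledger_id
JOIN general_ledger g  ON g.id = l.general_ledger_id
JOIN book_year b       ON b.id = g.book_year_id
JOIN administration a  ON a.id = b.administration_id
JOIN customer c        ON c.id = a.customer_id
WHERE t.id = 50
""").fetchone()
print(row[0])  # -> Acme
```

Every per-customer operation (dump, migrate, delete) has to walk a chain like this, which is what makes this approach hard to automate.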
So when is this approach a good idea? When you have mainly shared data and/or the customer to which the data belongs changes often. I think this is quite rare in software products.
When all tables hold the tenancy field (CustomerID in our case), then dumping, copying or removing the data of a customer becomes trivial. If you consistently name your tenancy field (e.g. there is a CustomerID in all tables) it becomes really easy to automate customer migration (to a quieter database server) and customer removal. On the other hand, you do have to write these tools yourself.
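The following sketch shows why a consistently named tenancy field makes such tooling easy to write. It assumes every table carries a `customer_id` column; the schema and function name are hypothetical:

```python
import sqlite3

# Sketch of automated customer removal, assuming every table has the
# consistently named tenancy column "customer_id" (an assumption).
def remove_customer(db: sqlite3.Connection, customer_id: int) -> None:
    tables = [r[0] for r in db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        db.execute(f'DELETE FROM "{table}" WHERE customer_id = ?',
                   (customer_id,))
    db.commit()

# Usage with a toy schema:
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoice (id INTEGER, customer_id INTEGER)")
db.execute("CREATE TABLE payment (id INTEGER, customer_id INTEGER)")
db.executemany("INSERT INTO invoice VALUES (?, ?)", [(1, 1), (2, 2)])
db.executemany("INSERT INTO payment VALUES (?, ?)", [(1, 1), (2, 2)])
remove_customer(db, 1)
print(db.execute("SELECT COUNT(*) FROM invoice").fetchone()[0])  # -> 1
```

A dump or copy tool is the same loop with `SELECT ... WHERE customer_id = ?` instead of `DELETE`.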
I would say that this approach is a good idea if you have a really large number of customers that all pay almost nothing. The overhead of administering a separate database for each of them, and keeping their structures in sync, may be higher than the benefits.
This is a compromise I often see and is often a good approach for software products (especially when the shared database is read-only and has replicas). Automating account creation and deletion becomes a bit harder, as you need to create or delete a database for every new customer, but when performance problems arise you can use off-the-shelf tools to migrate the database to a quieter server. The shared replicated database can also be used to find the customer's currently active database server.
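As a sketch of that last point, the shared database can hold a small routing table mapping each customer to their currently active database server. The table name, hosts, and customer IDs below are made-up examples:

```python
import sqlite3

# The shared (replicated) database doubles as a routing table; the
# "customer_server" table and its contents are hypothetical.
shared = sqlite3.connect(":memory:")
shared.execute(
    "CREATE TABLE customer_server (customer_id INTEGER, host TEXT)")
shared.execute(
    "INSERT INTO customer_server VALUES (42, 'db3.example.com')")

def active_host(customer_id: int) -> str:
    # Look up which database server currently holds this customer.
    row = shared.execute(
        "SELECT host FROM customer_server WHERE customer_id = ?",
        (customer_id,)).fetchone()
    return row[0]

print(active_host(42))  # -> db3.example.com
```

Migrating a customer to a quieter server then ends with a single row update in this table.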
People joke that every IT problem can be solved using DNS; well, so can this one. You can have your software product run on customer-specific subdomains and route each subdomain to an isolated application/database server. This high level of isolation protects you against system-wide failure and it also guarantees that there are no "noisy neighbors". On the other hand, you need to ensure that the shared data is correctly replicated.
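In DNS this routing is just a per-customer record; the same idea at the application level can be sketched as a subdomain lookup. The domain and the mapping below are invented for illustration:

```python
# Hypothetical subdomain-to-server mapping; in production this would
# live in DNS records or a replicated routing table.
SERVERS = {
    "acme": "10.0.0.11",    # acme.example.com -> isolated server
    "globex": "10.0.0.12",  # globex.example.com -> isolated server
}

def route(host: str) -> str:
    # Take the leftmost DNS label as the customer identifier.
    subdomain = host.split(".", 1)[0]
    return SERVERS[subdomain]

print(route("acme.example.com"))  # -> 10.0.0.11
```

Because every customer resolves to their own server, one noisy or failing tenant cannot affect the others.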
This approach is generally a good idea when customers pay a considerable amount and you want to guarantee quality of service for some (or all) of them. It also works best when the shared data does not change too much, or when you want to run different versions of your product for different customers. This approach also allows you to influence geographic separation and apply different security levels/measures per customer. I feel this is the most professional approach, but also the most expensive one.
I have seen all the above approaches in production settings. No matter which approach you take, you will need to automate everything (things such as account creation and migration). I highly recommend approach 4, if you can afford it. On the other hand, approach 3 may be a lot better when the shared data is written more than it is read. Approach 2 is the approach I would recommend for a (young) free(mium) web product, as it leads to minimal infrastructure costs. If you have chosen, or are thinking about, approach 1, then you might very well be on the wrong path. I have seen young, fast-growing software products fail spectacularly using that approach.