26 Jul 2016 - by 'Maurits van der Schee'
When building high-performance OLTP (online transaction processing) systems you need CPU and IOPS, and preferably a whole lot of both. Alternatively, you can build a distributed system in the cloud. This will not work (as well), and cloud providers will push you into this losing strategy one or two orders of magnitude too early. That is an awful lot of development time wasted at a moment when you are still small and can't afford to lose any. In this post I will bust the common myths about the cloud and high-performance OLTP systems.
Scaling up means bigger boxes, while scaling out means more boxes to distribute your load over. The argument that scaling out "needs to be done anyway" does not make sense if you are going to scale out several orders of magnitude too soon. While you are still growing, you benefit most from the flexibility and low cost of software development that scaling up brings. It allows for a setup with fewer IT staff for maintenance, less software to distribute your problem, and less complexity to deal with the reduced consistency that a distributed system entails. Scaling out for high availability only makes sense once your infrastructure costs are significant, not while they are still negligible compared to the IT salaries you have to pay. After all, the difference between N*2 and N+1 is unimportant when N is small or your total infrastructure costs are (relatively) low.
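To put numbers on the N*2 versus N+1 remark, here is a minimal sketch. The function name `machines_needed` is made up for this illustration, and it assumes that N*2 means duplicating every load-carrying machine while N+1 means keeping a single spare.

```python
def machines_needed(n: int) -> tuple[int, int]:
    """Return (full duplication, single spare) machine counts
    for n load-carrying machines."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 2 * n, n + 1


# Compare both redundancy schemes for a few cluster sizes.
for n in (1, 2, 4, 16):
    doubled, spare = machines_needed(n)
    print(f"N={n:2d}: N*2={doubled:2d}  N+1={spare:2d}  "
          f"extra machines with N*2: {doubled - spare}")
```

With a single big box both schemes cost the same two machines. The gap only grows as N grows, which is exactly when you would start caring about it.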
According to who? Your cloud provider? Most cloud providers tell you that an SSD delivers only a limited number of IOPS, and even when you pay the highest price possible they trick you into believing that a consistent baseline performance of up to 30 IOPS/GB is good. Even my laptop holds a Samsung 512GB M.2 SSD which, according to its specifications, does 300K IOPS on 512GB = 585 IOPS/GB (twenty times more). In a server I would get myself an Intel DC P3700 2.0TB, which is not limited to 20K IOPS or 320 MB/sec like Amazon EBS, but to 460K IOPS or 2800 MB/sec, according to its specifications. This should be the preferred storage for your OLTP database server, especially at sizes up to 2TB and with the availability of a 2.5 inch model. Note that 460K random IOPS at a 4K block size are not really comparable to the 20K IOPS at a 16K block size that Amazon brags about.
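The arithmetic behind these comparisons can be sketched as follows. The helper names are made up for this illustration, and the block sizes are assumed to be 4 KiB and 16 KiB; the IOPS and capacity figures are the ones quoted above.

```python
KIB = 1024


def iops_per_gb(iops: int, capacity_gb: int) -> float:
    """IOPS normalised by capacity, the metric cloud providers quote."""
    return iops / capacity_gb


def throughput_mb_s(iops: int, block_bytes: int) -> float:
    """Throughput in MB/s implied by an IOPS figure at a given block size."""
    return iops * block_bytes / 1_000_000


laptop = iops_per_gb(300_000, 512)  # Samsung 512GB M.2 SSD
cloud_baseline = 30                 # quoted baseline in IOPS/GB

print(f"laptop SSD: {laptop:.0f} IOPS/GB, "
      f"{laptop / cloud_baseline:.1f}x the cloud baseline")
print(f"20K IOPS at 16K blocks: {throughput_mb_s(20_000, 16 * KIB):.0f} MB/s")
print(f"460K IOPS at 4K blocks: {throughput_mb_s(460_000, 4 * KIB):.0f} MB/s")
```

Expressing both devices in MB/s shows why the block size matters: the same IOPS figure means four times less data moved at 4K than at 16K, so IOPS numbers quoted at different block sizes cannot be compared directly.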
You can easily get machines with four 10-core E7 CPUs on a single motherboard, like this one from LeaseWeb that has 4(!) E7-4830 v2 CPUs. This means you have 40 real cores, or 80 cores including hyper-threading. Some providers have configurations with "overbooked" cores (this is, for instance, sold as "burst" by Amazon). But even a so-called "m4.2xlarge" instance (that has no "burst") on Amazon sports only 8 vCPUs. It is clearly explained on the Amazon site that "Each vCPU is a hyperthread of an Intel Xeon core". Still, people get themselves these machines, which cost them about 100 dollars a month. This does not sound terribly expensive, as ten of these instances should perform as well as one big expensive machine, right?
With CPU and IOPS one and one is not two; it is more like one and a half. This is because you cannot simply put half your database on a different machine and expect your database lookups to be equally fast. If you found a high-performance generic solution for this problem you would be rich, because there isn't one! This often means that you have to give up on consistency to be able to achieve good scalable distribution. I guess only a few people realize that you can also decide to give up on consistency to avoid distribution. That may seem to lead to the same result, but approaching the problem from another angle makes all the difference. I have, for instance, seen high-traffic websites run on a simple 5 minute TTL round-robin DNS with a redirection service on each of the nodes. Each node holds all tiers (database, web server and storage) locally and consists of a single machine in a specifically selected (geographically optimal) data center.
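As a rough illustration of the "one and one is one and a half" rule of thumb, the sketch below extrapolates it to more machines. This is an assumption made for the example, not a measured model: it supposes that every doubling of the machine count yields 1.5 times the capacity, and the name `effective_capacity` is hypothetical.

```python
import math


def effective_capacity(machines: int, gain_per_doubling: float = 1.5) -> float:
    """Capacity in 'single machine' units, ASSUMING every doubling of the
    machine count multiplies capacity by gain_per_doubling."""
    if machines < 1:
        raise ValueError("machines must be at least 1")
    return gain_per_doubling ** math.log2(machines)


for machines in (1, 2, 4, 10):
    print(f"{machines:2d} machines behave like "
          f"{effective_capacity(machines):.2f} machines")
```

Under this assumption ten small instances deliver less than four instances' worth of work, which is why counting cores across instances overstates what a distributed setup will do for a database.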
People may say: the numbers provided are not that bad. But how about the numbers that are not provided? Why don't they (Amazon, for instance) report IOPS measured at the standard block size of 4K, for random reads and random writes? Where can I find the average and guaranteed read and write latency? You might think that the read and write latency of remote storage compared to PCI-e connected local storage would not make a real difference, but it seems it does. The PCI-e bus has a guaranteed high throughput, while in my experience the latency of remote storage seems to be limited by the network capacity or other shared infrastructure. On the other hand, calling a disk a "General Purpose SSD" makes me feel it should deliver the same number of IOPS as the SSD in my laptop, shouldn't it? Well, it certainly does not perform as well, as you can read in this excellent post by Peter Zaitsev.
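If the provider does not publish 4K random read latency, you can get a first impression yourself. The following is a minimal sketch, not a proper benchmark: the function name `random_read_latencies` is made up, it needs a Unix-like system for `os.pread`, and it reads through the operating system's page cache. The small scratch file used here will therefore be served from memory; for real numbers you would point it at a file much larger than RAM on the volume under test, or use a dedicated benchmarking tool that bypasses the cache.

```python
import os
import random
import statistics
import tempfile
import time

BLOCK = 4096  # 4K block size


def random_read_latencies(path: str, reads: int = 1000) -> list[float]:
    """Time `reads` random 4K-aligned reads from `path`.
    Returns the duration of each read in seconds."""
    blocks = os.path.getsize(path) // BLOCK
    if blocks < 1:
        raise ValueError("file must be at least one block long")
    fd = os.open(path, os.O_RDONLY)
    try:
        latencies = []
        for _ in range(reads):
            offset = random.randrange(blocks) * BLOCK
            start = time.perf_counter()
            os.pread(fd, BLOCK, offset)
            latencies.append(time.perf_counter() - start)
        return latencies
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Scratch file of 1 MiB; this only demonstrates the mechanics.
    with tempfile.NamedTemporaryFile(delete=False) as scratch:
        scratch.write(os.urandom(BLOCK * 256))
        scratch_path = scratch.name
    try:
        lat = random_read_latencies(scratch_path)
        print(f"median: {statistics.median(lat) * 1e6:.1f} microseconds")
        print(f"worst:  {max(lat) * 1e6:.1f} microseconds")
    finally:
        os.remove(scratch_path)
```

Looking at the worst case next to the median is the point of the exercise: on shared infrastructure it is the outliers, not the average, that hurt an OLTP workload.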