06 Apr 2019 - by 'Maurits van der Schee'
There are many myths in the software business that have led to wrong best practices. In this post I will address 7 of these best practices and explain on which wrong assumptions they are based. I'm worried about the state of the industry, because I feel these are serious engineering mistakes and nobody is speaking up about them. Here we go:
NodeJS uses an event loop to run servers. Event loops are said to be faster when connections need to talk to each other (no IPC is needed) and for I/O-intensive tasks. This is true because you avoid the context switching that multiprogramming requires (for security, to isolate processes). And as long as you use only a single core of your machine, the claim of zero IPC cost holds. But most servers in production actually have multiple processors, and those have 10 or more cores each. Even when your process can be started multiple times and the copies run completely independently, you need to make sure your concurrency is exactly right per core (a difficult load-balancing task). You also need to make sure that you are not doing any computational tasks or blocking I/O in your event-loop threads, or your servers will be very slow (as your latency will spike). The threading model may be less performant in the optimal case, but in all realistic situations it will be faster, as it does not require meticulous tuning.
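To make the latency spike concrete, here is a minimal sketch in Python's asyncio rather than NodeJS (an event loop behaves the same way in this respect). The names `blocking_work`, `heartbeat` and `max_gap` are made up for this illustration. A heartbeat task ticks every 10 ms, and we measure the largest gap between ticks while a 200 ms blocking call runs either directly on the loop or in a worker thread.

```python
import asyncio
import time


def blocking_work(seconds: float) -> None:
    """Hypothetical stand-in for a computational task or blocking I/O call."""
    time.sleep(seconds)


async def heartbeat(ticks: list, interval: float, count: int) -> None:
    """Record the time of each tick; large gaps reveal event-loop stalls."""
    for _ in range(count):
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def max_gap(offload: bool) -> float:
    """Return the largest gap between heartbeat ticks while the work runs."""
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks, 0.01, 30))
    await asyncio.sleep(0.05)  # let the heartbeat get going first
    if offload:
        # Run the blocking call in a worker thread; the loop keeps ticking.
        await asyncio.to_thread(blocking_work, 0.2)
    else:
        # Run it directly on the loop; no other task can run until it returns.
        blocking_work(0.2)
    await hb
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print("blocking on the loop:", round(asyncio.run(max_gap(False)), 3), "s")
    print("offloaded to thread: ", round(asyncio.run(max_gap(True)), 3), "s")
```

When the call runs on the loop, every other connection waits for it; when it is offloaded, the heartbeat keeps its rhythm. The exact numbers depend on your machine, and `asyncio.to_thread` needs Python 3.9 or newer.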
When you book a trip at a travel agency, they have to reserve a seat on an airplane, a hotel room and a car for you. If one of these reservations fails, you probably want to go somewhere else, where all three are available. This problem is called "transaction support" in databases and it is solved quite elegantly. The goal of transactions is to have high data consistency and no accidentally booked (but never canceled) hotels hanging around. Foreign key constraints are another consistency feature that databases provide. If you implement microservices that each have their own database, you have to either drop transaction and foreign key support, which will lead to data inconsistency, or re-implement transaction and foreign key support, which is a very daunting task. Both are a very bad idea, so you should stick to a single database server when implementing microservices. Everybody who tells you otherwise should be challenged to explain how to implement "two-phase locking".
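The travel agency example can be sketched with Python's built-in `sqlite3` module. The `inventory` table, the item names and the `book_trip` helper are assumptions made up for this illustration; the point is only that a single database rolls back the flight and the hotel for you when the car turns out to be unavailable.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical inventory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        "kind TEXT, name TEXT, available INTEGER, PRIMARY KEY (kind, name))"
    )
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        [("flight", "AMS-LIS", 1), ("hotel", "Lisbon Inn", 1), ("car", "compact", 0)],
    )
    conn.commit()
    return conn


def book_trip(conn: sqlite3.Connection, flight: str, hotel: str, car: str) -> bool:
    """Reserve all three items atomically; return False if any is unavailable."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for kind, name in (("flight", flight), ("hotel", hotel), ("car", car)):
                cur = conn.execute(
                    "UPDATE inventory SET available = available - 1 "
                    "WHERE kind = ? AND name = ? AND available > 0",
                    (kind, name),
                )
                if cur.rowcount != 1:
                    raise LookupError(f"{kind} {name!r} is not available")
        return True
    except LookupError:
        return False


def available(conn: sqlite3.Connection, kind: str, name: str) -> int:
    """Return the remaining availability for one inventory item."""
    row = conn.execute(
        "SELECT available FROM inventory WHERE kind = ? AND name = ?", (kind, name)
    ).fetchone()
    return row[0]
```

In the demo data the car is sold out, so `book_trip(conn, "AMS-LIS", "Lisbon Inn", "compact")` returns `False` and the flight and hotel counts are left untouched. With three separate databases behind three services, that rollback is code you would have to write and get right yourself.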
MongoDB is a NoSQL store and it is fast! It is seriously fast as long as all your data fits in RAM. What they don't brag about is that it has low durability guarantees: it only flushes data to disk every 60 seconds. You can also tune a MySQL server to use all the RAM for indexes and data. You can even set the "innodb_flush_log_at_trx_commit" variable to zero to flush only once per second and avoid a flush at every commit. Suddenly the performance of MySQL is a lot closer to the performance of MongoDB. I wrote an article titled "Trading durability for performance without NoSQL" on how to do this in various databases. Also, databases without foreign keys and table structure may seem flexible, but that flexibility comes at the cost of inconsistent data piling up in your database. I would rather have less flexibility and more consistent data in my primary store. But if you do not care about the quality of your data, then MongoDB may be a great choice.
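The trade between durability and performance can be illustrated with a toy commit log. This is not how MySQL or MongoDB are implemented: the `CommitLog` class is hypothetical, and it flushes after a fixed number of commits instead of on a timer so that the result is deterministic. It counts how many `fsync` calls are made and how many commits are at risk if the machine crashes.

```python
import os
import tempfile


class CommitLog:
    """Toy append-only log with a configurable flush policy (hypothetical)."""

    def __init__(self, path: str, flush_every: int) -> None:
        self.file = open(path, "ab")
        self.flush_every = flush_every  # 1 means: fsync on every commit
        self.pending = 0                # commits not yet forced to disk
        self.fsync_calls = 0

    def commit(self, record: bytes) -> None:
        """Append a record and sync once enough commits have accumulated."""
        self.file.write(record + b"\n")
        self.pending += 1
        if self.pending >= self.flush_every:
            self.sync()

    def sync(self) -> None:
        """Force buffered data to disk; this is the expensive part."""
        self.file.flush()
        os.fsync(self.file.fileno())
        self.fsync_calls += 1
        self.pending = 0

    def close(self) -> None:
        if self.pending:
            self.sync()
        self.file.close()


def run_commits(n: int, flush_every: int) -> tuple:
    """Write n commits; return (fsync calls, worst-case commits lost on crash)."""
    with tempfile.TemporaryDirectory() as tmp:
        log = CommitLog(os.path.join(tmp, "commit.log"), flush_every)
        worst_loss = 0
        for i in range(n):
            log.commit(b"row %d" % i)
            worst_loss = max(worst_loss, log.pending)
        log.close()
        return log.fsync_calls, worst_loss
```

Comparing `run_commits(200, 1)` with `run_commits(200, 50)` shows the deal you are making: far fewer expensive disk flushes in exchange for a window of commits that a crash would lose.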
People use a DBAL (DataBase Abstraction Layer) and/or ORM (Object Relational Mapper) to avoid having to write SQL or (heaven forbid) stored procedures. I have not seen this work out well, for a few reasons. Developers need to be familiar with SQL to be able to write efficient queries in large systems. If you don't know exactly what SQL is executed, because you use an ORM, you can easily make a (performance) mistake. I have also seen many very expensive algorithms that could have been replaced with a rather cheap and simple stored procedure. But apparently stored procedures are not "cool" anymore, and the reasons given are database independence and that code supposedly does not belong in the database. The latter may be somewhat true, but with some proper versioning you can achieve a lot. As for database independence: it is almost never achieved and you are constantly paying the price for it, so this is really a case of YAGNI (You Ain't Gonna Need It).
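A typical example of such a hidden performance mistake is one query per row where a single query would do. The sketch below uses Python's `sqlite3` module with made-up `customers` and `orders` tables; `totals_lazy` mimics by hand what naive lazy loading tends to generate (no real ORM is involved), `totals_joined` is the SQL you would write yourself, and `count_statements` uses the connection's trace callback to count the statements that actually run.

```python
import sqlite3


def make_shop_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical shop tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            amount INTEGER NOT NULL
        );
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO orders (customer_id, amount) VALUES (1, 10), (1, 5), (2, 7);
        """
    )
    return conn


def totals_lazy(conn: sqlite3.Connection) -> dict:
    """One query for the customers, then one more query per customer."""
    totals = {}
    customers = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    for customer_id, name in customers:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        totals[name] = row[0]
    return totals


def totals_joined(conn: sqlite3.Connection) -> dict:
    """The same result from a single hand-written query."""
    rows = conn.execute(
        """
        SELECT c.name, COALESCE(SUM(o.amount), 0)
        FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.id
        """
    ).fetchall()
    return dict(rows)


def count_statements(conn: sqlite3.Connection, fn) -> tuple:
    """Run fn(conn) and return (result, number of SQL statements executed)."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(seen)
```

Both functions return the same totals, but the lazy version needs an extra statement for every customer. With three customers nobody notices; with a hundred thousand customers and a network hop to the database server, everybody does.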
Unless you run a shared hosting shop at a competitive price level, you are either a) running tasks that are larger than one machine or b) in need of better isolation than containers can offer you. If your whole application fits in a single (virtual) machine, then why don't you serve it on a single machine (the costs can hardly be the problem)? If it is larger than a single machine, then you don't need containers, you just need more machines. If you need proper isolation or reproduction of a test environment, then why not use virtual machines? They isolate better at only slightly higher costs. They can also run the correct kernel versions for your test environment and have all the required services running in the operating system. I see a lot of people use container technology for its cluster orchestration. This is complete nonsense: orchestration has nothing to do with containers. We already had perfectly fine orchestration tools.
So you have scaling ambitions with your web product? Great. Big web services like Wikipedia are renting racks (with bought or rented hardware), not VMs and managed databases. So are other large websites, and for two good reasons: dependency and money. Depending on a provider of virtualized hardware is setting yourself up for failure when you scale. It reduces your ability to benefit early on from "economies of scale" and hurts your growth. Amazon will hardly give you any discount, even when you generate considerable revenue on their platform (I've seen 50k/month go at list price). A lot of knowledge, processes and people are bound to the virtualized technology you buy from your vendor. Hardware from these vendors is not unlimited either; that is a myth. Read about "soft limits" and how you need to request upgrades of these limits by email (they will add some hardware for you). It will be costly to switch away from your vendor and to learn the tech needed to run your own services. I often speak with engineers who think that the 1k IOPS that Amazon provides on their "SSD" storage is a normal amount, while a consumer NVMe disk does 300k.
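To get a feeling for what that difference in IOPS means, here is a back-of-the-envelope calculation. The workload of one million random reads is an assumption picked for illustration, and the function assumes the device sustains its rated IOPS for the whole run.

```python
def seconds_for_random_reads(reads: int, iops: int) -> float:
    """Time to finish `reads` random reads on a device sustaining `iops`."""
    return reads / iops


reads = 1_000_000  # hypothetical workload
cloud_seconds = seconds_for_random_reads(reads, 1_000)    # 1k IOPS volume
nvme_seconds = seconds_for_random_reads(reads, 300_000)   # consumer NVMe disk

if __name__ == "__main__":
    print(f"1k IOPS:   {cloud_seconds:.0f} s")
    print(f"300k IOPS: {nvme_seconds:.1f} s")
```

That is roughly a quarter of an hour against a few seconds for the same job.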
Think about the above technology choices and experiment. Dare to challenge the fundamentals and explore unpopular opinions. The worst thing that could happen is that you become a better web developer.