Active cache invalidation is an anti-pattern

29 Jul 2016 - by 'Maurits van der Schee'

I strongly believe that active cache invalidation is an anti-pattern, a symptom of bad software architecture. In this post I will sum up the reasons people give for explicit cache purging and explain what's wrong with them. It is not wrong to allow purging of the cache manually and by exception, but it is wrong when cache invalidation is part of the algorithm. The rule of thumb is: your software should not delete from the cache during normal operation.

I want my cache to be fully consistent

You may, for instance, store a record both in MySQL and in Memcache. When reading, you may then only need to read from Memcache and not from MySQL. This seems a good idea, as it is both consistent (using a relational database) and high performance (using an in-memory key-value store). In fact this is the inner-platform effect at play, because MySQL already has an (in-memory) query cache that is optimized to be both high performance and consistent. If you feel the MySQL read or write performance is not high enough, you should read its manual and find out which settings to tune to trade memory use or consistency for more performance. As a start you may look at "innodb_buffer_pool_size" and "innodb_flush_log_at_trx_commit" to improve read and write performance.

I sometimes delete to be more consistent

If you know for sure that the cache is no longer valid, you may as well delete its value, right? Although this does reduce the average age of the cache, you can also achieve a lower average age by lowering the expiration time on the cached data. The advantage is that a lower expiration time guarantees a lower (maximum) age of the cache entry, while deleting in certain cases lowers the average age but does not lower the guaranteed (maximum) age. The bottom line is that you should set your cache expiration high enough that the performance improves, but not so high that you are serving data that is older than the maximum acceptable age.
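To make the "guaranteed maximum age" concrete, here is a minimal in-memory sketch in Python. The names `TTLCache` and `read_through` are hypothetical and not taken from any library; a real setup would use the expiration time of Memcache or a similar store instead.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            return None  # expired: treated as a miss, never served
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)


def read_through(cache: TTLCache, key: str,
                 load: Callable[[str], Any]) -> Any:
    """Return the cached value, loading and storing it on a miss.

    Simplification: a loaded value of None is never cached.
    """
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.set(key, value)
    return value
```

Note that there is no delete anywhere in this sketch. No reader can ever see a value older than `ttl_seconds`, so choosing that number is the same as choosing the maximum acceptable age.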

I delete to control when my cache will expire

This may be far-fetched, but one could clear the cache before a peak load in order to ensure that a heavily used cache key does not expire during peak load. This sounds like a good idea, as an expiring cache key can have an escalating effect called a "cache stampede" (also known as "thundering herd" or "dog piling"). The correct solution to this problem is not to delete the cache entries, but to serve "stale" (expired) entries while recalculating. Key to this algorithm is that the first process to hit an expired entry signals to the other processes that it will recalculate the cache value, so that the others can, for now, continue to serve the expired entry. This algorithm is implemented in TqdMemcacheBundle in the functions "getAdp" and "setAdp".
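The idea of serving stale entries while one process recalculates can be sketched as follows. This is an illustrative in-process Python version, not the TqdMemcacheBundle implementation: the class `StaleServingCache`, its parameters and its `get` signature are all assumptions made for this example. With a shared cache such as Memcache, the lock would typically be taken with an atomic "add" on a separate lock key.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class StaleServingCache:
    """Serves expired values while a single caller recalculates them."""

    def __init__(self, ttl_seconds: float, lock_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.lock_seconds = lock_seconds  # upper bound on one recalculation
        self._clock = clock
        self._values: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self._locks: Dict[str, float] = {}               # key -> lock held until
        self._mutex = threading.Lock()                   # guards the lock table

    def _try_lock(self, key: str) -> bool:
        """Return True if the caller became the one that recalculates."""
        with self._mutex:
            now = self._clock()
            held_until = self._locks.get(key)
            if held_until is not None and held_until > now:
                return False
            self._locks[key] = now + self.lock_seconds
            return True

    def get(self, key: str, recalculate: Callable[[], Any]) -> Any:
        entry = self._values.get(key)
        if entry is not None:
            expires_at, value = entry
            if self._clock() < expires_at:
                return value  # fresh hit
            if not self._try_lock(key):
                return value  # someone else is recalculating: serve stale
        # Either we hold the lock for an expired entry, or nothing is cached
        # yet (a cold cache has no stale value to fall back on).
        value = recalculate()
        self._values[key] = (self._clock() + self.ttl_seconds, value)
        with self._mutex:
            self._locks.pop(key, None)
        return value
```

The lock itself expires after `lock_seconds`, so a process that crashes while recalculating does not block refreshes forever. The expired entry is only ever overwritten, never deleted, which is why no reader sees a miss during peak load.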

Performance and consistency

Battle-tested software is often properly optimized, hence it does not make sense to try to be "smart". Doing so will most probably lead to a suboptimal duplicate of functionality that your data store already has. Whenever you seem to achieve better performance, you are in fact trading consistency for performance. Most advanced data stores have built-in functionality that allows you to configure this trade-off. But configuring this behaviour does require you to read the manual, something most engineers are not fond of.
