Implementing cache invalidation is wrong

18 Apr 2017 - by 'Maurits van der Schee'

There, I've said it! Again! It is my firm belief that it is. Instead of arguing why this is true I will try to negate the argument I hear most often from people arguing otherwise. In this post I am talking about a primary (data) store and a cache. It may help to think about a cache as a Redis or Memcache instance used by a web server and about the primary data store as a relational database server (MariaDB for instance).

By removing or updating the affected cache records on an update to the primary store we can ensure that the data that is in the cache is always correct.

Transactions matter

This is only true under the condition that you can detect the failure of a cache update and that you are willing to fail the transaction that writes to the primary store when the cache update or removal fails. If your cache update may silently fail, then you cannot ensure that the cache is correct. If your cache update is not part of the transaction of the primary store then even a non-silent failure will break the consistency of the cached data. Typically the query cache of a database system and the memory mapper of a disk driver are implemented in as a transactional cache and are thus always correct (but never complete, I'll get to that later).

Re-inventing the wheel

A transactional cache must be tightly bound to the implementation of the primary store in order to be efficient. Your database for instance already has a local transactional memory cache in the form of query cache. If you want to trade write for read performance you may distribute this cache using a transactional replication method. Bottom line is that if you are building a cache system for your database you are most likely re-implementing an already existing and proven system. So either you are doing something that has already been done, or your implementation is not transactional and thus wrong/unreliable.

Embrace inconsistency

I feel that it is better to accept that the data in the cache is not consistent with the data in the primary store and talk about a data contract or promise. Not all your data fits in your cache. Your cache must be faster and smaller than your primary data store. If it was big enough you would make it your primary store. If it isn't faster, then you should not use it. Therefor we cannot rely on the data being available in the cache. When cannot ensure all data in the cache is correct either, because then we would have to verify this in the primary store, which makes the cache lookup slower than the primary store lookup (as it is doing both).

Expiration and eviction

There are two methods to remove data from a cache: eviction and expiration. When the cache is full "eviction" needs to happen. The least important data is removed to make room for more important data. Often a "least recently used" (LRU) algorithm is used. The other method to remove data from a cache is by expiration. Expiration assigns a time-to-live (TTL) to each of the cache records and when the time-to-live is reached the data will be removed.

Hits and misses

Removal of data is not a problem, as it causes the data to be "refreshed". When data is requested from a cache we talk about a "hit" or a "miss". On a miss "fresh" data is retrieved from the primary store and also stored in the cache to enable hits on subsequent requests. In case of a hit the data from the cache is served, without checking it's "freshness" in the primary store. Data retrieval is sometimes slow and accurate, but often fast and the data is never older than the TTL. That is a clear contract that we can build our systems around.

My advice: keep it simple, don't implement invalidation!

PS: Liked this article? Please share it on Facebook, Twitter or LinkedIn.

TQ
dev.com