Cache is considered a great mechanism to increase performance in system design. However, it often comes with many critical problems such as inconsistency, the validity of data, cache delay... To make things worse, workaround to such problems often leads to other bigger problems.

Let's consider the ways to deal with Cache in this post.

Terminology

  • get(k): client requests value of key k in cache
  • set(k,v): client updates value of key k in cache with value v
  • del(k): client invalidate key k in cache
  • query(k)<-v: client gets value v of key k by querying database
  • update(k,v): client update value of key k in database with value v

Look-Aside pattern

Let's consider the most popular cache pattern: Look-Aside pattern

  • Read path
    • Client get(k)
    • If cache presents, return value
    • If cache misses, Client query(k)<-v and set(k, v)
  • Write path: there are 2 ways:
    • first way:
      • Client update(k,v)
      • Client set(k,v)
    • second way:
      • Client update(k,v)
      • Client del(k)

The second way is often preferred because del(k) is idempotent.

Look-Aside pattern problems

Stale Set
  • A stale set occurs when a client sets a value in cache that does not reflect the latest value that should be cached.
  • Example:
    • Client C1 get(k), k doesn't exist (cache miss)
    • Client C1 query(k) <- 1. C1 is somehow slow at this point
    • Client C2 update(k,2)
    • Client C2 del(k)
    • Client C1 set(k,1)

Now in DB k = 2, in cache k = 1.
The root cause is that 2 following operation is not atomic:

  • Client C1 query(k) <- 1
  • Client C1 set(k,1)

So, an update(k,2) can happen in between => dirty cache.

Thundering herd
  • A thundering herd happens when a specific key undergoes heavy read and write activity.
  • Example:
    • Key k is very popular, lots of clients get(k)
    • C1 get(k), cache returns v1 very fast
    • C2 update k, causing it to be invalidated in cache
    • Now, every other client that tries to get(k) will miss, so they will try query(k), which leads to Database being overloaded with too many simultaneous requests.
Leases - solutions for Slate set and Thundering herd

This mechanism was first published by a paper from Facebook in 2013. It relies on 2 cache operations:

  • setnx(k,v): set value v for key k if and only if k was not already been set.
  • check_and_set(k,e,v): set value v for key k if and only if current value of k is e.

Read path:

  1. Client get(k)
  2. If cache misses:
    • generate a random lease l, try setnx(k,l)
      • if setnx(k,l) return false (key k already present), go to step 1
    • query(k)<-v
    • check_and_set(k,l,v), return v
  3. If cache presents:
    • is retrieved value a lease?
      • no: return value
      • yes: wait for a while, go to step 1.

Write path: keep the same as before.

With this new pattern using Leases, we can prevent

  • Stale set: when C2 updates k, it will clear k in cache (del(k)), so later, C1 will fail to call check_and_set(k, lease, v)
  • Thundering herd: in case of a cache miss, only one lucky client will successfully acquire the lease and call setnx(k, lease) and query the database. Other clients will poll the lease and have to wait and try again later until the lucky client populates the cache with the queried value.