Rehashing problem

Hash function
  • a function that maps data of arbitrary size to fixed-size values
  • h: U → {0, 1, . . . , m − 1}. example: h(x) = x % m
  • collision:
    k1 != k2 and h(k1) == h(k2)
Rehasing problem
  • Rehashing problem arises when m is not fixed

  • We face this problem in many cases in practice:

    • A load balancer distributes the requests to multiple web servers, which can go down at any time
    • We store data in a cluster of server, when we add/remove a database server, how to move the minimal amount of data between the servers
  • If we use normal hash function like h(x) = x % m, when m = m - 1, we need to rehash almost every keys: h(x) = x % (m-1)

Consistent hash

  • Consistent hash was introduced by Karger et al in 1997 to address the re-hashing problem.
  • Basic idea:
    • each hash value is placed at a random position between [0, 1]
    • each key is placed at a random position between [0, 1]
    • then each key is mapped to the first hash value on its right. If there is no hash value on its right, then store the key in the hash value with the smallest position.
red: hash value, green: key
  • This way, we can prove that if m is increased by 1, the expected number of key need to be re-hashed is n/(m+1). See the proof here
Implementation
  • Setup: route n requests to m servers.

  • Instead of placing m server to a random position in [0,1] we use a hash function f(x) to map each server to a random long number between [0, MAX_LONG)

  • Now for a incoming request r, we need to find the smallest number s that s >= f(r) => in Java we can use TreeMap to quickly perform this finding operator in O(log m)

  • In practice, because m is not big, we make up some virtual servers to reduce n/(m+1).

  • Source code

Applications
  • There are many production system that use Consisten Hash:
    • Couchbase automated data partitioning
    • Partitioning component of Amazon's storage system Dynamo
    • Data partitioning in Apache Cassandra
    • Riak, a distributed key-value database
    • Akamai Content Delivery Network. (One of the Consistent Hash inventors found this company)
    • Discord chat application

Ref