Introduction

Consensus problem

Given a distributed system such that:

  • the network between nodes is unreliable (1)
  • nodes can crash (2)
  • there is only one leader node that client can communicate with.

We want all the nodes to agree on a single data value x sent by client. Say, client sent value of x to a leader node, and this node replicates this value to others in the system via network package. The prolems are:

  • what if the package is lost (due to (1)) ?
  • what if the target node suddently crash (due to (2)) ?
  • what if leader node crash ?

Real world example can be a distributed, fault tolerant database. For redundancy, we have to store data in multiple nodes.

The motivation behind RAFT

The consensus problem was solved by a family of protocols named Paxos invented by Leslie Lamport. Multiple distributed system implement Paxos:

Chubby is based on Multi-Paxos, Zookeeper is based on Zab (a protocol similar as Paxos). etcd is built on top of Raft

However, Paxos is hard to understand and even harder to implement. Then in 2014, Raft was invented with the main aim of solving consensus problem in a simpler and understandable manner.