# Consistent Hashing - Theory & Implementation

### Consistent Hashing - Theory & Implementation

**What's it?Theconsistent hashing comes from the solving of hot spot problem inInternet system, I.E., it comes from the distributed cache system.[1][2]The idea is simple and straight forward (more detail in paper[1][2]):**

**Hot Spot -> Centric Cache -> Distributed Cache (Communicate using IP Multicast) -> Harvest Cache(Tree Structure Cache, structured communication)[9] -> Random Tree Cache(different Tree for different cached Objects, hash mapping from tree node to machine node) -> Random Tree + Consistent Hashing Cache(deal with machine node dynamics: machine may leave/down and join/recover)**

**Essentially, a Consistent Hash Functionis one that changes minimally as the range of the function changes. Ahash function can be looked as a mapping from items to buckets, supposewe have such a mapping m1. If some buckets are added/removed, we haveanother mapping m2. In normal hash functions, m1 -> m2 willintroduce many item movements (from one bucket to another). Aconsistent hash function has the characteristic that item movements areminimal.How does it accomplish that?In consistent hash function:1. Items and buckets are all hashed(normal hash) into some integer interval (for example: [0, 1024]).2. Item is mapped to a bucket that is closet to it.3. Bucket may be replicated to improve even distribution balance.NOTE: here "closet" means the first bucket it meets when traverse clock wise along the integer circle (see diagram below)Supposethe normal hash output range is [0, 1023], you have 4 buckets and 8items, each bucket is replicated twice. One possible consistent hashingmay be illustrated as below:**

**Current Mapping/Hashing:Bucket1 - Item1, Item3, Item8Bucket2 - Item2, Item7Bucket3 - Item4, Item6Bucket4 - Item5**

**If Bucket3 is down/broken, the new Mapping/Hashing will be:**

**Bucket1 - Item1, Item3, Item8Bucket2 - Item2, Item6, Item7Bucket4 - Item4, Item5**

**You can see that only Items on Bucket3 are changed, and they are distributed among the remaining buckets.Ifa new Bucket5 is added, you can see that only small number of items arechanged, and the new bucket gets load from the original 4 Buckets.How to Implement it?Normalhash function is stateless, the return value only determines by theinput parameter, but the consistent hash function is a statefulfunction. The state is how the buckets are arranged on the integercircle.It's natural to store how the buckets are arranged onthe integer circle as a search tree, since it's in fact a typicalsearch algorithm - you need to know which segment an item (its hashvalue) belongs to.In practical system, this stateful datastructure will be stored on each client that uses this consistent hashfunction. As buckets(machine nodes) join and leave, the state willchange. But different client may see the join/leave at different time,or even in different order, thus will produce different hash valueusing the same consistent hash function(It's said to have differentview in paper[1]).But it is proven in [1], that the number ofbuckets that one item may belong and the number of items that on oneparticular bucket won't be very large (the so called Spread/Load property).A C++ version of consistent hashing function can be found here, it uses STL map as the binary search tree.The impact of the bucket replica number can be visualized as below (code can be found in testmain.cxx):**

You can see that as the replica count increases, the item distribution over buckets will become more and more even.

**[Reference]1. Theory Paper Consistent hashing and random trees**

**: distributed caching protocols for relieving hot spots on the World Wide Web**

2. Practical Paper Web Caching with Consistent Hashing

3. Blog About Consistent Hashing with Java Code

4. Blog Understanding Consistent Hash

5. http://en.wikipedia.org/wiki/Consistent_hashing

6. http://en.wikipedia.org/wiki/Distributed_hash_table

7. The Chord P2P system

8. A Hierarchical Internet Object Cache