# Understanding The Memcached Source Code - LRU II

slab allocator (I, II, III) is the core module of the cache system, which largely determines how efficient the bottleneck resource, memory, can be utilized. The other 3 parts, namely,

LRU algorithm (I , II - this article , III) for entry expiration; and an

event driven model (I , II , III) based on libevent; and

consistent hashing for data distribution,

are built around it.

More often than not, the LRU algorithm is combined with a hash map, and is referred to as a LRU cache.

In a LRU-cache, the hash map enables fast accessing of cached objects; and LRU avoids the cache to grow infinitely by marking expired, or so called, least recently used objects. This time we examine the memcached‘s implementation of hash map.

# Overview (textbook overlapped, skip)

Hash map is basically a fixed-sized array that indexes values with integers hashed from keys. In hash map an array entry is referred to as a bucket to distinguish from the entry of the hash map, i.e., item. If the hash value exceeds the number of buckets (i.e., array size), it rolls over using ‘mod’ (%). Collision occurs when more than two keys result in the same hash value or different hash values roll over to the same bucket, then a *linked list is formed on the bucket in collision.

Collision slows down lookups speed for the sequential access of linked list, hence it is required to increase the bucket number, and to rehash entries using the new bucket number before the performance goes too bad. This process will be discussed soon.

# Module initialization

The first relevant method is

## hash_init

which simply determines the hash algorithm type.

hash_init@hash.c

This method is called from here as one of the init steps before the logic enters the main event loop.

main@memcached.c:5849

The parameter hash_type is set to MURMUR3_HASH by the mentioned command-line argument modern.

main@memcached.c:5849

The second method

## assoc_init

allocates the fixed sized array mentioned in the beginning.

hash_init@hash.c

This method is called in a similar location as hash_init as part of the system bootstrap process.

main@memcached.c:5976

And the actual initial size is determined by the command-line argument hashpower.

main@memcached.c:5677

As said before, the array can be replaced with a newly allocated larger one if the performance drops due to excessive collision. Next we discuss the process of

# Scale up & entry migration

In memcached, the threshold is 1.5, meaning, if the items number exceeds 1.5 * bucket number, the mentioned expanding process starts.

assoc_insert@assoc.c:173

The assoc_start_expand simply set a flag (i.e., do_run_maintenance_thread), and send a signal to awake a maintenance thread that does the actual job.

assoc_insert@assoc.c:173

0) This is where the thread wakes up from sleep and start working, and goes to sleep when there is nothing to be done.

1) assoc_expand allocates the resource for the new hash map which is meant to replace the old one initialized from here.

assoc_expand@assoc.c

2) Only migrate a certain number (hash_bulk_move) of buckets in one batch. When stop_assoc_maintenance_thread is called, the thread stops whenever one batch is complete, rather than hangs around to complete the migration of the whole hash map.

assoc.c:207

3) (The “item lock” actually works on the whole bucket hence bucket lock could be a better name) Use item_trylock (i.e., pthread_mutex_trylock) to access the bucket; 3.1) sleep for 10 sec when the the bucket is not available; and 3.2) release the lock using item_trylock_unlock when the migration (of this bucket) completes.

4) Rehash all the items in the bucket, and migrate them to the new hash map.

5) Move on to the next bucket.

6) Last bucket reached -> go to 0).

Similar to initialization methods, it is called during system bootstrap.

main@memcached.c:5992

stop_assoc_maintenance_thread is called in system shutdown process. Semantically it is opposite to start_assoc_maintenance_thread, while, mechanically, the operations of this method are opposite to that of assoc_start_expand.

main@memcached.c:6098

As said before, the expanding & migration process discussed here has an impact on the logic of all hash map related operations. In the next section, we look at these operations.

# CRUD

## assoc_insert

assoc_insert@assoc.c

1) If expanding process is undergoing and the hash key associated bucket has not been migrated, insert the item to old_hashtable. Note that here we use the old bucket number (i.e., hashmask(hashpower - 1))) to calculate the hash index.

2) Otherwise, insert the item to primary_hashtable directly.

3) Increase the global variable hash_items (number of items). If it exceeds the threshold after the item is added, start expanding & migration process that is the main focus of the last section.

assoc.c:51

## assoc_find

assoc_find@assoc.c

1) Similar to that of assoc_insert, this step locates the bucket from old_hashtable when the key is yet to be rehashed.

2) Use primary_hashtable directly otherwise.

3) Go through the linked list and compare the key (instead of the hash index) directly to lookup the item in the case of Collision.

## assoc_delete

Based on last post, the only delete related method that deserves of re-discussion is _hashitem_before when considering expanding and migration. Similar to assoc_find, it simply reads from the old_hashtable when the key is yet to be rehashed.

_hashitem_before@assoc.c

One thing worth noting is that assoc_find is very similar to _hashitem_before. The difference here is, _hashitem_before returns the address of the next member of the element before the found one (pos = &(*pos)->h_next;), which is required when removing entries from a singly linked list; whilst this method returns the element found directly (ret = it;).

That's it. Did I make a serious mistake? or miss out on anything important? Or you simply like the read. Link me on -- I'd be chuffed to hear your feedback.