# Understanding The Memcached Source Code - LRU II

The slab allocator (I, II, III) is the core module of the cache system, and it largely determines how efficiently the bottleneck resource, memory, can be utilized. The other three parts, namely, the

LRU algorithm (I, II - this article, III) for entry expiration; the

event-driven model (not complete) based on libevent; and the

consistent hash (not complete) for data distribution,

are built around it.

More often than not, the LRU algorithm is combined with a hash map, and the combination is referred to as an LRU cache.

In an LRU cache, the hash map enables fast access to cached objects, while LRU prevents the cache from growing indefinitely by marking expired, or so-called least recently used, objects. This time we examine memcached's implementation of the hash map.

# Overview (textbook material, feel free to skip)

A hash map is basically a fixed-size array that indexes values with integers hashed from keys. In a hash map, an array entry is referred to as a bucket. If a hash value exceeds the number of buckets (i.e., the array size), it rolls over via modulo (`%`). A collision occurs when two or more keys produce the same hash value, or when different hash values roll over to the same bucket; the colliding entries then form a linked list on that bucket.

Collisions slow down lookups because the linked list must be traversed sequentially; hence it is necessary to increase the bucket count, and to rehash the entries under the new bucket count, before performance degrades too far. This process will be discussed soon.

# Module initialization

The first relevant method is

## hash_init

which simply determines the hash algorithm type.

This method is called from here as one of the init steps before the logic enters the main event loop.

The parameter hash_type is set to MURMUR3_HASH when the mentioned command-line argument modern is given.

The second method

## assoc_init

allocates the fixed-size array mentioned at the beginning.

This method is called in a similar location as hash_init as part of the system bootstrap process.

The actual initial size is determined by the command-line argument hashpower.

As said before, the array can be replaced with a newly allocated, larger one if performance drops due to excessive collisions. Next we discuss the process of

# Scale up & entry migration

In memcached, the threshold is 1.5, meaning that if the number of items exceeds 1.5 * the number of buckets, the mentioned expansion process starts.

assoc_start_expand simply sets a flag (i.e., do_run_maintenance_thread) and sends a signal to wake a maintenance thread that does the actual job.

0) This is where the thread wakes up from sleep and starts working, and goes back to sleep when there is nothing to be done.

1) assoc_expand allocates the resources for the new hash map that is meant to replace the old one initialized from here.

2) Only a certain number of items (hash_bulk_move) are migrated in one batch, which avoids the thread hanging around too long when stop_assoc_maintenance_thread is called. In contrast to the discussed assoc_start_expand, stop_assoc_maintenance_thread resets the flag do_run_maintenance_thread and sends the signal to wake up the thread so it can exit.

3) (The “item lock” actually works on the whole bucket, hence I will call it the bucket lock instead.) Use the low-priority item_trylock (i.e., pthread_mutex_trylock) to acquire the bucket lock; 3.1) sleep briefly when the lock is not available; and 3.2) release the lock using item_trylock_unlock when the migration (of this bucket) completes.

4) Rehash all the items in the bucket, and migrate them to the new hash map.

5) Move on to the next bucket.

6) Last bucket reached -> go to 0)

Similar to the initialization methods, the maintenance thread is started during system bootstrap.

This method is called in the system shutdown process, hence it is opposite in logic to start_assoc_maintenance_thread. Likewise, mechanism-wise its operations are the opposite of those of assoc_start_expand.

As said before, the expansion & migration process discussed here has an impact on the logic of all hash-map-related operations. In the next section, we look at these operations.

# CRUD

N.b., assoc_delete was discussed in the last post; and in a key-value system, update and insert are essentially the same. Thus, this section discusses only the operations C (create) and R (read).

## assoc_insert

1) If the expansion process is underway and the bucket associated with the hash key has not been migrated yet, insert the item into old_hashtable. Note that here the old bucket count (i.e., hashmask(hashpower - 1)) is used to calculate the hash index.

2) Otherwise, insert the item into primary_hashtable directly.

3) Increase the global variable hash_items (the number of items). If it exceeds the threshold after the item is added, start the expansion & migration process. Note that this is also the preamble to the previous section.

## assoc_find

1) Similar to assoc_insert, this step locates the bucket in old_hashtable when the key has yet to be rehashed.

2) Otherwise, it uses primary_hashtable directly.

One thing worth noting is that assoc_find is very similar to _hashitem_before, which was discussed in the last post. The difference is that _hashitem_before returns the address of the h_next member of the element before the found one (pos = &(*pos)->h_next;), which is required when removing entries from a singly linked list; whereas this method returns the found element directly (ret = it;).