# Understanding The Memcached Source Code - LRU III

slab allocator (I, II, III) is the core module of the cache system, which largely determines how efficient the bottleneck resource, memory, can be utilized. The other 3 parts, namely,

LRU algorithm (I, II, III - this article) for entry expiration; and an

event driven model (not complete) based on libevent; and the

consistent harsh (not complete) for data distribution,

are built around it.

In previous posts, we have discussed different facets of an item, i.e., slab, hash map and LRU list as well as their associated (CRUD) methods, which build up the internal procedures and perform client requests after the corresponding commands are parsed by the drive machine. This time we will go through those procedures by issuing telnet commands to a Memcached instance and see how the discussed modules work together on various item operations. We will also see the whole picture of LRU lists that maintain the property of ‘least recently used’ in accordance to those operations.

On top of standard LRU algorithm, the Memcached (1.4.28) emploies 3 lists instead of just 1, i.e., hot, warm and cold, a.k.a., Segmented LRU. This heuristic is implemented to reduce the lock contention between item bumping (an action that moves recently accessed item to the list head) and item read. Moreover, unlike a casual implementation (such as the one I coded), Memcached does not bump item right on read action. Rather, the bumping is delayed to other operations when the resource is in short, which could reflect Memcached‘s read-first design decision.

In normal use cases, let’s say, a social media, the volume of read requests are more than that of other operations combined by orders of magnitude, hence it’s a critical point that worth extensive optimizations, I suppose.

We start this post by issuing an item read command to a Memcached instance.

The the normal execution of this procedure,

## process_get_command

The only relevant step here are 1) item_get and 2) item_update. Steps marked as *) are mostly command parsing and I/O which will be discussed in later posts when we examine event driven mechanism.

## do_item_get

Like other methods discussed before, item_get is a thread-safe wrapper of do_item_get.

1) Use the discussed assoc_find to locate the item using the hash key.

2) Increase the discussed reference count.

3) If the item has expired, remove it. Note that do_item_unlink decreases the reference count held by the last step, and do_item_remove actually removes the item. These two methods will be discussed soon in item delete.

4) Simply mark the item as ITEM_ACTIVE rather than perform item bumping which is offloaded to other operations associated procedures. This is part of the heuristic discussed in the beginning.

## do_item_update

item_update is a thread-safe wrapper of do_item_update.

1) The only line effective in this method is to set the access time for the item in (passively) an interval of 60 seconds. lru_maintainer_thread is set to true by command line argument modern so the operations inside if (!settings.lru_maintainer_thread) is not applicable.

Next,

# Delete

Call stack in normal execution,

## process_delete_command

1) Get the item using item_get discussed in last section. Note that the reference count is increased in item_get.

2) Delete it.

item_unlink is a thread safe wrapper of do_item_unlink.

1) As discussed in last post, assoc_delete removes the item from the hash map; and

2) item_unlink_q removes the item from the LRU list that the item belongs to.

3) This time do_item_remove simply decreases the reference count. The item will be removed when do_item_remove is called the second time from

## do_item_remove

item_remove is a thread safe wrapper of do_item_remove.

1) Decrease the reference count, if it reaches 0, goto 2) and free the item.

2) Use ITEM_clsid to get the slab class the item belongs. This macro removes the list type from slabs_clsid.

3) Call slabs_free to release the memory to slab subsystem.

# Create

The Item creating procedure is divided into several logic fragments by the mentioned drive machine, 1) creating an empty item object with the key and other meta data sent through; 2) read the value (from the socket) and fill the item object with it; and 3) link the item object. The workflow controller - drive machine will be discussed in the next post.

Now we send an add command to the Memcached instance.

## Creating an empty item object

After the first line of the above command

the procedure described in this section starts, the call stack of the hot path is,

We start from

### process_update_command

1) Set the key (i.e., test), as well as the meta data (i.e., flags, 0; exptime, 60;vlen,11), to local variables.

2) Increase vlen by 2, to populate the \n\r in addition to the key string.

3) Call item_alloc to allocate the memory (from slab) for the item.

4) After item_alloc is called, set the properties of conn. Here ritem points to the data portion of an item chunk; and rlbytes is set to vlen. These two fields will be used to populate the data portion with the content, i.e., hello world, in the next post.

### do_item_alloc

Unlike other methods we have discussed, item_alloc is a wrapper of do_item_alloc without adding any locks. I would assume this wrapper is added simply for consistent code style.

1) item_make_header initializes suffix portion of the item chunk using the meta data (flags) and key.

2), 3) are discussed in detail in Slab III. To recap, slabs_clsid select the most optimal slab class and slab_alloc allocates one item chunk from slab sub-system.

4) If slab_alloc fails, try to release some memory using lru_pull_tail and retry the allocation for at most 5 times. lru_pull_tail is the focus of the next section.

5) Initialize the pointers of LRU list and hash collision list.

6) Set the list type (HOT_LRU) to the slabs_clsid, which indicates that this item belongs to the “HOT” LRU list of its respective slab class.

7) Initialize other fields of item.

### lru_pull_tail

Method start & end

p) This method starts by selecting the tail element of the designated LRU list using the slab class id and the list type, assuming that the element can be a release candidate. And it iterates over (at most 5 entries) the list in reverse order to find a entry in case that elements near the tail are recently accessed.

s) For each item selected, increase its reference count. In normal situation, the original value of reference count should be 1 (as you will see in the last step of the create operation). Hence a != 2 value after the increment indicates an exception that needs to be corrected. Note that the reference count is now 2 so it is required to decrease at least one time (back to 1) when the processing of the current item is done (e1, e2 or e3 is reached).

e1) Remove the item directly when an expiration is detected. Here the do_item_unlink_nolock is exactly the same as the discussed do_item_unlink (I think the code is duplicated to emphasize that this method is not thread-safe), and it follows the same “unlink and remove” routine as in item delete.

e2) When a candidate is found, we might need to relocate it to another list (when move_to_lru is set in the switch case) by calling item_link_q. And we do need to call do_item_remove to reduce the reference count back to 1. The decision is made by the steps discussed bellow.

e3) If an item is recently accessed, reset the ITEM_ACTIVE flag; bump it to the head of the list; decrease its reference count and iterate to the next one (maximum 5 times). Remember that the flag ITEM_ACTIVE is set by item_get, and here is the place where the item gets bumped.

Hot & warm

1) The only difference of HOT_LRU and WARM_LRU is the threshold (limit) which are indicated by their respective configurations hot_lru_pct and warm_lru_pct.

2) If the threshold of HOT_LRU or WARM_LRU is reached, remove the item from the current list using do_item_unlink_q, and goto e2). Therefore, e2) is responsible to relink it to the COLD_LRU and decrease the reference count.

3) If the current item is not “active”, and the threshold is not reached, finish this method without any item relocation, nor release.

Cold

4) If there are any item in the list, evict it directly with the discussed do_item_unlink_nolock to free up its resource, and goto e1). Note that the default values of evict_to_free and slab_automove are set to values that disable their respective if blocks.

Now we input the second line of the command

and trigger the following procedures.

## Populate the item with content

As mentioned in process_update_command, the content we input is populated to conn.item using conn.ritem and conn.rlbytes. This step is handled by drive machine which will be discussed in detail in the next post.

Now we consider conn.item is filled with all relevant information, hence the next and final step is to

the call stack of this step is

complete_nread checks the protocol in use and moves directly to

1) Call store_item to link the item to the LRU list and hash map.

2) The reference count is set to 1 by do_slab_alloc in do_item_alloc and increased by do_item_link in do_store_item. So reduce it to the normal value, 1, with item_remove. The methods of those have all been discussed in detail.

### do_store_item

1) The newly created item exists only in the slab subsystem, hence do_item_get returns null as there is no such record in hash map yet.
2) So in the context of item creation, do_store_item` is essentially the same as do_item_link.