# Understanding The Memcached Source Code - Event Driven I

The slab allocator (I, II, III) is the core module of the cache system; it largely determines how efficiently the bottleneck resource, memory, is utilized. The other three parts, namely,

the LRU algorithm (I, II, III) for entry expiration;

the event driven model (I - this article, II, III) based on libevent; and

consistent hashing (not complete) for data distribution,

are built around it.

In classic multithreading, large numbers of slow, blocking operations, mostly I/O, can easily drain the available thread resources, which severely constrains the maximum number of requests a server can handle per unit time. More specifically, threads are scheduled out and put to sleep in the middle of procedures that contain blocking I/O, even as request packets pile up in the network stack. In such situations, the server shows low throughput, low CPU saturation and high latency.

Here is a post on server side performance. Feel free to check it out.

# An introduction to event driven

This is where the asynchronous event driven model comes in: it drops the idea that the context of a session must be coupled with a thread. In such a model, session contexts are contained within and managed by a drive machine, and a thread is fully unleashed with non-blocking I/O operations. More specifically, 1) when I/O occurs in the middle of a procedure, the thread does not block but instantly switches to processing another request; and 2) when the I/O completes, the context is picked up by the drive machine to resume the interrupted session. As such, a potentially slow procedure is effectively divided into multiple manageable pieces, with the cutting points marked by I/O operations. This results in a single threaded architecture that outperforms ones employing thousands of threads.
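The underlying mechanics can be illustrated with a minimal, self-contained sketch (this is not Memcached code): a single thread waits on many fds at once and only reads when data is ready, instead of blocking inside a per-connection read. Here a pipe stands in for a network connection, and `poll(2)` stands in for the readiness notification that libevent abstracts over.

```c
#include <poll.h>
#include <unistd.h>

/* Minimal illustration of readiness-based I/O (not Memcached code): the
 * thread blocks in poll() over a set of fds, not in read() on one fd, and
 * only reads when the kernel says the fd is readable. */
int wait_and_read_one(int readfd) {
    struct pollfd pfd = { .fd = readfd, .events = POLLIN };
    if (poll(&pfd, 1, 1000) <= 0)    /* no event within 1s */
        return -1;
    char ch;
    if (read(readfd, &ch, 1) != 1)   /* fd is ready: this read won't block */
        return -1;
    return ch;
}
```

libevent wraps exactly this kind of readiness notification (via epoll, kqueue, poll, etc.) behind its event API.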

In my understanding, the event driven model is, in essence, yet another instance of “divide and conquer” and “trading space for time”, in a not very obvious way.

On the other hand, multithreading can still be used in an event driven model, purely for parallelism. Thus, in practice, the number of threads employed does not exceed the number of CPU cores. I will discuss the Memcached multithreading soon in Thread model.

# The drive machine

From a developer’s point of view, there are numerous ways to program an asynchronous event driven server. Memcached adopts an approach called a state machine, in which the logic flow is divided into non-linear fragments identified by states, normally controlled by a huge switch case. The bright side of this approach is that the aforementioned breakdown of a slow procedure is faithfully reflected by the logic fragments; the downside is that it makes the code style a bit different from what most developers are used to.

Here is what the event driven state machine actually looks like.
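A heavily condensed sketch of `drive_machine()` in memcached.c is shown below. The state names are the real `conn_states` values; each case body is reduced to its next-state transition plus a comment, and most states are folded into `default`.

```c
/* Condensed sketch of drive_machine() (memcached.c). The conn state names
 * are real; the case bodies are reduced to comments and transitions. The
 * real function also loops until the connection yields. */
enum conn_states { conn_listening, conn_new_cmd, conn_waiting, conn_read,
                   conn_parse_cmd, conn_nread, conn_write, conn_closing };

struct conn { enum conn_states state; int sfd; };

static void drive_machine(struct conn *c) {
    int stop = 0;
    while (!stop) {
        switch (c->state) {
        case conn_listening:  /* dispatcher only: accept(), then hand the
                                 new fd to a worker via dispatch_conn_new */
            stop = 1; break;
        case conn_new_cmd:    /* reset buffers for the next command */
            c->state = conn_waiting; break;
        case conn_waiting:    /* re-arm the libevent read event and yield */
            c->state = conn_read; stop = 1; break;
        case conn_read:       /* non-blocking read() into the buffer */
            c->state = conn_parse_cmd; break;
        case conn_parse_cmd:  /* parse "get"/"set"/... from the buffer */
            c->state = conn_write; break;
        default:              /* conn_nread, conn_write, conn_closing, ... */
            stop = 1; break;
        }
    }
}
```

Note how every `stop = 1` marks a point where the thread yields back to libevent instead of blocking, which is precisely the "cutting points" idea from the introduction.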

The omitted switch blocks will be discussed in detail in the following posts, so no worries.

The thread model of Memcached is quite standard. There is a dispatcher thread, and there is a preconfigured number of worker threads. Each thread runs an independent drive machine as described above. The dispatcher thread, whose responsibility is to distribute requests among worker threads, only executes the code under conn_listening. The actual requests are completed by the worker threads running the rest of the states.

Next we go through the bootstrap portion of the main function, which establishes the various building blocks of event driven I/O as well as the multithreading mechanism. We will also see the locations of the discussed sub-systems' ***_init methods in relation to the whole initialization process.

First things first: all the system initialization procedures are executed in the discussed dispatcher thread.

The call stack of this process is:

# System initialization

The two relevant steps are 4) and 5), which will be discussed in the following sections.

1) Raise the limits for the core dump file size and the number of file descriptors.

2) Call event_init to initialize the libevent framework. The value returned is called an event base.

3) For all potential connections, call conn_init to allocate space to store their respective contexts (located by file descriptor in the global variable conns). The role of context in the event driven model has already been discussed in the introduction.

5) Set up the socket and the first event listener - conn_listening.

6) Call event_base_loop to start the event loop.

*) Other miscellaneous system operations, such as setting the signal handlers for SIGINT and SIGTERM; setbuf-ing stderr to NULL; dropping the root privileges of the process; and daemonizing. If those names do not ring a bell, *Advanced UNIX Programming* is your friend.
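Stripped of option parsing and error handling, the ordering of these calls in `main()` can be sketched as follows. The stub bodies below merely record the call order so the skeleton is self-contained; the real functions live in memcached.c, thread.c and libevent, and step 4) is the thread initialization discussed in the next section.

```c
#include <string.h>

/* Sketch of the call ordering in main() (memcached.c). The underscored
 * stubs stand in for the real libevent / Memcached functions and only
 * record the order in which they are invoked. */
static char order[128];
static void record(const char *s) { strcat(order, s); strcat(order, ";"); }

static void *event_init_(void)            { record("event_init"); return (void *)1; }
static void conn_init_(void)              { record("conn_init"); }
static void memcached_thread_init_(int n) { (void)n; record("thread_init"); }
static void server_sockets_(int port)     { (void)port; record("server_sockets"); }
static void event_base_loop_(void *base)  { (void)base; record("event_base_loop"); }

static const char *bootstrap(void) {
    order[0] = '\0';
    /* 1) setrlimit(RLIMIT_CORE / RLIMIT_NOFILE) would happen here */
    void *main_base = event_init_();   /* 2) the libevent event base     */
    conn_init_();                      /* 3) the global conns array      */
    memcached_thread_init_(4);         /* 4) worker threads + pipes      */
    server_sockets_(11211);            /* 5) the listening socket(s)     */
    event_base_loop_(main_base);       /* 6) enter the event loop        */
    return order;
}
```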

The core data structure of the multithreading mechanism is:

1) Allocate memory for an array of LIBEVENT_THREAD; the number of threads is num_threads. Each element represents one thread. As described above, num_threads had better not exceed the number of cores.

2) Set the event base for dispatcher_thread, which represents the main thread itself. Note that dispatcher_thread is a global variable, so the reference is accessible to all the worker threads.

4) Initialize the pipe fds for each worker thread. Here notify_send_fd is used for communication between the dispatcher thread and the worker threads: whenever the dispatcher thread writes to notify_send_fd, an event is generated on the other side, notify_receive_fd, which the worker threads listen on. Again, *Advanced UNIX Programming* gives more information about pipes.

5) The full method name is supposed to be setup_libevent_for_each_thread. We will examine this method in the next section.

6) Call pthread_create to create the actual worker threads. We will examine this in create_worker.

1) Call event_init to initialize the libevent instance for the worker thread. As discussed in the thread model, each worker thread runs its own drive machine.

2) Set thread_libevent_process as the callback for events emitted from the discussed notify_receive_fd. The major function of thread_libevent_process is to link the actual drive machine to events, which we will see very soon in inter-thread communication.

3) Allocate and initialize the connection queue of the worker thread.

## create_worker

As mentioned, this method calls pthread_create to create the actual worker threads. The callback passed in is worker_libevent, which essentially starts the event loop using event_base_loop, this time on the worker threads rather than the dispatcher thread.

# Socket initialization

The methods involved in socket initialization handle both TCP and UDP, while the following discussion covers only the TCP branch. We also assume portnumber_file is not set, so as to focus on the critical path.

Unlike the worker threads, which listen on internal (pipe) fds, the dispatcher thread is responsible for events generated on external socket fds (by network requests). The method that initializes the sockets is server_sockets.

If no network interface is specified via inter, server_sockets is equivalent to

## server_socket

1) Get all the available network interfaces.

2) Iterate all the network interfaces and setup the sockets with the following steps.

3) new_socket encapsulates the operation of socket creation as well as that of setting it to non-block.

4) Tweak the newly created socket fd using setsockopt, in which

SO_REUSEADDR allows binding to a port in TIME_WAIT, which is useful for instantly rebooting a server on a “not fresh” TCP port;

SO_KEEPALIVE sends heartbeats to detect an absent client, so that the connection's resources in both kernel and user space can be released;

SO_LINGER tunes the behavior of close; Memcached passes a zeroed struct linger, which disables lingering so close returns immediately;

TCP_NODELAY disables Nagle's algorithm to improve latency.

5) bind the fd and start listening on it.

6) conn_new initializes the context for the fd and adds it to libevent, with the initial state set to conn_listening and the callback set to event_handler. Here event_handler is another transient method leading to the drive machine on the dispatcher thread. Likewise, this method will be discussed soon in inter-thread communication.

7) Add the context to the head of a global list listen_conn.
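Steps 3) and 4) above amount to roughly the following self-contained sketch. The socket options are the ones memcached.c actually sets for TCP (including the zeroed `struct linger`); binding, listening and error handling are trimmed.

```c
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Sketch of new_socket() plus the setsockopt tuning in server_socket()
 * (memcached.c); bind/listen and error handling are trimmed.
 * Returns the fd, or -1 on failure. */
static int new_tcp_listen_fd(void) {
    int sfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sfd < 0) return -1;

    /* new_socket: switch the fd to non-blocking mode */
    int flags = fcntl(sfd, F_GETFL, 0);
    fcntl(sfd, F_SETFL, flags | O_NONBLOCK);

    int yes = 1;
    struct linger ling = {0, 0};   /* zeroed: close() will not linger */
    setsockopt(sfd, SOL_SOCKET,  SO_REUSEADDR, &yes,  sizeof(yes));
    setsockopt(sfd, SOL_SOCKET,  SO_KEEPALIVE, &yes,  sizeof(yes));
    setsockopt(sfd, SOL_SOCKET,  SO_LINGER,    &ling, sizeof(ling));
    setsockopt(sfd, IPPROTO_TCP, TCP_NODELAY,  &yes,  sizeof(yes));
    return sfd;
}
```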

Next we briefly go through the process that handles a new connection, to wrap up the socket initialization.

## event_handler

First, after a TCP connection is established, the fd monitored by the dispatcher thread notifies libevent, which invokes the aforementioned event_handler. Next, the logic flow enters the code snippet we saw in the beginning - the drive machine - with the context state initialized to conn_listening during socket initialization.

At this stage, the drive machine 1) accepts the connection and obtains another fd that can be read from. It 2) then calls dispatch_conn_new with the new fd and other relevant information, including the next state, conn_new_cmd.

## dispatch_conn_new

2) Initialize a CQ_ITEM instance. Here CQ_ITEM is an intermediate object passed to worker threads through the connection queue, from which worker threads create new contexts.

3) Push CQ_ITEM to the connection queue.

4) Write to notify_send_fd with the command 'c'.

As discussed before, 4) generates an event on the other side of the pipe (on the chosen worker thread), which invokes thread_libevent_process:

1) Read the command (i.e., 'c') from the pipe.

2) Read the CQ_ITEM from the connection queue.

3) Call conn_new. From server_socket we know that conn_new establishes the context, this time for the new connection, and adds the accepted fd to libevent. Here on the worker thread, the callback is also set to event_handler, which essentially connects the drive machine to upcoming events on the same connection.

4) Set the thread information to the context.
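The worker side of this handshake, thread_libevent_process (thread.c), reduces to the sketch below: read the one-byte command off the notify pipe and, for 'c', pop an item from the connection queue. The call to conn_new and the per-thread bookkeeping of steps 3) and 4) are reduced to returning the popped fd.

```c
#include <unistd.h>

/* Sketch of thread_libevent_process() (thread.c): read the command byte
 * from notify_receive_fd, and for 'c' pop the next CQ_ITEM. conn_new()
 * and libevent registration are reduced to returning the popped fd. */
typedef struct item { int sfd; struct item *next; } item;

static int thread_libevent_process_sketch(int notify_receive_fd,
                                          item **cq_head) {
    char cmd;
    if (read(notify_receive_fd, &cmd, 1) != 1)
        return -1;
    if (cmd != 'c')          /* the real code handles other commands too */
        return -1;
    item *it = *cq_head;     /* cq_pop */
    if (it == NULL)
        return -1;
    *cq_head = it->next;
    return it->sfd;          /* conn_new(it->sfd, conn_new_cmd, ...) here */
}
```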