# setsockopt, SO_KEEPALIVE and Heartbeats

There are two end purposes for sending heartbeats through a persistent connection. For a back-end application, heartbeats are generally used to detect an absent client, so that the connection can be dropped and the associated resources released. For a client, on the contrary, they prevent the connection state stored within intermediate nodes (such as a NAT router) from being released, SO as to KEEP the connection ALIVE.

This article will examine how to configure the four socket options SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT with setsockopt() to send heartbeats, and discuss the practice of keep-alive heartbeats in general.

Experiment setting:
OS: Ubuntu 16.04
gcc: 5.4.0

# To keep the connection alive

One cause of silent connection drops is NAT entry timeout. A NAT entry, consisting of the 4-tuple (source address, source port, destination address and destination port), is recorded internally by a network router for address translation. Due to the limited memory available to the hardware, the router has to remove the entry belonging to an inactive session after a timeout. As a result, the connection is effectively closed even though neither end has explicitly sent a FIN or an RST.

Reconnecting is expensive. An end user has to wait for several RTTs spent on handshakes (e.g., the TCP handshake plus a TLS handshake), and additional logic is required to smoothly restore the UX to the previously interrupted state once the user is back online.

In order to avoid the unnecessary handshakes and the RTTs they impose, HTTP adopts KEEP-ALIVE so that short-lived HTTP sessions can reuse the same established, persistent TCP connection, which is another story.

Next, I will use two programs to illustrate exactly how it works. Let's look at the server first.

For simplicity, I do not apply IO multiplexing, so the server can only serve one client at a time.
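A minimal sketch of such a single-client server follows, assuming IPv4 and a port chosen by the caller. make_listener() and serve_one_at_a_time() are hypothetical names, not the author's original code. Note that read() only returns when data arrives, the client closes, or an error occurs.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a listening IPv4 TCP socket on `port`; returns -1 on error. */
int make_listener(uint16_t port)
{
    int sfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sfd < 0)
        return -1;

    int one = 1; /* allow quick restarts of the experiment */
    setsockopt(sfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(sfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(sfd, 1) < 0) {
        close(sfd);
        return -1;
    }
    return sfd;
}

/* No IO multiplexing: accept one client, block in read() until it goes
 * away, then accept the next one. */
void serve_one_at_a_time(int sfd)
{
    char buf[256];
    for (;;) {
        int rfd = accept(sfd, NULL, NULL);
        if (rfd < 0) {
            perror("accept");
            continue;
        }
        ssize_t n;
        while ((n = read(rfd, buf, sizeof buf)) > 0)
            printf("received %zd bytes\n", n);
        if (n < 0)
            perror("read");
        else
            printf("client closed the connection\n");
        close(rfd);
    }
}
```

It would be started with something like serve_one_at_a_time(make_listener(port)), after checking that make_listener() did not return -1.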

Next, the client.

After setting the socket options mentioned before, the client initiates the TCP handshake with connect(), and then yields the CPU with sleep().
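The client steps described above can be sketched as follows, assuming an IPv4 server address and port supplied by the caller. enable_heartbeat() and connect_and_idle() are hypothetical names, and the idle time and sleep duration are parameters rather than the values used in the original experiment.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable heartbeats on fd; the first one goes out after `idle` seconds
 * without any traffic. */
int enable_heartbeat(int fd, int idle)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle);
}

/* Set the options, connect (TCP handshake), then stay silent so that only
 * the kernel's heartbeats cross the wire. */
int connect_and_idle(const char *ip, uint16_t port, int idle, unsigned int seconds)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        enable_heartbeat(fd, idle) < 0 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    sleep(seconds); /* yield the CPU; no application data is sent */
    close(fd);
    return 0;
}
```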

If you are not familiar with network (socket) programming, you may want to read an introduction to it first.

Next, let’s see the network interaction in action.

Here I have removed the irrelevant ARP output. If you are not familiar with tcpdump, you may want to read an introduction to it first.

Now that our feet are wet, it's a good time to explain the heartbeat mechanism:

1) SO_KEEPALIVE enables (or disables) heartbeats;
2) the side with heartbeats enabled (in this example, the client) sends empty packets (👁 length 0);
3) after receiving the packets, the other side (the server) replies with an ACK (👁 Flags [.]); and
4) TCP_KEEPIDLE defines how long the connection has to stay idle before a heartbeat is sent, which, as long as the peer keeps answering, is effectively the heartbeat interval (👁 timestamps).
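To confirm which values are actually in effect on a socket, the options can be read back with getsockopt(). This is a sketch that assumes Linux, where TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT report the system-wide defaults until they are set on the socket. struct keepalive_settings and read_keepalive() are hypothetical names.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

struct keepalive_settings {
    int enabled;  /* SO_KEEPALIVE: heartbeats on (1) or off (0) */
    int idle;     /* TCP_KEEPIDLE: seconds of silence before a heartbeat */
    int interval; /* TCP_KEEPINTVL: seconds between unanswered heartbeats */
    int count;    /* TCP_KEEPCNT: unanswered heartbeats before giving up */
};

/* Read one integer socket option into *out; returns -1 on error. */
static int get_int_opt(int fd, int level, int name, int *out)
{
    socklen_t len = sizeof *out;
    return getsockopt(fd, level, name, out, &len);
}

/* Fill *s with the keep-alive configuration of fd; returns -1 on error. */
int read_keepalive(int fd, struct keepalive_settings *s)
{
    if (get_int_opt(fd, SOL_SOCKET, SO_KEEPALIVE, &s->enabled) < 0 ||
        get_int_opt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &s->idle) < 0 ||
        get_int_opt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &s->interval) < 0 ||
        get_int_opt(fd, IPPROTO_TCP, TCP_KEEPCNT, &s->count) < 0)
        return -1;
    return 0;
}
```

On a stock Linux system, the values reported before anything is set are 7200 seconds, 75 seconds and 9 probes.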

Note that throughout the process, read() stays blocked on the server side, which means the heartbeat packets are transparent to the receiving application (the server).

# To detect an absent peer

Besides NAT entry expiration, a connection can be dropped silently in one way or another (e.g., a loose cable). It is crucial for a server application to identify such exceptions in time, so that it can release the associated resources, invoke clean-up routines and/or notify other peer clients. This is why sending heartbeats from the server side makes more sense.

Since our feet are already wet, here are the two remaining options:

5) TCP_KEEPINTVL defines the interval between heartbeats when there is no answer from the other side; and
6) TCP_KEEPCNT dictates how many unanswered heartbeats indicate a dropped connection.
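Putting all four options together, a helper along these lines switches heartbeats on and configures both the idle time and the failure detection. It is a sketch: set_heartbeat() is a hypothetical name, and the values are whatever the caller passes.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable heartbeats on fd:
 *   idle     - seconds of silence before the first heartbeat (TCP_KEEPIDLE)
 *   interval - seconds between heartbeats that go unanswered (TCP_KEEPINTVL)
 *   count    - unanswered heartbeats before the connection is dropped (TCP_KEEPCNT)
 * Returns 0 on success, -1 (with errno set) on the first failing call. */
int set_heartbeat(int fd, int idle, int interval, int count)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof interval) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count);
}
```

Linux rejects values below 1 for these options with EINVAL, so the return value is worth checking.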

Next, we modify the server and client code to test this feature.

In the server, we set all the socket options mentioned above.
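Here is a sketch of such a server, assuming Linux. The options are set on the listening socket sdf so that the connection returned by accept() (rdf) picks them up. read() then stays blocked until the client sends data, closes the connection, or stops answering heartbeats. listen_with_heartbeat() and wait_for_peer() are hypothetical names, not the author's code.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Listening socket (sdf) on `port` with heartbeats configured; on Linux the
 * sockets returned by accept() inherit these options. Returns -1 on error. */
int listen_with_heartbeat(uint16_t port, int idle, int interval, int count)
{
    int sdf = socket(AF_INET, SOCK_STREAM, 0);
    if (sdf < 0)
        return -1;

    int on = 1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (setsockopt(sdf, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0 ||
        setsockopt(sdf, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) < 0 ||
        setsockopt(sdf, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof interval) < 0 ||
        setsockopt(sdf, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0 ||
        bind(sdf, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(sdf, 1) < 0) {
        close(sdf);
        return -1;
    }
    return sdf;
}

/* Block in read() on rdf. Returns 0 if the client closed the connection,
 * otherwise the errno of the failed read (ETIMEDOUT when heartbeats went
 * unanswered). */
int wait_for_peer(int rdf)
{
    char buf[256];
    ssize_t n;
    while ((n = read(rdf, buf, sizeof buf)) > 0)
        ; /* discard application data; heartbeats never show up here */
    return n == 0 ? 0 : errno;
}
```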

The client is simplified accordingly, as the heartbeats now come from the server.

Here is the tcpdump output, captured on the server machine because we are going to unplug the client's network connection.

Because we set 5 as the threshold number of unacknowledged heartbeats, and they are 5 seconds apart (👁 timestamps), after 5 heartbeats go unanswered by the client, the read() on the server side is unblocked with a return value indicating a broken connection (on Linux, -1 with errno set to ETIMEDOUT). So, unlike the heartbeats themselves, breaking the connection is visible to the monitoring side (the server in this case), which in turn can trigger the actions mentioned above to finalize the broken connection.
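The time it takes to notice a dead peer follows from the three values. The first heartbeat goes out after TCP_KEEPIDLE seconds of silence, and the connection is given up TCP_KEEPCNT times TCP_KEEPINTVL seconds later. A trivial helper (hypothetical name) makes the arithmetic explicit; treat the result as an approximation of Linux's behavior, not an exact deadline:

```c
/* Approximate seconds from the last activity until an unreachable peer is
 * reported: the idle time, then `count` unanswered heartbeats spaced
 * `interval` seconds apart. */
int heartbeat_detection_secs(int idle, int interval, int count)
{
    return idle + interval * count;
}
```

With the Linux defaults (7200, 75 and 9), this comes to 7875 seconds, a little over two hours. That is why the defaults alone are rarely enough for detecting absent clients.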

# Considerations

## When heartbeat should not be used

In mobile networks, periodic data transfer keeps the radio active unnecessarily. When this happens in the background, the application drains the battery fast and surprises users. So in such a case I would rather go the extra mile and prepare for reconnecting.

## When heartbeat need not be used

For a back-end with heavy traffic, the packets generated by business logic alone can serve as indicators of connectivity. In such a case, I would make the server drop a connection after a client has not sent any packets for a long period of time.

Alternatively, if I need to further reduce false positives, I could activate the heartbeat mechanism (through setsockopt()) only after a client has been silent for a prolonged period. It is worth noting that when modifying socket options midway, setsockopt() should be applied to the file descriptor returned by accept(), i.e., rdf, which represents an established connection (other settings will be “inherited” from sdf).
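One possible way to activate heartbeats only after a prolonged silence is to wait on rdf with poll() and switch the options on only if the wait times out. This is a sketch under that assumption, since the article does not say how the silence is measured. arm_heartbeat_after_silence() and its parameters are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <sys/socket.h>

/* Wait up to silence_ms for activity on the established connection rdf.
 * Returns 0 if the client showed activity (data, close or error),
 * 1 if it stayed silent and heartbeats were switched on, -1 on error. */
int arm_heartbeat_after_silence(int rdf, int silence_ms,
                                int idle, int interval, int count)
{
    struct pollfd p = { .fd = rdf, .events = POLLIN };
    int r = poll(&p, 1, silence_ms);
    if (r < 0)
        return -1;
    if (r > 0)
        return 0; /* activity: let the caller read() it */

    /* Prolonged silence: let the kernel probe the client from now on. */
    int on = 1;
    if (setsockopt(rdf, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0 ||
        setsockopt(rdf, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) < 0 ||
        setsockopt(rdf, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof interval) < 0 ||
        setsockopt(rdf, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 1;
}
```

A server could call it before each blocking read(). Once it returns 1, the kernel takes over, and a subsequent read() should eventually fail if the client is gone.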

## System wide setting

Some of the discussed socket options also have system-wide defaults that can be set using procfs or sysctl:

TCP_KEEPIDLE -> /proc/sys/net/ipv4/tcp_keepalive_time
TCP_KEEPCNT -> /proc/sys/net/ipv4/tcp_keepalive_probes
TCP_KEEPINTVL -> /proc/sys/net/ipv4/tcp_keepalive_intvl
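These files contain plain decimal numbers (seconds for the time and the interval, a count for the probes). Here is a small sketch for reading one of them from C, assuming procfs is mounted at /proc; read_proc_long() is a hypothetical helper. The values are only defaults. They apply to sockets that have SO_KEEPALIVE enabled but have not set the corresponding TCP_* option themselves.

```c
#include <stdio.h>

/* Read a single integer from a procfs file such as
 * /proc/sys/net/ipv4/tcp_keepalive_time; returns -1 on failure. */
long read_proc_long(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, read_proc_long("/proc/sys/net/ipv4/tcp_keepalive_time") returns 7200 on a system still using the Linux default. As root, the value can be changed with sysctl -w net.ipv4.tcp_keepalive_time=<seconds>.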


That's it. Did I make a serious mistake, or miss out on anything important? Or do you simply like the read? Link me on -- I'd be chuffed to hear your feedback.