理解 Memcached 源码 - LRU I

May 18, 2019 Memcached 源码分析

多半情况下，LRU 会和哈希表一起使用，然后我们把这个组合称为

LRU 缓存

在 LRU缓存 中，哈希表提供了快速随机访问对象的能力；而LRU（算法）则用于淘汰很久没用 (least recently used) 的对象，来避免缓存无限增加。我们先大致看下 LRU 组成。

链表

从技术上来说，LRU 算法是在链表上完成的 - 当一个表项被使用（访问或更新），LRU 会先做移除，然后把它重新插入到表头。这样，越接近表尾的对象就是 越久没使用 (least recently used)，淘汰起来比较简单。

哈希表

链表是不支持快速的随机访问的，所以需要和 哈希表 一起使用。之前我们之前已经读过，板子系统的空闲列表是通过链表把板里的块 (chuck) 串起来形成的。在 LRU缓存 里也差不多，而这次用链表串起来的是表项。大致看起来是这样：

从另外一个维度看起来可能会更直观一点：

核心数据结构 - item

typedef struct _stritem {
    /* Protected by LRU locks */
    struct _stritem *next;
    struct _stritem *prev;
    /* Rest are protected by an item lock */
    struct _stritem *h_next;    /* hash chain next */
    rel_time_t      time;       /* least recent access */
    rel_time_t      exptime;    /* expire time */
    int             nbytes;     /* size of data */
    unsigned short  refcount;
    uint8_t         nsuffix;    /* length of flags-and-length string */
    uint8_t         it_flags;   /* ITEM_* above */
    uint8_t         slabs_clsid;/* which slab class we're in */
    uint8_t         nkey;       /* key length, w/terminating null and padding */
    /* this odd type prevents type-punning issues when we do
     * the little shuffle to save space when not using CAS. */
    union {
... // scr: cas
        char end; // scr: flexible array member indicating the item header "end"
    } data[];
    /* if it_flags & ITEM_CAS we have 8 bytes CAS */
    /* then null-terminated key */
    /* then " flags length\r\n" (no terminating null) */
    /* then data with terminating \r\n (no terminating null; it's binary!) */
} item;

do_item_unlink@item.c

使用到的字段

next, prev - LRU 链表 指针, 在 do_item_alloc (LRU III) 初始化, 被 item_link_q, item_unlink_q 使用。

h_next - hash 冲突链表 的指针, 在 do_item_alloc (LRU III) 初始化, 被 assoc_insert, assoc_delete, 多个模块 (LRU II) 使用。

time - 最后访问时间, 在 do_item_link 中设置, 被 lru_pull_tail (LRU III) 使用。

exptime - 超时时间（由请求参数指定）, 在 do_item_alloc (LRU III) 初始化, 被 lru_pull_tail (LRU III) 使用。

nbytes - 数据大小（由请求参数指定），在 do_item_alloc (LRU III) 初始化。

refcount - 引用计数, 在 do_slabs_alloc (Slab III) 初始化, 被 do_item_link 使用。

nsuffix - 在 do_item_alloc (LRU III) 用 item_make_header 初始化。

it_flags - 在 do_item_alloc (LRU III) 初始化, 被 do_item_link, do_item_unlink 使用。

slabs_clsid - 当前对象存在的具体的 LRU 链表 , 被 do_item_alloc (LRU III) 初始化, 在 item_link_q, item_unlink_q 使用。

nkey - 键大小, 在 do_item_alloc (LRU III) 中计算, 被 assoc_delete 使用。

块的内存布局

我们在 do_slabs_free 提到过块。这次我们直接通过数据结构来看下它的构造。

下面我们来读直接操作 LRU 的相关代码。

do_item_link

int do_item_link(item *it, const uint32_t hv) { // scr: -------------------> 1)
...
    it->it_flags |= ITEM_LINKED;                // scr: -------------------> 2)
    it->time = current_time;

... // scr: stat

    /* Allocate a new CAS ID on link. */
... // scr: cas
    assoc_insert(it, hv);                       // scr: -------------------> 3)
    item_link_q(it);                            // scr: -------------------> 4)
    refcount_incr(&it->refcount);               // scr: -------------------> 5)
... // scr: stat

    return 1;
}

do_item_link@item.c

1) 正常理解， hv 应该就是哈希值 “hashed value” 的缩写。

2) 设置 it->it_flags 的 ITEM_LINKED 标志, 然后将当前时间赋值给 it->time。

The field it_flags is used in do_slabs_free and do_slabs_alloc

3) 将对象添加到哈系表。

4) 将对象添加到链表。

5) 递增 reference count。

这个字段的初始值是 1，do_slabs_alloc

要注意 引用计数 代表了有几个子模块同时在使用该资源。这个字段是决定是否回收该资源的关键参考（在这里，对象同时被板和 LRU 在使用）。我在另一篇文章里详细讨论了C++里相似的机制。

item_link_q - 添加至链表

item_link_q 是主力函数 do_item_link_q 的线程安全的简单包装。

static void item_link_q(item *it) {
    pthread_mutex_lock(&lru_locks[it->slabs_clsid]);
    do_item_link_q(it);
    pthread_mutex_unlock(&lru_locks[it->slabs_clsid]);
}

item_link_q@item.c

static void do_item_link_q(item *it) { /* item is the new head */
    item **head, **tail;
    assert((it->it_flags & ITEM_SLABBED) == 0);

    head = &heads[it->slabs_clsid];           // scr: -------------------> 1)
    tail = &tails[it->slabs_clsid];
    assert(it != *head);
    assert((*head && *tail) || (*head == 0 && *tail == 0));
    it->prev = 0;                             // scr: -------------------> 2)
    it->next = *head;
    if (it->next) it->next->prev = it;
    *head = it;
    if (*tail == 0) *tail = it;
    sizes[it->slabs_clsid]++;                 // scr: -------------------> 3)
    return;
}

do_item_link_q@item.c

1) 从对应的 LRU 链表 （由slabs_clsid指定）获取 head 和 tail。注意 slabs_clsid 是用队列类型加过盐的，所以每个板组可能会包含复数个列表。

2) 标准动作，”添加表头项”。

3) 增加全局的数组大小。

static item *heads[LARGEST_ID];
static item *tails[LARGEST_ID];
...
static unsigned int sizes[LARGEST_ID];

item.c:59

assoc_insert - 添加至哈希表

int assoc_insert(item *it, const uint32_t hv) { // scr: again, hv -> hash value
    unsigned int oldbucket;

... // scr: expanding related operations
    } else {
        it->h_next = primary_hashtable[hv & hashmask(hashpower)]; // scr:  1)
        primary_hashtable[hv & hashmask(hashpower)] = it;         // scr:  2)
    }

... // scr: expanding related operations
}

assoc_insert@assoc.c

1) 冲突处理。没冲突？将 h_next 直接设置为 null。

2) 将对象添加到 primary_hashtable 的桶。

...
static item** primary_hashtable = 0;
...

assoc.c:42

扩容的逻辑先留个坑，下篇再来讲。

do_item_unlink

void do_item_unlink(item *it, const uint32_t hv) {
    MEMCACHED_ITEM_UNLINK(ITEM_key(it), it->nkey, it->nbytes);
    if ((it->it_flags & ITEM_LINKED) != 0) {
        it->it_flags &= ~ITEM_LINKED;         // scr: -------------------> 1)
... // scr: stat
        assoc_delete(ITEM_key(it), it->nkey, hv); // scr: ---------------> 2)
        item_unlink_q(it);                    // scr: -------------------> 3)
        do_item_remove(it);                   // scr: -------------------> *)
    }
}

do_item_unlink@item.c

1) 清除 it->it_flags 的 ITEM_LINKED 位。

2) 从哈希表移除对象。

3) 从链表移除对象。

*) 实际释放对象的逻辑后面会讲。

item_unlink_q - 从链表移除

同上，item_unlink_q 只是一个对于实调函数 do_item_unlink_q 线程安全的简单封装。

static void item_link_q(item *it) {
    pthread_mutex_lock(&lru_locks[it->slabs_clsid]);
    do_item_link_q(it);
    pthread_mutex_unlock(&lru_locks[it->slabs_clsid]);
}

item_unlink_q@item.c

static void do_item_unlink_q(item *it) {
    item **head, **tail;
    head = &heads[it->slabs_clsid];           // scr: -------------------> 1)
    tail = &tails[it->slabs_clsid];

    if (*head == it) {                        // scr: -------------------> 2)
        assert(it->prev == 0);
        *head = it->next;
    }
    if (*tail == it) {
        assert(it->next == 0);
        *tail = it->prev;
    }
    assert(it->next != it);
    assert(it->prev != it);

    if (it->next) it->next->prev = it->prev;
    if (it->prev) it->prev->next = it->next;
    sizes[it->slabs_clsid]--;                 // scr: -------------------> 3)
    return;
}

do_item_unlink_q@item.c

1) 同样, 获取由 slabs_clsid 指定的 LUR 链表 的 head 和 tail。

2) 标准的 “移除链表项” 操作。

3) 减少全局的数组大小。

static item *heads[LARGEST_ID];
static item *tails[LARGEST_ID];
...
static unsigned int sizes[LARGEST_ID];

item.c:59

assoc_delete - 从哈希表移除

static item** _hashitem_before (const char *key, const size_t nkey, const uint32_t hv) {
    item **pos;
    unsigned int oldbucket;

... // scr: expanding related operations
    } else {
        pos = &primary_hashtable[hv & hashmask(hashpower)]; // scr: -----> 1)
    }

    while (*pos && ((nkey != (*pos)->nkey) || memcmp(key, ITEM_key(*pos), nkey))) {
        pos = &(*pos)->h_next; // scr: ----------------------------------> 2)
    }
    return pos;
}

...

void assoc_delete(const char *key, const size_t nkey, const uint32_t hv) {
    item **before = _hashitem_before(key, nkey, hv);

    if (*before) {
        item *nxt;
...
        nxt = (*before)->h_next; // scr: --------------------------------> 3)
        (*before)->h_next = 0;   /* probably pointless, but whatever. */
        *before = nxt; // scr: ------------------------------------------> 4)
        return;
    }
    /* Note:  we never actually get here.  the callers don't delete things
       they can't find. */
    assert(*before != 0);
}

assoc_delete@assoc.c

1) 通过 hv 获取桶。

2) 遍历冲突链表并比较 key。注意这里的返回值是 指定元素上一个元素的 next 字段的地址。而当没有冲突时，这个地址就是桶本身。

3) 将找到的下一个元素赋值给 nxt。

4) 更新 指定元素上一个元素的 next 字段。

打包带走

试下这个

That's it. Did I make a serious mistake? or miss out on anything important? Or you simply like the read. Link me on -- I'd be chuffed to hear your feedback.