License: CC 4.0 BY-SA

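Linux Synchronization and Mutual Exclusion 6 (Based on Linux 6.6): Seqlock

1. Overview

Seqlock (sequence lock) is a Linux kernel synchronization mechanism for coordinating concurrent reads and writes, and it is best suited to read-mostly data. Readers take no lock at all, which avoids the overhead that conventional locks (spinlocks, mutexes, and so on) can impose under heavy concurrency.

Seqlocks are typically used when there are many readers and few writers. Any number of readers may access the data concurrently, and readers never block one another. A reader is affected only when a writer modifies the data during the read: it either waits for the writer to finish or detects the change afterwards and retries. When no write is in progress, the read path costs little more than two loads of the sequence counter, which is why it scales well.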

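2. How It Works

2.1 Overview

Conceptually, a seqlock involves two things:

Sequence number: an unsigned integer that identifies the current version of the data. It is incremented at the start and again at the end of every modification.
The data itself: the shared data being protected.

To summarize the properties of a seqlock: only one writer thread may be in the critical section at a time. When there is no writer, reader threads may enter freely, so readers never block other readers. When only readers are in the critical section, a writer can proceed immediately without waiting for them.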

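2.2 Writer Thread Operations

A writer thread acquires the seqlock as follows:

(1) Acquire a lock (for example, a spinlock). This lock ensures that only one writer enters the critical section.

(2) Increment the sequence counter.

It releases the seqlock as follows:

(1) Increment the sequence counter again.

(2) Release the lock, allowing other writer threads to enter the critical section.

It follows that the sequence counter is even when no writer is in the critical section (the counter is initialized to 0) and odd when a writer is inside (and there can be at most one).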
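The even/odd rule can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it uses C11 atomics rather than kernel primitives, the names model_seqcount, model_write_begin, model_write_end and model_writer_active are hypothetical, and the writer-side lock is left out so that only the counter behaviour is visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the sequence counter; not the kernel type. */
typedef struct {
    atomic_uint sequence;   /* even: no writer, odd: a writer is inside */
} model_seqcount;

/* The caller is assumed to hold the writer-side lock (omitted in this model). */
static void model_write_begin(model_seqcount *s)
{
    unsigned int old = atomic_fetch_add(&s->sequence, 1);

    assert((old & 1u) == 0);    /* counter was even before the writer entered */
}

static void model_write_end(model_seqcount *s)
{
    unsigned int old = atomic_fetch_add(&s->sequence, 1);

    assert((old & 1u) == 1);    /* counter was odd while the writer was inside */
}

/* True while a writer is between begin and end. */
static bool model_writer_active(model_seqcount *s)
{
    return (atomic_load(&s->sequence) & 1u) != 0;
}
```

Starting from 0, one complete write leaves the counter at 2, and model_writer_active() is true only between the two increments.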

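2.3 Reader Thread Operations

A reader thread uses the seqlock as follows:

(1) Read the sequence counter. If it is even, the reader may enter the critical section. If it is odd, wait for the writer to leave the critical section (that is, for the counter to become even). The value observed on entry is called the old sequence counter.

(2) Enter the critical section and read the data.

(3) Read the sequence counter again. If it equals the old sequence counter, the data read is consistent. Otherwise go back to step (1).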
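The three reader steps can be sketched in userspace C as follows. The types and functions (model_shared, model_read_begin, model_read_retry, model_read) are hypothetical illustrations, not kernel APIs. The sketch ignores memory barriers and treats the data as a plain int, so it shows the control flow only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: a sequence counter plus the data it protects. */
typedef struct {
    atomic_uint sequence;
    int value;
} model_shared;

/* Step (1): snapshot the counter. Returns false if a writer is inside (odd). */
static bool model_read_begin(model_shared *s, unsigned int *snap)
{
    *snap = atomic_load(&s->sequence);
    return (*snap & 1u) == 0;
}

/* Step (3): true means a writer ran during the read and the read must be redone. */
static bool model_read_retry(model_shared *s, unsigned int snap)
{
    return atomic_load(&s->sequence) != snap;
}

/* Steps (1) to (3) combined into one retry loop. */
static int model_read(model_shared *s)
{
    unsigned int snap;
    int copy;

    do {
        while (!model_read_begin(s, &snap))
            ;                       /* writer inside: poll until even */
        copy = s->value;            /* step (2): read the data */
    } while (model_read_retry(s, snap));

    assert((snap & 1u) == 0);       /* a successful read starts on an even value */
    return copy;
}
```

A reader that took its snapshot before a complete write (counter advanced by 2) sees model_read_retry() return true and goes around the loop again.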

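2.4 Applicable Scenarios

In general, a seqlock is suitable when:

(1) Reads are frequent.

(2) Writes are infrequent but performance-sensitive, and the writer must not be held up by reader threads. (Writes need to be infrequent mainly to protect read-side performance, because every write can force readers to retry.)

(3) The data is simple in type, yet access to it cannot be protected with atomic operations alone.

The following example shows why the protected data should not contain pointers. Suppose the protected data is a linked list: header ---> A node ---> B node ---> C node ---> null. While traversing the list, a reader thread stores the pointer to B node in a temporary variable x. At that moment an interrupt arrives and the reader thread is preempted (note that a seqlock reader does not disable preemption). A writer thread running on another CPU now has ample time to free B node's memory, while the reader's temporary variable x still points to it. When the reader thread resumes and dereferences x (for example, to follow next to C node), it accesses freed memory, and the retry check comes too late to help.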
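By contrast, a structure made of plain values is a good fit: the reader copies the fields and never follows a pointer, so a stale copy is harmless and is simply discarded on retry. The sketch below is a hypothetical userspace model (struct model_time, model_time_write and model_time_try_read are invented for illustration). It protects a two-field timestamp that cannot be updated with a single atomic store. It assumes a single writer and omits memory barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-field value; sec and nsec must be read as a matching pair. */
struct model_time {
    atomic_uint sequence;
    int64_t sec;
    int64_t nsec;
};

struct time_pair {
    int64_t sec;
    int64_t nsec;
};

/* Single writer assumed, so no writer-side lock is modelled here. */
static void model_time_write(struct model_time *t, int64_t sec, int64_t nsec)
{
    assert(nsec >= 0 && nsec < 1000000000);
    atomic_fetch_add(&t->sequence, 1);   /* counter becomes odd */
    t->sec = sec;
    t->nsec = nsec;
    atomic_fetch_add(&t->sequence, 1);   /* counter becomes even again */
}

/* One read attempt: copies plain values only and never dereferences shared pointers. */
static bool model_time_try_read(struct model_time *t, struct time_pair *out)
{
    unsigned int snap = atomic_load(&t->sequence);

    if (snap & 1u)
        return false;                    /* a write is in progress */
    out->sec = t->sec;
    out->nsec = t->nsec;
    return atomic_load(&t->sequence) == snap;
}
```

A caller would loop on model_time_try_read() until it returns true. A half-finished update (new sec, old nsec) is never returned as a successful read.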

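3. API Example

In the kernel, jiffies_64 holds the number of ticks since the system booted. Updates to it (and to other jiffies-related data) are serialized by the raw spinlock jiffies_lock, and readers detect concurrent updates through the associated sequence counter jiffies_seq.

3.1 The reader side: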

kernel/time/jiffies.c

u64 get_jiffies_64(void)
{
	unsigned int seq;
	u64 ret;

	do {
		seq = read_seqcount_begin(&jiffies_seq);
		ret = jiffies_64;
	} while (read_seqcount_retry(&jiffies_seq, seq));
	return ret;
}
EXPORT_SYMBOL(get_jiffies_64);

3.2 The writer side:

kernel/time/tick-sched.c

static void tick_do_update_jiffies64(ktime_t now)
{
	unsigned long ticks = 1;
	ktime_t delta, nextp;

	/*
	 * 64bit can do a quick check without holding jiffies lock and
	 * without looking at the sequence count. The smp_load_acquire()
	 * pairs with the update done later in this function.
	 *
	 * 32bit cannot do that because the store of tick_next_period
	 * consists of two 32bit stores and the first store could move it
	 * to a random point in the future.
	 */
	if (IS_ENABLED(CONFIG_64BIT)) {
		if (ktime_before(now, smp_load_acquire(&tick_next_period)))
			return;
	} else {
		unsigned int seq;

		/*
		 * Avoid contention on jiffies_lock and protect the quick
		 * check with the sequence count.
		 */
		do {
			seq = read_seqcount_begin(&jiffies_seq);
			nextp = tick_next_period;
		} while (read_seqcount_retry(&jiffies_seq, seq));

		if (ktime_before(now, nextp))
			return;
	}

	/* Quick check failed, i.e. update is required. */
	raw_spin_lock(&jiffies_lock);
	/*
	 * Reevaluate with the lock held. Another CPU might have done the
	 * update already.
	 */
	if (ktime_before(now, tick_next_period)) {
		raw_spin_unlock(&jiffies_lock);
		return;
	}

	write_seqcount_begin(&jiffies_seq);

	delta = ktime_sub(now, tick_next_period);
	if (unlikely(delta >= TICK_NSEC)) {
		/* Slow path for long idle sleep times */
		s64 incr = TICK_NSEC;

		ticks += ktime_divns(delta, incr);

		last_jiffies_update = ktime_add_ns(last_jiffies_update,
						   incr * ticks);
	} else {
		last_jiffies_update = ktime_add_ns(last_jiffies_update,
						   TICK_NSEC);
	}

	/* Advance jiffies to complete the jiffies_seq protected job */
	jiffies_64 += ticks;

	/*
	 * Keep the tick_next_period variable up to date.
	 */
	nextp = ktime_add_ns(last_jiffies_update, TICK_NSEC);

	if (IS_ENABLED(CONFIG_64BIT)) {
		/*
		 * Pairs with smp_load_acquire() in the lockless quick
		 * check above and ensures that the update to jiffies_64 is
		 * not reordered vs. the store to tick_next_period, neither
		 * by the compiler nor by the CPU.
		 */
		smp_store_release(&tick_next_period, nextp);
	} else {
		/*
		 * A plain store is good enough on 32bit as the quick check
		 * above is protected by the sequence count.
		 */
		tick_next_period = nextp;
	}

	/*
	 * Release the sequence count. calc_global_load() below is not
	 * protected by it, but jiffies_lock needs to be held to prevent
	 * concurrent invocations.
	 */
	write_seqcount_end(&jiffies_seq);

	calc_global_load();

	raw_spin_unlock(&jiffies_lock);
	update_wall_time();
}

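Use the code above as a reference when protecting your own critical section with a seqlock. Note that this writer does not use seqlock_t directly: it pairs a raw spinlock (jiffies_lock) with a separate sequence counter (jiffies_seq).

4. Implementation

4.1 Definition of seqlock_t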
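One pattern worth isolating from the writer above is the lockless quick check followed by a re-check under the lock. The following is a minimal userspace sketch of that pattern, assuming C11 atomics. Every model_* name and MODEL_TICK is a hypothetical stand-in, a spin on atomic_flag replaces the kernel's raw spinlock, and only the 64-bit style quick check is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TICK 10   /* hypothetical tick length in arbitrary time units */

static atomic_flag model_lock = ATOMIC_FLAG_INIT;       /* stands in for jiffies_lock */
static atomic_uint model_seq;                           /* stands in for jiffies_seq */
static _Atomic int64_t model_next_period = MODEL_TICK;  /* stands in for tick_next_period */
static uint64_t model_ticks;                            /* stands in for jiffies_64 */

/* Returns true if this call advanced the tick count. */
static bool model_update(int64_t now)
{
    /* Quick check without taking the lock. */
    if (now < atomic_load_explicit(&model_next_period, memory_order_acquire))
        return false;

    while (atomic_flag_test_and_set_explicit(&model_lock, memory_order_acquire))
        ;   /* spin until the lock is free */

    /* Re-evaluate with the lock held: another thread may have updated already. */
    int64_t next = atomic_load_explicit(&model_next_period, memory_order_relaxed);
    if (now < next) {
        atomic_flag_clear_explicit(&model_lock, memory_order_release);
        return false;
    }

    atomic_fetch_add(&model_seq, 1);                    /* write begin: odd */
    uint64_t elapsed = (uint64_t)((now - next) / MODEL_TICK) + 1;
    assert(elapsed >= 1);
    model_ticks += elapsed;
    atomic_store_explicit(&model_next_period,
                          next + (int64_t)elapsed * MODEL_TICK,
                          memory_order_release);
    atomic_fetch_add(&model_seq, 1);                    /* write end: even */

    atomic_flag_clear_explicit(&model_lock, memory_order_release);
    return true;
}
```

Calls that arrive before the next period return early without touching the lock, which is the point of the quick check. Only a call that is actually due pays for the lock and the two counter increments.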

include/linux/seqlock.h

typedef struct {
	/*
	 * Make sure that readers don't starve writers on PREEMPT_RT: use
	 * seqcount_spinlock_t instead of seqcount_t. Check __SEQ_LOCK().
	 */
	seqcount_spinlock_t seqcount;
	spinlock_t lock;
} seqlock_t;

A seqlock is essentially a spinlock plus a sequence counter.

4.2 write_seqlock/write_sequnlock
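The same composition can be modelled in userspace to make the structure concrete. This is a sketch only: model_seqlock, MODEL_SEQLOCK_INIT, model_write_seqlock and model_write_sequnlock are hypothetical names, and an atomic_flag spin loop stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace counterpart of seqlock_t. */
typedef struct {
    atomic_uint seqcount;   /* plays the role of the sequence counter */
    atomic_flag lock;       /* plays the role of the spinlock */
} model_seqlock;

#define MODEL_SEQLOCK_INIT { 0, ATOMIC_FLAG_INIT }

/* Writer entry: take the lock first, then make the counter odd. */
static void model_write_seqlock(model_seqlock *sl)
{
    while (atomic_flag_test_and_set_explicit(&sl->lock, memory_order_acquire))
        ;                                   /* spin until the lock is free */
    atomic_fetch_add(&sl->seqcount, 1);
}

/* Writer exit: make the counter even again, then drop the lock. */
static void model_write_sequnlock(model_seqlock *sl)
{
    unsigned int old = atomic_fetch_add(&sl->seqcount, 1);

    assert(old & 1u);                       /* must have been inside a write */
    atomic_flag_clear_explicit(&sl->lock, memory_order_release);
}
```

The lock serializes writers against each other and the counter is what readers look at. Readers never touch the lock member.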

include/linux/seqlock.h

static inline void write_seqlock(seqlock_t *sl)
{
	spin_lock(&sl->lock);
	do_write_seqcount_begin(&sl->seqcount.seqcount);
}

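The one point that needs explanation is smp_wmb(), the SMP write memory barrier issued right after the counter is incremented, inside the helper that do_write_seqcount_begin() ends up calling. It ensures that neither the compiler nor the CPU reorders the store to the sequence counter against the stores in the critical section. Protection of the critical section depends on the value of the sequence counter, so this order must be preserved. write_sequnlock() mirrors write_seqlock(): it ends the sequence-counter write section first and then releases the spinlock.

4.3 read_seqbegin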
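To show where such barriers sit relative to the counter and the data, here is a userspace sketch using C11 fences. It is an analogy, not the kernel code: atomic_thread_fence() only approximates smp_wmb()/smp_rmb(), the names fenced_seq, data_a, data_b, fenced_write and fenced_try_read are hypothetical, a single writer is assumed, and the data fields are declared atomic so that racing accesses are well defined in C11.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint fenced_seq;
static atomic_int data_a, data_b;   /* the writer keeps data_b == 2 * data_a */

/* Single writer assumed; a real seqlock would hold its spinlock here. */
static void fenced_write(int a)
{
    unsigned int s = atomic_load_explicit(&fenced_seq, memory_order_relaxed);

    assert((s & 1u) == 0);
    atomic_store_explicit(&fenced_seq, s + 1, memory_order_relaxed);
    /* Barrier after the first increment: the data stores may not move above it. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&data_a, a, memory_order_relaxed);
    atomic_store_explicit(&data_b, 2 * a, memory_order_relaxed);
    /* Release store: the data stores may not move below the second increment. */
    atomic_store_explicit(&fenced_seq, s + 2, memory_order_release);
}

/* One read attempt; returns false if the caller has to retry. */
static bool fenced_try_read(int *a, int *b)
{
    unsigned int s0 = atomic_load_explicit(&fenced_seq, memory_order_acquire);

    if (s0 & 1u)
        return false;               /* a write is in progress */
    *a = atomic_load_explicit(&data_a, memory_order_relaxed);
    *b = atomic_load_explicit(&data_b, memory_order_relaxed);
    /* Barrier before the re-check: the data loads may not move below it. */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&fenced_seq, memory_order_relaxed) == s0;
}
```

Without the fences, a reader could observe an even, unchanged counter and still see a mix of old and new data, which is the reordering that the barrier is there to rule out.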

include/linux/seqlock.h

static inline unsigned read_seqbegin(const seqlock_t *sl)
{
	unsigned ret = read_seqcount_begin(&sl->seqcount);

	kcsan_atomic_next(0);  /* non-raw usage, assume closing read_seqretry() */
	kcsan_flat_atomic_begin();
	return ret;
}

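If a writer thread is active, read_seqbegin() (through read_seqcount_begin()) keeps polling the sequence counter until it becomes even. Left unchecked, this polling hurts overall system performance, meaning both power consumption and speed. The polling loop therefore calls cpu_relax(), which on ARM64 is: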

arch/arm64/include/asm/vdso/processor.h

static inline void cpu_relax(void)
{
	asm volatile("yield" ::: "memory");
}

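The yield instruction tells the hardware that the code running on this CPU is a polling loop and is not urgent, so if there is contention for any resource, this CPU can give way.

4.4 read_seqretry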
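A userspace polling loop with the same shape might look like the sketch below. It assumes GCC or Clang inline assembly. model_cpu_relax() and model_wait_even() are hypothetical names, and the x86 pause branch and the barrier-only fallback are assumptions added for portability rather than anything taken from the kernel listing above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for cpu_relax(); GCC/Clang syntax assumed. */
static inline void model_cpu_relax(void)
{
#if defined(__aarch64__)
    __asm__ volatile("yield" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("pause" ::: "memory");
#else
    __asm__ volatile("" ::: "memory");   /* compiler barrier only */
#endif
}

/* Poll until the counter is even, then return that snapshot. */
static unsigned int model_wait_even(atomic_uint *sequence)
{
    unsigned int snap;

    while ((snap = atomic_load_explicit(sequence, memory_order_acquire)) & 1u)
        model_cpu_relax();               /* writer inside: wait politely */
    return snap;
}
```

When the counter is already even the loop body never runs, so the relax hint costs nothing on the common uncontended path.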

include/linux/seqlock.h

static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
{
	/*
	 * Assume not nested: read_seqretry() may be called multiple times when
	 * completing read critical section.
	 */
	kcsan_flat_atomic_end();

	return read_seqcount_retry(&sl->seqcount, start);
}

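The start parameter is the snapshot of the sequence counter taken on entry to the critical section. It is compared with the counter's current value on exit. If the two are equal, no writer interfered with the reader, and the reader can leave the critical section. If they differ, read_seqretry() returns nonzero and the reader must repeat the read.

5. Usage Examples

Seqlock is an efficient synchronization mechanism that is widely used in the Linux kernel, especially for read-mostly data. The following scenarios show how to use it.

5.1 Network Statistics in the Kernel

Suppose a network statistics module records the amount of data transferred on a network interface. The statistics typically have many readers (fetching the current values) and one writer (updating them), which makes a seqlock a good fit.

Example code: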

#include <linux/seqlock.h>
#include <linux/kernel.h>
#include <linux/module.h>

static DEFINE_SEQLOCK(net_stats_lock);  // define and initialize a seqlock
static int net_stats_data = 0;  // simulated network statistic

// Read side
void read_net_stats(void)
{
    unsigned seq;
    int data;

    do {
        seq = read_seqbegin(&net_stats_lock);  // snapshot the sequence number
        data = net_stats_data;  // read the shared data
    } while (read_seqretry(&net_stats_lock, seq));  // retry if the sequence number changed

    printk(KERN_INFO "Read network stats: %d\n", data);
}

// Write side
void update_net_stats(int new_data)
{
    write_seqlock(&net_stats_lock);  // take the writer lock
    net_stats_data = new_data;  // update the network statistic
    write_sequnlock(&net_stats_lock);  // release the writer lock

    printk(KERN_INFO "Updated network stats: %d\n", new_data);
}

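Explanation:

read_net_stats(): Multiple readers can read net_stats_data concurrently. Each reader checks that no writer modified the data during the read: if the sequence number changed, a writer was active and the reader tries again.
update_net_stats(): The writer brackets the update with write_seqlock() and write_sequnlock(). This excludes other writers and keeps the sequence number odd for the duration of the update, so any reader that overlaps the update will retry. It does not stop readers from accessing the data.

5.2 Kernel Configuration Data

Kernel modules often need to read and write configuration data. Suppose a configuration structure is accessed by many readers (querying the current configuration) while writes are rare. A seqlock is a good choice for this kind of scenario.

Example code: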

#include <linux/seqlock.h>
#include <linux/kernel.h>
#include <linux/module.h>

static DEFINE_SEQLOCK(config_lock);  // define and initialize a seqlock
static int config_data = 42;  // simulated configuration data

// Read side
void read_config(void)
{
    unsigned seq;
    int data;

    do {
        seq = read_seqbegin(&config_lock);  // snapshot the sequence number
        data = config_data;  // read the configuration data
    } while (read_seqretry(&config_lock, seq));  // retry if the sequence number changed

    printk(KERN_INFO "Read config data: %d\n", data);
}

// Write side
void update_config(int new_config)
{
    write_seqlock(&config_lock);  // take the writer lock
    config_data = new_config;  // update the configuration data
    write_sequnlock(&config_lock);  // release the writer lock

    printk(KERN_INFO "Updated config data: %d\n", new_config);
}

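Explanation:

read_config(): The reader snapshots the sequence number before reading the configuration data. If the sequence number changed during the read, it reads again, and repeats until the sequence number is stable.
update_config(): The writer calls write_seqlock() to exclude other writers, updates the configuration item, and then releases the lock with write_sequnlock().

5.3 Reading and Writing Memory Page State

Some kernel components need to read and modify the state of memory pages. For example, a page-state management module may have many threads that read a page's state frequently, while writes happen only when the state actually changes. A seqlock handles this case well.

Example code: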

#include <linux/seqlock.h>
#include <linux/kernel.h>
#include <linux/module.h>

static DEFINE_SEQLOCK(page_state_lock);  // define and initialize a seqlock
static int page_state = 0;  // simulated memory page state

// Read side
void read_page_state(void)
{
    unsigned seq;
    int state;

    do {
        seq = read_seqbegin(&page_state_lock);  // snapshot the sequence number
        state = page_state;  // read the page state
    } while (read_seqretry(&page_state_lock, seq));  // retry if the sequence number changed

    printk(KERN_INFO "Read page state: %d\n", state);
}

// Write side
void update_page_state(int new_state)
{
    write_seqlock(&page_state_lock);  // take the writer lock
    page_state = new_state;  // update the page state
    write_sequnlock(&page_state_lock);  // release the writer lock

    printk(KERN_INFO "Updated page state: %d\n", new_state);
}

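Explanation:

read_page_state(): Multiple threads can read page_state concurrently, but each must confirm that no write happened during the read. If the sequence number changed, the reader tries again.
update_page_state(): The writer modifies page_state while holding the writer lock. Concurrent readers are not excluded. Instead, the sequence number is odd during the update and advances to a new even value afterwards, so any overlapping reader detects the change and retries.

5.4 Performance Statistics (Such as Counters)

Some performance statistics (for example, counters) are read very often and written rarely. In this case a seqlock can reduce lock contention and improve system performance.

Example code: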

#include <linux/seqlock.h>
#include <linux/kernel.h>
#include <linux/module.h>

static DEFINE_SEQLOCK(counter_lock);  // define and initialize a seqlock
static int counter = 0;  // simulated performance counter

// Read side
void read_counter(void)
{
    unsigned seq;
    int value;

    do {
        seq = read_seqbegin(&counter_lock);  // snapshot the sequence number
        value = counter;  // read the counter
    } while (read_seqretry(&counter_lock, seq));  // retry if the sequence number changed

    printk(KERN_INFO "Read counter: %d\n", value);
}

// Write side
void increment_counter(void)
{
    int value;

    write_seqlock(&counter_lock);  // take the writer lock
    value = ++counter;  // increment the counter and keep the new value
    write_sequnlock(&counter_lock);  // release the writer lock

    printk(KERN_INFO "Incremented counter: %d\n", value);
}

Explanation: