Locks versus atomic operations (the __sync_xxxx functions) for multithreaded access to shared variables in Linux user space

土豆西瓜大芝麻

Article link: https://blog.youkuaiyun.com/jinking01/article/details/120266707

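For a project I am currently writing a pseudo file system, which involves an in-kernel part and a user-space API wrapper. The user-space API has to support multithreading.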

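In conventional programming, when several threads share a variable we lock before and unlock after every operation on it, so that the result is deterministic and correct. There is nothing wrong with this approach. The usual mutex pattern: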

pthread_mutex_t lock;               // declare the lock
pthread_mutex_init(&lock, ...);     // initialize the lock
// use the lock
pthread_mutex_lock(&lock);          // acquire
count++;
pthread_mutex_unlock(&lock);        // release
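A fuller version of this pattern might look like the sketch below. The names `locked_counter`, `locked_worker`, and `run_locked_counter`, as well as the thread and iteration counts, are made up for illustration, and the static initializer is used in place of `pthread_mutex_init`.

```c
#include <pthread.h>

#define LOCK_THREADS 4          /* illustrative values, not from the article */
#define LOCK_ITERS   100000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long locked_counter;     /* protected by counter_lock */

/* Each worker increments the shared counter LOCK_ITERS times under the lock. */
static void *locked_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < LOCK_ITERS; i++) {
        pthread_mutex_lock(&counter_lock);
        locked_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count,
 * or -1 if a thread could not be created. */
long run_locked_counter(void)
{
    pthread_t tid[LOCK_THREADS];
    int started = 0;

    locked_counter = 0;
    for (; started < LOCK_THREADS; started++)
        if (pthread_create(&tid[started], NULL, locked_worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return started == LOCK_THREADS ? locked_counter : -1;
}
```

Calling `run_locked_counter()` from `main()` and building with `gcc -pthread` should give LOCK_THREADS * LOCK_ITERS, because no two increments can overlap.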

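But with performance in mind, and given that the kernel has atomic operations, can we use atomic operations instead of a lock?

Yes, and for simple operations such as updating a counter they are usually faster. However, atomic_t and its header are available only inside the kernel; recent Linux versions do not ship a corresponding header for user space.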

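Basic principle of atomic operations

An atomic operation is one that cannot be interrupted by any other task or event before it completes. It is the smallest unit of execution and cannot be divided further, which is why it borrows the name of the atom, the particle of matter from physics.

Atomic operations need hardware support and are therefore architecture-dependent. In the kernel, the API and the atomic type are defined in include/asm/atomic.h of the kernel source tree (arch/<arch>/include/asm/atomic.h in current kernels), and they are implemented in assembly because C before C11 had no way to express such operations.

Atomic operations are mainly used for resource counting; many reference counts (refcnt) are implemented with them. The atomic type is defined as follows: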

typedef struct 
{ 
    volatile int counter; 
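} atomic_t;

The volatile qualifier tells gcc not to optimize accesses to the field: every access goes to memory rather than to a value held in a register. volatile alone does not make a read-modify-write sequence atomic; that is the job of the instructions described next.

On x86, the CPU provides a way to lock the bus while an instruction executes. The chip has a LOCK# pin. When an instruction in an assembly program carries the "LOCK" prefix, the resulting machine code makes the CPU assert LOCK# (pull it low) for the duration of that instruction, locking the bus so that other CPUs on it temporarily cannot access memory. This guarantees the atomicity of the instruction in a multiprocessor environment.

LOCK is an instruction prefix: it says that the instruction that follows locks the memory bus while it executes. A bus lock keeps the other cores from accessing memory for some clock cycles. It does affect their performance, but it is far lighter than an operating-system-level lock.

The bus in question is the FSB (Front Side Bus), which connects the processor to RAM; locking it stops other processors or cores from fetching data from RAM. On modern processors, when the operand is in cacheable memory, the lock is normally implemented through the cache-coherence protocol on the affected cache line rather than by locking the whole bus.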

The kernel's atomic_* operations

Declaration and initialization:

void atomic_set(atomic_t *v, int i);
atomic_t v = ATOMIC_INIT(0);

Read and modify:

int atomic_read(atomic_t *v);
void atomic_add(int i, atomic_t *v);
void atomic_sub(int i, atomic_t *v);

Increment and decrement:

void atomic_inc(atomic_t *v);
void atomic_dec(atomic_t *v);

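Operate and test the result. The *_and_test functions return 1 if v is 0 after the operation and 0 otherwise; atomic_add_negative returns 1 if the result is negative; the *_return functions return the new value: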

int atomic_inc_and_test(atomic_t *v);
int atomic_dec_and_test(atomic_t *v);
int atomic_sub_and_test(int i, atomic_t *v);
int atomic_add_negative(int i, atomic_t *v);
int atomic_add_return(int i, atomic_t *v);
int atomic_sub_return(int i, atomic_t *v);
int atomic_inc_return(atomic_t *v);
int atomic_dec_return(atomic_t *v);

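Many user-space applications, such as MySQL and other database programs, have written their own atomic.h headers for synchronizing concurrent threads. Ordinary users generally cannot reuse those, because the operations they define may not match your needs. Nor can you simply copy the kernel's atomic.h: it depends on kernel-internal functions that are not available in user space.

Fortunately gcc (a user-space tool) provides built-in functions that support atomic operations in user space, namely the __sync_xxxx family. We can call them directly, or use them to build a user-space atomic.h modeled on the kernel's for multithreaded programs.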

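The __sync_xxxx functions

This part of the __sync_XXXX family has twelve functions, providing atomic add, subtract, AND, OR, XOR, and NAND. Take __sync_fetch_and_add: as the name says, it fetches first and then adds, so it returns the value from before the addition. With count = 4, calling __sync_fetch_and_add(&count, 1) returns 4 and leaves count at 5. The __sync_xxx_and_fetch variants perform the operation first and return the new value.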
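The difference between the two naming patterns can be checked with a few lines of single-threaded code. This is only a sketch; the function name `sync_return_values_ok` is invented for the example.

```c
#include <stdio.h>

/* Returns 1 if the builtins behave as described: fetch_and_add yields
 * the old value, add_and_fetch yields the new one. */
int sync_return_values_ok(void)
{
    int count = 4;

    int before = __sync_fetch_and_add(&count, 1);  /* old value; count is now 5 */
    int after  = __sync_add_and_fetch(&count, 1);  /* new value; count is now 6 */

    printf("before=%d after=%d count=%d\n", before, after, count);
    return before == 4 && after == 6 && count == 6;
}
```

Which variant to use depends only on whether the caller needs the value from before or after the update; the update itself is atomic either way.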

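Note: if compilation fails with an undefined reference to __sync_xxx, or the built program does not work, add the option -march=i686 when compiling with gcc. Not every environment needs this; it depends on the processor, the architecture, and the gcc version (it matters mainly for 32-bit x86 builds whose default target is the i386, which lacks the required instructions). gcc has provided the __sync_* built-in functions since the 4.1 series.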

type __sync_fetch_and_add (type *ptr, type value);
type __sync_fetch_and_sub (type *ptr, type value);
type __sync_fetch_and_or (type *ptr, type value);
type __sync_fetch_and_and (type *ptr, type value);
type __sync_fetch_and_xor (type *ptr, type value);
type __sync_fetch_and_nand (type *ptr, type value);
type __sync_add_and_fetch (type *ptr, type value);
type __sync_sub_and_fetch (type *ptr, type value);
type __sync_or_and_fetch (type *ptr, type value);
type __sync_and_and_fetch (type *ptr, type value);
type __sync_xor_and_fetch (type *ptr, type value);
type __sync_nand_and_fetch (type *ptr, type value);

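Note that type is restricted: gcc accepts integral scalar or pointer types that are 1, 2, 4, or 8 bytes long (for example int, long, long long, and their unsigned counterparts), subject to what the target supports. On old 32-bit x86 targets -march=i686 may be needed, as noted above. The functions also accept an optional trailing argument list (...) meant to name the variables that need a memory barrier. Because gcc currently implements a full barrier (like mb() in the Linux kernel: no memory access is moved across the operation in either direction), this argument can be ignored.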

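The disassembly of __sync_fetch_and_add is
804889d:f0 83 05 50 a0 04 08 lock addl $0x1,0x804a050
Note the lock in front of addl: the encoding starts with f0, which is the LOCK instruction prefix. It gives the instruction exclusive access to the memory location. Conceptually it locks the FSB (Front Side Bus) between the processor and RAM so that other processors or cores cannot fetch data from RAM, although modern processors usually lock only the affected cache line. Either way the operation is comparatively expensive and is only practical for small operands; locking memory for something the size of a memcpy would cost far too much. That is why __sync_fetch_and_add and its relatives only accept small scalar types.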

There are also two compare-and-swap operations:

bool __sync_bool_compare_and_swap(type *ptr, type oldval, type newval, ...)
type __sync_val_compare_and_swap(type *ptr, type oldval, type newval, ...)

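Both functions perform an atomic compare-and-swap: if *ptr == oldval, newval is written to *ptr.

(1) The first function returns true if the comparison succeeded and newval was written.

(2) The second function returns the value *ptr held before the operation.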
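Compare-and-swap is typically used in a retry loop for updates that the fetch-and-op functions cannot express. The sketch below shows one such loop and one single-shot use; `atomic_fetch_max` and `try_claim` are hypothetical helper names, not part of gcc.

```c
/* Atomically raises *ptr to val if val is larger.
 * Returns the value *ptr held before the call. */
int atomic_fetch_max(int *ptr, int val)
{
    int old = __sync_fetch_and_add(ptr, 0);   /* atomic read of the current value */

    for (;;) {
        if (old >= val)
            return old;                       /* already large enough */
        int seen = __sync_val_compare_and_swap(ptr, old, val);
        if (seen == old)
            return old;                       /* our swap went through */
        old = seen;                           /* another thread changed *ptr; retry */
    }
}

/* Returns 1 for exactly one caller (the one that flips *flag from 0 to 1)
 * and 0 for every other caller. */
int try_claim(int *flag)
{
    return __sync_bool_compare_and_swap(flag, 0, 1) ? 1 : 0;
}
```

The value-returning form suits retry loops because a failed attempt already hands back the fresh value; the boolean form is convenient when only success or failure matters.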

type __sync_lock_test_and_set(type *ptr, type value, ...)

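Sets *ptr to value and returns the previous contents of *ptr. gcc documents this as an acquire barrier rather than a full barrier.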

void __sync_lock_release(type *ptr, ...)

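Sets *ptr to 0, releasing a lock acquired with __sync_lock_test_and_set. gcc documents this as a release barrier.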
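As their names suggest, these two builtins are enough to build a simple spinlock. The following is a minimal sketch, not a production lock (it never yields or backs off); all names and the thread and iteration counts are invented for the example.

```c
#include <pthread.h>

#define SPIN_THREADS 4          /* illustrative values */
#define SPIN_ITERS   50000

static volatile int spin_flag;  /* 0 = free, 1 = held */
static long spin_counter;       /* protected by spin_flag */

static void spin_lock(volatile int *l)
{
    /* test_and_set stores 1 and returns the previous value; a previous
     * value of 1 means another thread already holds the lock. */
    while (__sync_lock_test_and_set(l, 1))
        while (*l)
            ;                   /* wait on plain reads until it looks free */
}

static void spin_unlock(volatile int *l)
{
    __sync_lock_release(l);     /* stores 0 with release semantics */
}

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < SPIN_ITERS; i++) {
        spin_lock(&spin_flag);
        spin_counter++;
        spin_unlock(&spin_flag);
    }
    return NULL;
}

/* Runs the workers and returns the final count, or -1 on thread-creation failure. */
long run_spin_counter(void)
{
    pthread_t tid[SPIN_THREADS];
    int started = 0;

    spin_counter = 0;
    for (; started < SPIN_THREADS; started++)
        if (pthread_create(&tid[started], NULL, spin_worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return started == SPIN_THREADS ? spin_counter : -1;
}
```

The inner read-only loop keeps waiting threads from issuing a locked write on every iteration, which is a common refinement of the plain test-and-set loop.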

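With these functions, simple operations by multiple threads on a global variable (increment, decrement, and so on) no longer need a thread lock. For such single-variable updates the functions above give the same protection as a pthread_mutex, are thread-safe, and are usually considerably faster.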

Comparing the cost of a lock and an atomic operation in a multithreaded user-space program:

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/time.h>
 
int global_int = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
 
void* thread_func(void *arg)
{
	int i;
	for (i = 0; i < 1000000; i++) {
#ifdef WITH_MUTEX
		pthread_mutex_lock(&mutex);
		global_int++;
		pthread_mutex_unlock(&mutex);
#elif defined WITH_ATOMIC
		__sync_add_and_fetch(&global_int, 1);
#else
		global_int++;
#endif
	}
	return NULL;	/* a void * thread function must return a value */
}
 
int main()
{
	struct timeval start_time, end_time;
	gettimeofday(&start_time, NULL);
	int proc, i;
	proc = sysconf(_SC_NPROCESSORS_ONLN);
	if (proc < 0)
		exit(1);
 
	pthread_t *threadId = (pthread_t *)malloc(proc*sizeof(pthread_t));
	for (i = 0; i < proc; i++) {
		pthread_create(&threadId[i], NULL, thread_func, NULL);
	}
 
	for (i = 0; i < proc; i++) {
		pthread_join(threadId[i], NULL);
	}
 
	gettimeofday(&end_time, NULL);
	printf("thread number = %d global_int = %d cost time msecond = %ld\n", proc, global_int, (long)((end_time.tv_sec - start_time.tv_sec)*1000 + (end_time.tv_usec - start_time.tv_usec)/1000));
}

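Comparing the results of the three variants above:

(1) With WITH_ATOMIC and with WITH_MUTEX the final value of global_int is correct, and the atomic version takes less time.

(2) With neither a lock nor an atomic operation the run takes very little time, but the result is wrong.

(3) Both locking and atomic operations add noticeable overhead, on the order of 100 ms or more in this test.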
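The timings in this test come from gettimeofday(), which reads the wall clock and can jump if the system time is adjusted during a run. One possible alternative, sketched here on the assumption that clock_gettime() with CLOCK_MONOTONIC is available (as it is on Linux), is a small pair of helpers; the names `now_mono` and `elapsed_ms` are invented for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime() under strict -std= modes */
#endif
#include <time.h>

/* Reads the monotonic clock; returns 0 on success, -1 on failure. */
int now_mono(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Milliseconds from *start to *end, truncated toward zero. */
long elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    long long ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
                 + (end->tv_nsec - start->tv_nsec);
    return (long)(ns / 1000000LL);
}
```

In the test program this would mean calling `now_mono()` where gettimeofday() is called now and printing `elapsed_ms(&start, &end)`.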

A more elaborate test case

#define _GNU_SOURCE   /* needed for cpu_set_t, CPU_SET and sched_setaffinity */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <linux/unistd.h>
#include <sys/syscall.h>
#include <linux/types.h>
#include <time.h>
#include <sys/time.h>
 
#define INC_TO 1000000 // one million
 
__u64 rdtsc ()
{
    __u32 lo, hi;
    __asm__ __volatile__
    (
       "rdtsc":"=a"(lo),"=d"(hi)
    );
 
    return (__u64)hi << 32 | lo;
}
 
int global_int = 0;
 
pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER; // statically initialized mutex
 
pid_t gettid ()
{
    return syscall(__NR_gettid);
}
 
void * thread_routine1 (void *arg)
{
    int i;
    int proc_num = (int)(long)arg;
    
    __u64 begin, end;
    struct timeval tv_begin, tv_end;
    __u64 time_interval;
    
    cpu_set_t set;
    
    CPU_ZERO(&set);
    CPU_SET(proc_num, &set);
 
    if (sched_setaffinity(gettid(), sizeof(cpu_set_t), &set))
    {
		fprintf(stderr, "failed to set affinity\n");
        return NULL;
    }
    begin = rdtsc();
    gettimeofday(&tv_begin, NULL);
    for (i = 0; i < INC_TO; i++)
    {
        __sync_fetch_and_add(&global_int, 1);
    }
    gettimeofday(&tv_end, NULL);
    end = rdtsc();
    time_interval = (tv_end.tv_sec - tv_begin.tv_sec) * 1000000 + (tv_end.tv_usec - tv_begin.tv_usec);
    fprintf(stderr, "proc_num : %d, __sync_fetch_and_add cost %llu CPU cycle, cost %llu us\n", proc_num, end - begin, time_interval);
    
    return NULL;
}
 
void *thread_routine2(void *arg)
{
    int i;
    int proc_num = (int)(long)arg;
 
    __u64 begin, end;
    struct timeval tv_begin, tv_end;
    __u64 time_interval;
    
    cpu_set_t set;
    
    CPU_ZERO(&set);
    CPU_SET(proc_num, &set);
 
    if (sched_setaffinity(gettid(), sizeof(cpu_set_t), &set))
    {
        fprintf(stderr, "failed to set affinity\n");
        return NULL;
    }
    begin = rdtsc();
    gettimeofday(&tv_begin, NULL);
    for (i = 0; i < INC_TO; i++)
    {
        pthread_mutex_lock(&count_lock);
        global_int++;
        pthread_mutex_unlock(&count_lock);
    }
    gettimeofday(&tv_end, NULL);
    end = rdtsc();
    time_interval = (tv_end.tv_sec - tv_begin.tv_sec) * 1000000 + (tv_end.tv_usec - tv_begin.tv_usec);
    fprintf(stderr, "proc_num : %d, pthread_mutex_lock cost %llu CPU cycle, cost %llu us\n", proc_num, end - begin, time_interval);
    
    return NULL;  
}
 
void *thread_routine3(void *arg)
{
    int i;
    int proc_num = (int)(long)arg;
 
    __u64 begin, end;
    struct timeval tv_begin, tv_end;
    __u64 time_interval;
    
    cpu_set_t set;
    
    CPU_ZERO(&set);
    CPU_SET(proc_num, &set);
 
    if (sched_setaffinity(gettid(), sizeof(cpu_set_t), &set))
    {
        fprintf(stderr, "failed to set affinity\n");
        return NULL;
    }
    begin = rdtsc();
    gettimeofday(&tv_begin, NULL);
    for (i = 0; i < INC_TO; i++)
    {
        global_int++;
    }
    gettimeofday(&tv_end, NULL);
    end = rdtsc();
    time_interval = (tv_end.tv_sec - tv_begin.tv_sec) * 1000000 + (tv_end.tv_usec - tv_begin.tv_usec);
    fprintf(stderr, "proc_num : %d, no lock cost %llu CPU cycle, cost %llu us\n", proc_num, end - begin, time_interval);
    
    return NULL;
}
 
int main()
{
    int procs = 0;
	int all_cores = 0;
    int i;
    pthread_t *thrs;
 
    procs = (int)sysconf(_SC_NPROCESSORS_ONLN);
    if (procs < 0)
    {
	    fprintf(stderr, "failed to fetch available CPUs(Cores)\n");
        return -1;
    }
	all_cores = (int)sysconf(_SC_NPROCESSORS_CONF);
	if (all_cores < 0)
	{
		fprintf(stderr, "failed to fetch system configure CPUs(Cores)\n");
		return -1;
	}
	
	printf("system configure CPUs(Cores): %d\n", all_cores);
	printf("system available CPUs(Cores): %d\n", procs);
 
    thrs = (pthread_t *)malloc(sizeof(pthread_t) * procs);
    if (thrs == NULL)
    {
        fprintf(stderr, "failed to malloc pthread array\n");
        return -1;
    }
	
    printf("starting %d threads...\n", procs);
    
    for (i = 0; i < procs; i++)
    {
        if (pthread_create(&thrs[i], NULL, thread_routine1, (void *)(long) i))
        {
			fprintf(stderr, "failed to pthread create\n");
            procs = i;
            break;
        }
    }
 
    for (i = 0; i < procs; i++)
    {
        pthread_join(thrs[i], NULL);
    }
  
    printf("after doing all the math, global_int value is: %d\n", global_int);
    printf("expected value is: %d\n", INC_TO * procs);
 
	free (thrs);
    
    return 0;
}

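First run it with the atomic version (thread_routine1).

Then change the thread function passed to pthread_create() in main() to thread_routine2 to use the mutex.

Finally change it to thread_routine3, which uses neither a lock nor an atomic operation.

The conclusions above hold for both test cases: the atomic operation is considerably faster than the mutex.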

Wrapping an atomic.h

#ifndef _ATOMIC_H
#define _ATOMIC_H
 
 
/**
 * Atomic type.
 */ 
typedef struct {
 
    volatile int counter;
 
} atomic_t;
 
  
#define ATOMIC_INIT(i)  { (i) }
 
 
/**
 * Read atomic variable
 * @param v pointer of type atomic_t
 *
 * Atomically reads the value of @v.
 */
#define atomic_read(v) ((v)->counter)
 
 
/**
 * Set atomic variable
 * @param v pointer of type atomic_t
 * @param i required value
 */
 
#define atomic_set(v,i) (((v)->counter) = (i))
 
 
/**
 * Add to the atomic variable
 * @param i integer value to add
 * @param v pointer of type atomic_t
 */
 
static inline void atomic_add( int i, atomic_t *v )
{
 
         (void)__sync_add_and_fetch(&v->counter, i);
 
}
  
 
/**
 * Subtract the atomic variable
 * @param i integer value to subtract
 * @param v pointer of type atomic_t
 *
 * Atomically subtracts @i from @v.
 */
 
static inline void atomic_sub( int i, atomic_t *v ) 
{
 
        (void)__sync_sub_and_fetch(&v->counter, i);
 
}
  
 
/**
 * Subtract value from variable and test result
 * @param i integer value to subtract
 * @param v pointer of type atomic_t
 *
 * Atomically subtracts @i from @v and returns
 * true if the result is zero, or false for all
 * other cases.
 */
 
static inline int atomic_sub_and_test( int i, atomic_t *v ) 
{
 
        return !(__sync_sub_and_fetch(&v->counter, i));

}
 
 
/**
 * Increment atomic variable
 * @param v pointer of type atomic_t
 *
 * Atomically increments @v by 1.
 */ 
static inline void atomic_inc( atomic_t *v ) 
{
 
       (void)__sync_fetch_and_add(&v->counter, 1);
 
}
 
 
/**
 * @brief decrement atomic variable
 * @param v: pointer of type atomic_t
 *
 * Atomically decrements @v by 1.
 */
static inline void atomic_dec( atomic_t *v )
 
{
 
       (void)__sync_fetch_and_sub(&v->counter, 1);
 
}
 
 
 
/**
 * @brief Decrement and test
 * @param v pointer of type atomic_t
 *
 * Atomically decrements @v by 1 and
 * returns true if the result is 0, or false for all other
 * cases.
 */
static inline int atomic_dec_and_test( atomic_t *v ) 
{
 
       return !(__sync_sub_and_fetch(&v->counter, 1));
 
}
 
 
/**
 * @brief Increment and test
 * @param v pointer of type atomic_t
 *
 * Atomically increments @v by 1
 * and returns true if the result is zero, or false for all
 * other cases.
 */
static inline int atomic_inc_and_test( atomic_t *v ) 
{
 
      return !(__sync_add_and_fetch(&v->counter, 1));
 
}
 
 
/**
 * @brief add and test if negative
 * @param v pointer of type atomic_t
 * @param i integer value to add
 *
 * Atomically adds @i to @v and returns true
 * if the result is negative, or false when
 * result is greater than or equal to zero.
 */
 
static inline int atomic_add_negative( int i, atomic_t *v )
{
 
       return (__sync_add_and_fetch(&v->counter, i) < 0);
 
}
 
#endif

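atomic_add, atomic_sub, atomic_inc, and atomic_dec above return void, so the caller has to read the variable separately to use the result. To get the result directly, change the return type to int (the kernel calls this variant atomic_add_return), for example: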

static inline int atomic_add( int i, atomic_t *v )
{
 
         return __sync_add_and_fetch(&v->counter, i);
 
}
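A typical use of such a header is the reference counting mentioned earlier. The sketch below repeats minimal copies of three wrappers so that it compiles on its own; `struct buffer` and the `buffer_*` functions are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

/* Minimal copies of the wrappers from the header, repeated here so the
 * sketch is self-contained. */
typedef struct { volatile int counter; } atomic_t;
#define atomic_set(v, i) (((v)->counter) = (i))

static inline void atomic_inc(atomic_t *v)
{
    (void)__sync_fetch_and_add(&v->counter, 1);
}

static inline int atomic_dec_and_test(atomic_t *v)
{
    return !(__sync_sub_and_fetch(&v->counter, 1));
}

/* Hypothetical reference-counted object. */
struct buffer {
    atomic_t refcnt;
    char data[64];
};

/* Allocates a buffer holding one reference; returns NULL on failure. */
struct buffer *buffer_new(void)
{
    struct buffer *b = calloc(1, sizeof(*b));
    if (b != NULL)
        atomic_set(&b->refcnt, 1);
    return b;
}

/* Takes an additional reference. */
struct buffer *buffer_get(struct buffer *b)
{
    atomic_inc(&b->refcnt);
    return b;
}

/* Drops one reference; frees the buffer and returns 1 when it was the last. */
int buffer_put(struct buffer *b)
{
    if (atomic_dec_and_test(&b->refcnt)) {
        free(b);
        return 1;
    }
    return 0;
}
```

Because the decrement and the zero test happen in one atomic step, only the thread that drops the last reference frees the object.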

References:

https://blog.youkuaiyun.com/shemangui/article/details/50444583

Atomic operations in Linux user space (bingqingsuimeng's column, 优快云 blog)