Computer Systems: A Programmer's Perspective: Threads

华东人王

Last modified 2023-04-18 14:33:41

First published 2021-11-30 16:01:44

Article link: https://blog.youkuaiyun.com/weixin_42237429/article/details/121634583

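1. Common functions

Creating a thread

pthread_t *thread //location where the ID of the newly created thread is stored
const pthread_attr_t *attr //thread attributes; pass NULL for the defaults
void *(*start_routine) (void *) //the new thread starts running at the start_routine function
void *arg //argument passed to start_routine; pass NULL if it takes none.

//To pass several arguments, put them in a struct and pass its address as arg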

int pthread_create(
                    pthread_t *thread, 
                    const pthread_attr_t *attr,
                    void *(*start_routine) (void *), 
                    void *arg
                );
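The note above about packing arguments into a struct can be sketched as follows. This is only an illustration under stated assumptions: struct range_args, sum_range and sum_range_in_thread are hypothetical names that do not appear in the original article.

```c
#include <pthread.h>

/* Hypothetical argument bundle: the range [lo, hi) to sum, plus a slot
   in which the thread stores its result. */
struct range_args {
    long lo;
    long hi;
    long sum;
};

/* Thread routine: unpacks the struct whose address was passed as arg. */
static void *sum_range(void *vargp)
{
    struct range_args *args = vargp;
    long i;

    args->sum = 0;
    for (i = args->lo; i < args->hi; i++)
        args->sum += i;
    return NULL;
}

/* Runs sum_range in a new thread; returns the sum, or -1 on error. */
long sum_range_in_thread(long lo, long hi)
{
    struct range_args args = { lo, hi, 0 };
    pthread_t tid;

    if (pthread_create(&tid, NULL, sum_range, &args) != 0)
        return -1;
    /* args lives in this stack frame, so wait for the thread before returning. */
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return args.sum;
}
```

Calling sum_range_in_thread(0, 10) from main returns 45. The struct must stay alive for as long as the thread uses it, which is why the function joins the thread before it returns.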

Getting the thread ID

pthread_t pthread_self(void);

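Terminating a thread

pthread_exit terminates the calling thread explicitly.
If the main thread calls it, it waits for all other peer threads to terminate and then terminates the main thread and the whole process.

void pthread_exit(void *retval);

pthread_cancel requests termination of the thread whose ID is passed as the argument; a peer thread can call it with another thread's ID to terminate that thread.

int pthread_cancel(pthread_t thread);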

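Reaping terminated threads

A thread calls pthread_join to wait for another thread to terminate. pthread_join blocks until the thread thread terminates, assigns the generic (void *) pointer returned by that thread to the location pointed to by retval, and then reaps any memory resources held by the terminated thread.

int pthread_join(pthread_t thread, void **retval);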
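A minimal sketch of how the retval parameter can be used to collect a thread's result. The names square and square_in_thread are hypothetical, and allocating the result on the heap is just one possible convention, not something the original article prescribes.

```c
#include <pthread.h>
#include <stdlib.h>

/* Thread routine: hands back a heap-allocated result as its return value. */
static void *square(void *vargp)
{
    long x = *(long *)vargp;
    long *result = malloc(sizeof *result);

    if (result != NULL)
        *result = x * x;
    return result;      /* has the same effect as pthread_exit(result) */
}

/* Computes x * x in a peer thread; returns -1 on error. */
long square_in_thread(long x)
{
    pthread_t tid;
    void *retval = NULL;
    long answer = -1;

    if (pthread_create(&tid, NULL, square, &x) != 0)
        return -1;
    /* pthread_join stores the pointer returned by the thread in retval. */
    if (pthread_join(tid, &retval) != 0)
        return -1;
    if (retval != NULL) {
        answer = *(long *)retval;
        free(retval);   /* the joining thread owns the block now */
    }
    return answer;
}
```

The thread must not return a pointer to one of its own local variables, because its stack is gone by the time pthread_join returns; that is why the sketch uses malloc.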

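Detaching threads

A thread is either joinable or detached; by default it is joinable. A joinable thread can be reaped and killed by other threads, and its memory resources are not freed until another thread reaps it.

A detached thread cannot be reaped or killed by other threads; its memory resources are freed automatically by the system when it terminates.

int pthread_detach(pthread_t thread);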
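The sketch below shows one way a detached thread might be used. Because a detached thread cannot be joined, the example assumes a POSIX semaphore (sem_init, sem_wait, sem_post from <semaphore.h>, as on Linux) to learn when the worker has finished. The names done, worker_ran, worker and run_detached_worker are hypothetical.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t done;            /* posted by the worker when it has finished */
static int worker_ran = 0;    /* written by the worker, read after sem_wait */

static void *worker(void *vargp)
{
    (void)vargp;
    worker_ran = 1;
    sem_post(&done);          /* tell the creator that the work is complete */
    return NULL;              /* the system frees this thread's resources */
}

/* Starts a detached worker and waits for its signal.
   Returns 1 if the worker ran, -1 on error. Intended to be called once. */
int run_detached_worker(void)
{
    pthread_t tid;

    if (sem_init(&done, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_detach(tid) != 0)     /* from now on tid must not be joined */
        return -1;
    sem_wait(&done);                  /* used instead of pthread_join */
    return worker_ran;
}
```

A thread can also detach itself by calling pthread_detach(pthread_self()) at the top of its routine.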

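In the program below, the main thread creates a peer thread and then waits for it to terminate.

The peer thread prints "Hello, world!" and terminates.

Once the main thread detects that the peer thread has terminated, it terminates the process by calling exit.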

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
void *thread(void *vargp);                   

int main()                                    
{
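    //local variable that holds the ID of the peer thread
    pthread_t tid;       

    //create the peer thread; when the call returns, the main thread and the peer thread run concurrently
    //tid holds the ID of the newly created thread
    pthread_create(&tid, NULL, thread, NULL); 

    //pthread_join makes the main thread wait for the peer thread to terminate
    pthread_join(tid, NULL);        

    //after the peer thread has terminated, the main thread calls exit, which terminates all threads running in the process (here only the main thread is left)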
    exit(0);                                  
}

void *thread(void *vargp)  
{
    printf("Hello, world!\n");                 
    return NULL;                              
}

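2. Shared variables

Global variable: a variable declared outside any function
Local variable: a variable declared inside a function without the static keyword
Local static variable: a variable declared inside a function with the static keyword

A variable is shared only if it is referenced by more than one thread. In the example below:

the shared variables are ptr, cnt and msgs;

the variables that are not shared are i and myid.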

#include <stdio.h>
#include <pthread.h>

#define N 2
void *thread(void *vargp);

char **ptr;  /* Global variable */

int main() 
{
    long i;  
    pthread_t tid;
    char *msgs[N] = 
    {
        "Hello from foo",  
        "Hello from bar"   
    };

    ptr = msgs; 
    for (i = 0; i < N; i++)  
        pthread_create(&tid, NULL, thread, (void *)i); //pass i by value so each thread gets its own ID
    pthread_exit(NULL); 
}

void *thread(void *vargp) 
{
    long myid = (long)vargp;    //local variable: not shared, each instance is referenced by exactly one thread

    static int cnt = 0;         //local static variable: shared, referenced by both peer threads
   
    printf("[%ld]: %s (cnt=%d)\n", myid, ptr[myid], ++cnt); 
    return NULL;
}

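3. Synchronizing threads with semaphores

Synchronization errors

Shared variables can introduce synchronization errors. The program below creates two threads, each of which increments the shared variable cnt niters times.

Because each thread adds 1 to cnt niters times, the final value should be 2×niters. However, cnt++ is actually carried out in three steps: load cnt, update cnt, store cnt.

If a thread is interrupted between these steps, for example when one thread has loaded cnt and the other thread then runs, both threads load the same value. Each thread adds 1 to that value and both store cnt+1, so although two increments were executed, cnt grew by only 1.

The three operations that make up cnt++ must therefore execute as one uninterrupted unit; once they are interleaved with another thread's, a thread may no longer be working with the latest value. This is the synchronization problem that semaphores are used to solve.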

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
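/*
Meaning of argc and argv in main
        argc and argv are used when the program is run from the command line.
        In main(int argc, char *argv[], char **env):
        The first parameter, argc, is an int holding the number of command-line arguments passed to main; it is 1 when no arguments are given, because the program name counts.
        The second parameter, argv, is an array of pointers to strings, one element per argument:
        argv[0] points to the name the program was invoked with
        argv[1] points to the first string after the program name on the command line
        argv[2] points to the second string after the program name
        argv[3] points to the third string after the program name
        argv[argc] is NULL
        The third parameter, env, is an array of strings of the form ENVVAR=value, where ENVVAR is an environment variable and value is its value. It is a common extension but is rarely needed.
*/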
void *thread(void *vargp);  /* Thread routine prototype */

/* Global shared variable */
volatile long cnt = 0; /* Counter, shared by both threads */

int main(int argc, char **argv) 
{
    long niters;
    pthread_t tid1, tid2;

    /* Check input argument */
    //with no arguments, argc = 1
    //with one argument, e.g. ./a.out 1000000, argc = 2
    //with two arguments, e.g. ./a.out 1000000 8000, argc = 3
    if (argc != 2)   
    {   
	    printf("usage: %s <niters>\n", argv[0]);
	    exit(0);
    }
    niters = atoi(argv[1]);

    /* Create threads and wait for them to finish */
    pthread_create(&tid1, NULL, thread, &niters);
    pthread_create(&tid2, NULL, thread, &niters);
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);

    /* Check result */
    if (cnt != (2 * niters))
	    printf("BOOM! cnt=%ld\n", cnt);
    else
	    printf("OK cnt=%ld\n", cnt);
    exit(0);
}

/* Thread routine */
void *thread(void *vargp) 
{
    long i, niters = *((long *)vargp);
	
    for (i = 0; i < niters; i++)
        cnt++;

    return NULL;
}

Output from several runs is shown below. The result should be 2×niters, but it differs from run to run.

linux> ./a.out 1000000
BOOM! cnt=1020476

linux> ./a.out 1000000
BOOM! cnt=1021765

linux> ./a.out 1000000
BOOM! cnt=1008774

linux> ./a.out 1000000
BOOM! cnt=1015091

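Semaphores

Initializing a semaphore (the functions below are declared in <semaphore.h>)

sem: pointer to the semaphore object

pshared: the kind of semaphore. If nonzero, the semaphore is shared between processes; otherwise it is shared only among the threads of the current process.

value: the initial value of the semaphore

Returns 0 on success; on error, returns -1 and sets errno accordingly.

int sem_init(sem_t *sem, int pshared, unsigned int value);

Decrementing the semaphore by 1 (P); if the value is 0, the caller blocks until it becomes nonzero

int sem_wait(sem_t *sem);

Incrementing the semaphore by 1 (V); if threads are blocked in sem_wait, one of them is woken up

int sem_post(sem_t *sem);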

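Semaphores provide a way to ensure mutually exclusive access to shared variables:
The basic idea is to associate each shared variable (or group of related shared variables) with a semaphore s (initially 1), and then surround the corresponding critical section with sem_wait (P) and sem_post (V) operations. At most one thread at a time can execute instructions inside the critical section, which guarantees mutually exclusive access. A semaphore used this way to provide mutual exclusion is called a mutex; the P operation is called locking the mutex and the V operation unlocking it.

Using a semaphore for mutual exclusion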

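Building on the example above, the counter cnt can be synchronized correctly with a semaphore.

First declare a semaphore mutex:

volatile long cnt = 0;

sem_t mutex;

Then initialize mutex to 1 in the main routine:

sem_init(&mutex, 0, 1);

In the thread routine, surround the update of cnt with P and V to protect the cnt++ operation:

When the first thread reaches this point, sem_wait(&mutex) decrements mutex by 1, so mutex = 0;

it then executes the statements that follow;

if the second thread reaches this point in the meantime, it finds mutex = 0, so its sem_wait(&mutex) suspends it;

when the first thread leaves the critical section, it calls sem_post(&mutex), which increments mutex, so mutex = 1;

the first thread has finished its update;

sem_post then wakes the second thread, whose sem_wait(&mutex) completes by decrementing mutex back to 0, and the second thread enters the critical section;

This repeats on every iteration, so while one thread is executing the critical section no other thread can interfere.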

    for (i = 0; i < niters; i++)
    {
        sem_wait(&mutex);
        cnt++;
        sem_post(&mutex);
    }

Output from several runs is shown below; the result is now exactly 2×niters:

linux> ./a.out 1000000
OK cnt=2000000

linux> ./a.out 50000000
OK cnt=100000000
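For reference, the fragments above can be assembled into one compilable unit roughly as follows. This is a sketch, not the article's exact program: the counter is wrapped in a hypothetical function count_with_two_threads instead of main, the thread routine is renamed count_up, and return values are checked.

```c
#include <pthread.h>
#include <semaphore.h>

static volatile long cnt = 0;   /* shared counter */
static sem_t mutex;             /* binary semaphore that protects cnt */

/* Thread routine: increments cnt niters times inside the critical section. */
static void *count_up(void *vargp)
{
    long i, niters = *(long *)vargp;

    for (i = 0; i < niters; i++) {
        sem_wait(&mutex);   /* P: lock */
        cnt++;
        sem_post(&mutex);   /* V: unlock */
    }
    return NULL;
}

/* Runs two counting threads; returns the final value of cnt, or -1 on error. */
long count_with_two_threads(long niters)
{
    pthread_t tid1, tid2;

    cnt = 0;
    if (sem_init(&mutex, 0, 1) != 0)    /* mutex starts at 1 (unlocked) */
        return -1;
    if (pthread_create(&tid1, NULL, count_up, &niters) != 0)
        return -1;
    if (pthread_create(&tid2, NULL, count_up, &niters) != 0) {
        pthread_join(tid1, NULL);
        return -1;
    }
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    sem_destroy(&mutex);
    return cnt;
}
```

On Linux the code is compiled with the -pthread option. With the semaphore in place, count_with_two_threads(niters) returns 2×niters regardless of how the two threads are scheduled.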

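4. Synchronizing threads with condition variables

5. Using threads for parallelism

Most computers have multi-core processors, and the operating system kernel schedules concurrent threads in parallel on multiple cores rather than sequentially on a single core.

Take summing the sequence 0, 1, ..., n-1 as an example: the most direct way to divide the work is to split the sequence into t disjoint ranges and have t threads each sum one range.

Baseline approach: every thread adds each element directly to a global sum protected by a mutex. Because the P and V synchronization operations are expensive, the program gets slower, not faster, as more threads are added.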

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <semaphore.h>

#define MAXTHREADS 32    

void *sum_mutex(void *vargp); /* Thread routine */

/* Global shared variables */
long gsum = 0;           /* Global sum, shared by all threads */
long nelems_per_thread;  /* Number of elements to sum */
sem_t mutex;             /* Mutex to protect global sum */

int main(int argc, char **argv) 
{
    long i, nelems, log_nelems, nthreads, myid[MAXTHREADS];
    pthread_t tid[MAXTHREADS];

    /* Get input arguments */
    if (argc != 3) 
    {
        printf("Usage: %s <nthreads> <log_nelems>\n", argv[0]);
        exit(0);
    }
    nthreads = atoi(argv[1]);       //number of ranges (one per thread)
    log_nelems = atoi(argv[2]);
    nelems = (1L << log_nelems);    //value of n

    /* Check input arguments */
    if  (nthreads < 1 || nthreads > MAXTHREADS || (nelems % nthreads) != 0 || (log_nelems > 31)) 
    {
        printf("Error: invalid nthreads or nelems\n");
        exit(0);
    }

    nelems_per_thread = nelems / nthreads;  //elements per range = n / number of ranges

    sem_init(&mutex, 0, 1);

    /* Create peer threads and wait for them to finish */

    //the main thread passes each peer thread a small integer as its unique thread ID
    //each peer thread uses its ID to decide which part of the sequence to sum
    for (i = 0; i < nthreads; i++) 
    {                 
        myid[i] = i;                                 
        pthread_create(&tid[i], NULL, sum_mutex, &myid[i]); 
    }                                               
    for (i = 0; i < nthreads; i++)                   
	    pthread_join(tid[i], NULL);                  
    
    /* Check final answer */
    if (gsum != (nelems * (nelems-1))/2)    //closed form: 0 + 1 + ... + (n-1) = n(n-1)/2
	    printf("Error: result=%ld\n", gsum); 
    else
        printf("OK: result=%ld\n", gsum); 

    exit(0);
}

void *sum_mutex(void *vargp) 
{
    long myid = *((long *)vargp);          /* Extract the thread ID */ 
    //the ID determines which range of the sequence this thread sums
    long start = myid  * nelems_per_thread; /* Start element index */ 
    long end   = start + nelems_per_thread;  /* End element index */ 
    long i;

    //add to the global sum under mutual exclusion
    for (i = start; i < end; i++) 
    {        
        sem_wait(&mutex);                  
	    gsum += i;                     
        sem_post(&mutex);                    
    }	                               
    return NULL;
}

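Improvement 1: avoid synchronization

Each peer thread computes the sum of its own range in a private variable that is not shared with any other thread, so the peer threads need no synchronization among themselves. The only synchronization required is that the main thread wait for all peer threads to finish before adding up the elements of the psum array.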

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <semaphore.h>

#define MAXTHREADS 32    

void *sum_array(void *vargp); /* Thread routine */

/* Global shared variables */
//one element per peer thread; each thread writes only its own element
long psum[MAXTHREADS];  /* Partial sum computed by each thread */   

long nelems_per_thread; /* Number of elements summed by each thread */

int main(int argc, char **argv) 
{
    long i, nelems, log_nelems, nthreads, myid[MAXTHREADS], result = 0;
    pthread_t tid[MAXTHREADS];

    /* Get input arguments */
    if (argc != 3) 
    { 
	    printf("Usage: %s <nthreads> <log_nelems>\n", argv[0]);
	    exit(0);
    }
    nthreads = atoi(argv[1]);
    log_nelems = atoi(argv[2]);
    nelems = (1L << log_nelems);

    /* Check input arguments */
    if  (nthreads < 1 || nthreads > MAXTHREADS || (nelems % nthreads) != 0 || (log_nelems > 31)) 
    {
	    printf("Error: invalid nelems\n");
	    exit(0);
    }
    nelems_per_thread = nelems / nthreads;

    /* Create peer threads and wait for them to finish */
    for (i = 0; i < nthreads; i++) 
    {           
	    myid[i] = i;                               
	    pthread_create(&tid[i], NULL, sum_array, &myid[i]); 
    }                                                
    for (i = 0; i < nthreads; i++)               
	    pthread_join(tid[i], NULL);                  
    
    /* Add up the partial sums computed by each thread */
    //the partial sums are added only after all peer threads have finished
    for (i = 0; i < nthreads; i++)                 
	    result += psum[i];                           

    /* Check final answer */
    if (result != (nelems * (nelems-1))/2)     
	    printf("Error: result=%ld\n", result); 
    else
        printf("OK: result=%ld\n", result); 

    exit(0);
}

/* Thread routine for psum-array.c */
void *sum_array(void *vargp) 
{
    long myid = *((long *)vargp);          /* Extract the thread ID */ 
    long start = myid * nelems_per_thread; /* Start element index */ 
    long end = start + nelems_per_thread;  /* End element index */ 
    long i;

    //each thread accumulates into its own element of psum
    for (i = start; i < end; i++) 
    {        
	    psum[myid] += i;                   
    }	                                   
    return NULL;
}

Improvement 2: use a local variable instead of a global one to eliminate unnecessary memory references

void *sum_local(void *vargp) 
{
    long myid = *((long *)vargp);          /* Extract the thread ID */
    long start = myid * nelems_per_thread; /* Start element index */
    long end = start + nelems_per_thread;  /* End element index */
    long i, sum = 0;

    //accumulate in a local variable instead of a global one to eliminate unnecessary memory references
    for (i = start; i < end; i++) 
    {        
        sum += i;                         
    }
    //once the loop is done, store the result in the global array
    psum[myid] = sum; 
    return NULL;
}

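6. Thread safety

A function is thread-safe if and only if it always produces correct results when called repeatedly from multiple concurrent threads.

There are four main classes of thread-unsafe functions:

Functions that do not protect shared variables

Fix: protect the shared variables with the P and V semaphore operations

Drawback: the synchronization operations slow the program down

Functions that keep state across multiple invocations

Fix: pass the state in as an argument

Functions that return a pointer to a static variable

Fix 1: rewrite the function so that the caller passes the address of a variable in which to store the result

Fix 2: lock around the call and copy the result to a private location

Functions that call thread-unsafe functions

Fix: call only thread-safe functions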
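The second class (functions that keep state across invocations) can be illustrated with a small pseudo-random number generator. The sketch below is an illustration only: rand_unsafe and rand_reentrant are hypothetical names, and the constants are those of the classic example generator from the C standard.

```c
/* Thread-unsafe version: the generator state lives in a static variable
   that is shared by every caller, so concurrent calls interfere. */
static unsigned next_seed = 1;

unsigned rand_unsafe(void)
{
    next_seed = next_seed * 1103515245 + 12345;
    return (next_seed / 65536) % 32768;
}

/* Reentrant version: the caller passes a pointer to its own state,
   so no data is shared between threads. */
unsigned rand_reentrant(unsigned *statep)
{
    *statep = *statep * 1103515245 + 12345;
    return (*statep / 65536) % 32768;
}
```

Each thread that keeps its own state variable and calls rand_reentrant(&state) gets a reproducible sequence that depends only on its own seed. The price of this fix is that every caller has to be changed to supply the state.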

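7. Races

In the program below, the main thread creates four peer threads and passes each one a pointer to the loop variable i, intending it as a unique integer ID. Each peer thread copies the ID passed in its argument into a local variable and prints a message containing it. The output can be incorrect, as in the run shown after the code: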

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define N 4

void *thread(void *vargp);

int main() 
{
    pthread_t tid[N];
    int i;

    //create four peer threads
    for (i = 0; i < N; i++)    /* "line 12" in the discussion below */
        pthread_create(&tid[i], NULL, thread, &i); //pass a pointer to i to each thread
    for (i = 0; i < N; i++) 
        pthread_join(tid[i], NULL);
    exit(0);
}

void *thread(void *vargp) 
{
    int myid = *((int *)vargp);  /* "line 22" in the discussion below */
    printf("Hello from thread %d\n", myid); //each thread prints a message containing its copy of i
    return NULL;
}

linux> ./a.out 
Hello from thread 1
Hello from thread 2
Hello from thread 3
Hello from thread 2

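The reason is a race between the increment of i on line 12 and the dereference and assignment of the argument on line 22.

Because every thread receives a pointer to the same variable i rather than its own copy of the value, whatever i holds at the moment line 22 runs is what the thread sees.

If a peer thread executes line 22 before the main thread increments i on line 12, myid gets the correct ID.

Otherwise, myid gets the ID intended for some other thread.

The output therefore depends on how the kernel schedules the threads.

To eliminate the race, dynamically allocate a separate block for each integer ID (so that every argument passed to a peer thread has its own address) and pass a pointer to that block to the peer thread.

The peer thread must free the block to avoid a memory leak.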

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define N 4

void *thread(void *vargp);

int main() 
{
    pthread_t tid[N];
    int i, *ptr;

    for (i = 0; i < N; i++) 
    {
        ptr = malloc(sizeof(int));      //dynamically allocate a separate block for each ID
        *ptr = i;                                    
        pthread_create(&tid[i], NULL, thread, ptr); //pass the pointer to this block to the peer thread
    }
    for (i = 0; i < N; i++) 
        pthread_join(tid[i], NULL);
    exit(0);
}

/* Thread routine */
void *thread(void *vargp) 
{
    int myid = *((int *)vargp);
    free(vargp);                //the block must be freed here to avoid a memory leak
    printf("Hello from thread %d\n", myid);
    return NULL;
}

With the race eliminated, every thread prints its own value of i (the order of the lines still depends on scheduling):

linux> ./a.out 
Hello from thread 0
Hello from thread 2
Hello from thread 1
Hello from thread 3
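Another way to give each thread an argument with its own address, without malloc and free, is the per-thread ID array that the summation programs in section 5 already use (myid[i] = i). A minimal sketch of that alternative, with hypothetical names seen, record_id and count_distinct_ids:

```c
#include <pthread.h>

#define NTHREADS 4

static int seen[NTHREADS];   /* seen[k] is set to 1 by the thread whose ID is k */

static void *record_id(void *vargp)
{
    int myid = *(int *)vargp;   /* this slot is not modified after the thread is created */

    seen[myid] = 1;
    return NULL;
}

/* Creates NTHREADS threads, each with its own ID slot.
   Returns how many distinct IDs were observed, or -1 on error. */
int count_distinct_ids(void)
{
    pthread_t tid[NTHREADS];
    int myid[NTHREADS];
    int i, created, distinct = 0;

    for (i = 0; i < NTHREADS; i++)
        seen[i] = 0;
    for (created = 0; created < NTHREADS; created++) {
        myid[created] = created;   /* one slot per thread, separate from the loop counter */
        if (pthread_create(&tid[created], NULL, record_id, &myid[created]) != 0)
            break;
    }
    for (i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    if (created < NTHREADS)
        return -1;
    for (i = 0; i < NTHREADS; i++)
        distinct += seen[i];
    return distinct;
}
```

The array lives in the creating function's stack frame, so this approach is only safe when that function joins all threads before it returns, as it does here.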

8. Deadlock

A deadlock is a situation in which a thread is blocked waiting for a condition that will never become true.
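A common source of deadlock is two threads that each hold one mutex while waiting for the other's. One widely used way to rule this out is to make every thread acquire the mutexes in the same order. The sketch below assumes two binary semaphores used as mutexes; lock_a, lock_b, the balance variables and the function names are all hypothetical.

```c
#include <pthread.h>
#include <semaphore.h>

#define NITERS 100000

static sem_t lock_a, lock_b;        /* binary semaphores used as mutexes */
static long balance_a, balance_b;   /* protected by lock_a and lock_b */

/* Both routines take lock_a before lock_b, so neither thread can hold
   one lock while waiting forever for the other. */
static void *a_to_b(void *vargp)
{
    long i;

    (void)vargp;
    for (i = 0; i < NITERS; i++) {
        sem_wait(&lock_a);
        sem_wait(&lock_b);
        balance_a--;
        balance_b++;
        sem_post(&lock_b);
        sem_post(&lock_a);
    }
    return NULL;
}

static void *b_to_a(void *vargp)
{
    long i;

    (void)vargp;
    for (i = 0; i < NITERS; i++) {
        sem_wait(&lock_a);          /* same order as in a_to_b */
        sem_wait(&lock_b);
        balance_b--;
        balance_a++;
        sem_post(&lock_b);
        sem_post(&lock_a);
    }
    return NULL;
}

/* Runs both routines concurrently; returns the combined balance, or -1 on error. */
long run_transfers(void)
{
    pthread_t t1, t2;

    balance_a = balance_b = 1000;
    if (sem_init(&lock_a, 0, 1) != 0 || sem_init(&lock_b, 0, 1) != 0)
        return -1;
    if (pthread_create(&t1, NULL, a_to_b, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, b_to_a, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return balance_a + balance_b;
}
```

If b_to_a took lock_b first instead, one thread could hold lock_a and the other lock_b, each blocked in sem_wait on a semaphore that will never be posted, which is exactly the situation described in the definition above.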