Multi-process flock in C: Starting from a Bug Caused by flock (1): A Process's File Descriptors

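This article describes a bug we hit while using flock for multi-process file locking in C: several processes hung while waiting for a lock. It explains how flock is used, including its nature as an advisory lock and how a file lock is passed to and held by parent and child processes. A concrete case shows what goes wrong when a parent calls a Python script without first closing the file and releasing the lock: the child process unexpectedly ends up holding the lock. Finally, it explores how the Linux kernel manages file descriptors and file locks.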

Introduction

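A couple of days ago our QA team found a rather interesting bug. After analyzing it carefully, I found that several processes were stuck on a configuration file. In short, we wanted to stop multiple processes from writing the same configuration file at once and corrupting its format. So whenever we open the file for writing, we also call the flock system call with LOCK_EX. Holding a lock means the critical section must be small. Release the lock as soon as the write is done, and never do anything time-consuming while holding it. Otherwise other processes cannot get the lock and simply starve.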

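What makes this bug interesting is why everyone was waiting for the lock: a Python script had been called. The script did not need the configuration file at all. The parent process called it through system() without first closing the file and releasing the lock, so the innocent Python script ended up holding the lock. And, of course, the Python script was slow. It was the urgent patient stuck with the slowest doctor: a crowd of processes anxiously waiting for the lock, while the Python process sat on it without using it, like someone hogging a toilet stall. Heh.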

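As we know, Linux has mandatory locks and advisory locks.

- A mandatory lock is easy to understand. It is like the lock on your front door, and, crucially, there is only one key, so only one process can operate on the file.
- An advisory lock is essentially a protocol. It only takes effect if you check it before accessing the file. If you are not so kind and just read or write regardless, the advisory lock does nothing at all.

Processes that follow the protocol and check the lock before reading or writing are called cooperating processes. Our code uses flock, which is an advisory lock.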

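Linux implements the POSIX file-locking mechanism based on the fcntl system call. It also supports the advisory locks of the BSD-style flock system call and the System V-style lockf; see the manual pages for details. For fcntl, I recommend W. Richard Stevens's UNIX Network Programming, Volume 2: Interprocess Communications, which explains it very well. My focus here is flock.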

Application Layer

The user-space interface of flock is:

```c
#include <sys/file.h>

int flock(int fd, int operation);
```

Here fd is a file descriptor returned by the open system call, and operation is one of:

LOCK_SH: shared lock

LOCK_EX: exclusive lock

LOCK_UN: unlock

The Linux kernel also implements a LOCK_MAND option, although the manual page does not mention it. We will not discuss it here.

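Note that the FL_FLOCK-type lock created by the flock system call is essentially an advisory lock. It is effective only if every process follows the protocol of taking the lock before reading or writing. Processes that follow the protocol are called cooperating processes.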

Let's look at some code:

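```c
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <time.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/file.h>

int main(void)
{
    char buf[128];
    time_t ltime;

    /* tmp.txt must already exist: no O_CREAT */
    int fd = open("./tmp.txt", O_RDWR);
    if (fd < 0) {
        fprintf(stderr, "open failed %s\n", strerror(errno));
        return -1;
    }

    /* Take an exclusive FL_FLOCK lock; blocks until it is available */
    int ret = flock(fd, LOCK_EX);
    if (ret) {
        fprintf(stderr, "flock failed for father\n");
        return -2;
    }
    time(&ltime);
    /* ctime_r() output already ends with '\n' */
    fprintf(stderr, "%s I got the lock\n", ctime_r(&ltime, buf));

    ret = fork();
    if (ret == 0) {
        /* Child: inherits fd from the parent, then lives for 100 s */
        const char *msg = "write by son\n";
        time(&ltime);
        fprintf(stdout, "%s I am the son process,pid is %d,ppid = %d\n",
                ctime_r(&ltime, buf), getpid(), getppid());
        write(fd, msg, strlen(msg));   /* write the string only, not a fixed 32 bytes */
        sleep(100);
        time(&ltime);
        fprintf(stdout, "%s son exit\n", ctime_r(&ltime, buf));
        return 0;
    } else if (ret > 0) {
        /* Parent: closes its fd after 50 s and exits */
        const char *msg = "write by father\n";
        time(&ltime);
        fprintf(stdout, "%s I am the father process,pid is %d\n",
                ctime_r(&ltime, buf), getpid());
        write(fd, msg, strlen(msg));
        sleep(50);
        close(fd);
        time(&ltime);
        fprintf(stdout, "%s father exit\n", ctime_r(&ltime, buf));
        return 0;
    } else {
        fprintf(stderr, "error happened in fork\n");
        return -3;
    }
}
```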

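Of course, tmp.txt must exist before the program runs. The program opens tmp.txt for reading and writing and requests an FL_FLOCK-type lock through the flock system call. Then it forks a child. After 50 seconds the parent exits and the child becomes an orphan; after 100 seconds the child exits. The question is: after the parent has died, and while the child is still alive, does the child hold the lock?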

Let the facts speak. Start one ./test, then start another ./test 5 seconds later.

Output in the terminal running the first test:

```
root@manu:~/code/c/self/flock# ./test
Wed Feb 6 23:53:29 2013
I got the lock
Wed Feb 6 23:53:29 2013
I am the father process,pid is 5632
Wed Feb 6 23:53:29 2013
I am the son process,pid is 5633,ppid = 5632
Wed Feb 6 23:54:19 2013
father exit
root@manu:~/code/c/self/flock# Wed Feb 6 23:55:09 2013
son exit
```

Output in the terminal running the second test:

```
root@manu:~/code/c/self/flock#
root@manu:~/code/c/self/flock# ./test
Wed Feb 6 23:55:09 2013
I got the lock
Wed Feb 6 23:55:09 2013
I am the father process,pid is 5634
Wed Feb 6 23:55:09 2013
I am the son process,pid is 5647,ppid = 5634
Wed Feb 6 23:55:59 2013
father exit
root@manu:~/code/c/self/flock# Wed Feb 6 23:56:49 2013
son exit
```

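The second test process did not get the FL_FLOCK lock until the first test's child exited. That was at 23:55:09, exactly 100 seconds after the first test started, not when its parent exited at 23:54:19. In other words, the child inherits all of the files its parent has open, and with them the FL_FLOCK lock, even if it never actually operates on the file.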

BUT WHY!!!

Kernel Side: Allocating fds

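A process can open many files; ulimit -a shows that, by default, at most 1024 files may be open. Three are open by default: STDIN, STDOUT and STDERR, with file descriptors 0, 1 and 2. A process uses numbers like 0, 1 and 2 to refer to the corresponding files, which are special files, of course. A regular file the process opens might get fd 4, say. How does the operating system find the file that corresponds to that 4? That is the question this section answers.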

```c
struct task_struct {
    /* ... */
    struct files_struct *files;
    /* ... */
};

struct files_struct {
    atomic_t count;
    struct fdtable __rcu *fdt;
    struct fdtable fdtab;
    spinlock_t file_lock ____cacheline_aligned_in_smp;
    int next_fd;
    unsigned long close_on_exec_init[1];
    unsigned long open_fds_init[1];
    struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};

struct fdtable {
    unsigned int max_fds;
    struct file __rcu **fd;      /* current fd array */
    unsigned long *close_on_exec;
    unsigned long *open_fds;
    struct rcu_head rcu;
    struct fdtable *next;
};
```

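That is a pile of data structures, all related to processes and files. Don't be intimidated; the relationships are fairly simple. Let's see how a process manages the files it has open. The data structure that actually describes an open file is: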

```c
struct file {
    /* ... */
    atomic_long_t f_count;
    unsigned int f_flags;
    fmode_t f_mode;
    loff_t f_pos;
    /* ... */
};
```

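Clearly, struct file is the real thing: it records the file's mode, the current read/write position, and so on. So how does a process get from an fd to the corresponding struct file? That is what the pile of data structures above is for.

struct fdtable is the structure closest to struct file. max_fds is the maximum number of files it currently supports. fd is a pointer to pointers to struct file; in other words, it is the base address of an array of max_fds struct file pointers.

Take our C program above: the file descriptor of tmp.txt is 3. How do we find the struct file for 3? Simple: the pointer fdtable->fd[3] points to the struct file of tmp.txt.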

With the process's fdtable in hand, we can find the struct file for any descriptor number. And how do we get from the process to its fdtable? That is simple too:

task_struct---->struct files_struct *files; ---->struct fdtable __rcu *fdt;

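task_struct has a member files of type struct files_struct *, and files_struct in turn has a member fdt of type struct fdtable *. So, given a descriptor number such as 3 (the return value of open), we can go all the way from task_struct to the corresponding struct file: task->files->fdt->fd[3].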

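What are close_on_exec and open_fds in struct fdtable for? Both are bitmaps with one bit per descriptor:

- A bit in open_fds records whether that descriptor has been allocated.
- A bit in close_on_exec records whether the descriptor is closed automatically on exec (the FD_CLOEXEC flag).

For example, when we open tmp.txt, the kernel searches open_fds. Bit 0 is 1, so descriptor 0 is taken (of course: that is STDIN). Bit 1 is also 1 (of course: that is STDOUT). The search continues until the first 0 bit turns up at position 3. That means descriptor 3 is free, so 3 becomes the return value of open.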

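Attentive readers may ask: is everything settled now? Why does struct files_struct contain a struct fdtable instance in addition to a struct fdtable pointer? Isn't that redundant? And what are close_on_exec_init and open_fds_init for? Doesn't the member fdt already take care of everything? Here are the members in question: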

```c
struct files_struct {
    /* ... */
    struct fdtable __rcu *fdt;
    struct fdtable fdtab;
    unsigned long close_on_exec_init[1];
    unsigned long open_fds_init[1];
    struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};
```

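In fact, the fdt pointer initially points to the embedded instance fdtab. Likewise, fdt->open_fds points to open_fds_init, fdt->close_on_exec points to close_on_exec_init, and fdt->fd points to the fd_array inside files_struct.

Put simply: I have 32 wine glasses at home. This is NR_OPEN_DEFAULT, which equals BITS_PER_LONG: 32 on a 32-bit kernel, 64 on a 64-bit one. If only a few guests come, the 32 glasses at home are enough. Unfortunately, a while later a 33rd guest arrives and there are no glasses left. So I phone the Xuanwu Hotel and ask them to reserve 256 glasses because I'm coming over for drinks. Then I take all 32 guests plus the newcomer there and use the 256 glasses prepared for us. We can't use that many yet, but they are reserved.

Since the venue has changed, the address must be updated to the Xuanwu Hotel so that later guests can find us. In the same way, fdt stops pointing to the fdtable embedded in files_struct and points to the newly allocated structure instead.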

When a process is created, fdt always points to the 32 glasses at home. Where does the code do this?

do_fork---->copy_process---->copy_files---->dup_fd

dup_fd contains the following code:

```c
newf->next_fd = 0;
new_fdt = &newf->fdtab;
new_fdt->max_fds = NR_OPEN_DEFAULT;
new_fdt->close_on_exec = newf->close_on_exec_init;
new_fdt->open_fds = newf->open_fds_init;
new_fdt->fd = &newf->fd_array[0];
new_fdt->next = NULL;
```

A child created by fork copies all of the files its parent has open, as if the child had opened them itself. We can verify this with lsof:

```
root@manu:~/code/c/self/flock# ./test &
[1] 6226
root@manu:~/code/c/self/flock# Fri Feb 8 00:03:29 2013
I got the lock
Fri Feb 8 00:03:29 2013
I am the father process,pid is 6226
Fri Feb 8 00:03:29 2013
I am the son process,pid is 6227,ppid = 6226
root@manu:~/code/c/self/flock# lsof -p 6226
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
...
test 6226 root 0u CHR 136,2 0t0 5 /dev/pts/2
test 6226 root 1u CHR 136,2 0t0 5 /dev/pts/2
test 6226 root 2u CHR 136,2 0t0 5 /dev/pts/2
test 6226 root 3uW REG 8,6 321 2359759 /home/manu/code/c/self/flock/tmp.txt
root@manu:~/code/c/self/flock# lsof -p 6227
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
...
test 6227 root 0u CHR 136,2 0t0 5 /dev/pts/2
test 6227 root 1u CHR 136,2 0t0 5 /dev/pts/2
test 6227 root 2u CHR 136,2 0t0 5 /dev/pts/2
test 6227 root 3u REG 8,6 321 2359759 /home/manu/code/c/self/flock/tmp.txt
```

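Suppose the parent has invited many guests (opened many files), more than 32. Then the child checks how many glasses the parent has prepared. If that exceeds 32, well, the work above was wasted and new glasses have to be allocated. Note that only the struct file pointers and the parent's bitmaps are copied; the fairly large struct file itself is not duplicated.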

```c
old_fds = old_fdt->fd;
new_fds = new_fdt->fd;

/* copy the bitmaps */
memcpy(new_fdt->open_fds, old_fdt->open_fds, open_files / 8);
memcpy(new_fdt->close_on_exec, old_fdt->close_on_exec, open_files / 8);

/* copy the struct file pointers of the open files */
for (i = open_files; i != 0; i--) {
    struct file *f = *old_fds++;
    if (f) {
        get_file(f);   /* bump the reference count: one more holder of this struct file */
    } else {
        /*
         * The fd may be claimed in the fd bitmap but not yet
         * instantiated in the files array if a sibling thread
         * is partway through open().  So make sure that this
         * fd is available to the new process.
         */
        __clear_open_fd(open_files - i, new_fdt);
    }
    rcu_assign_pointer(*new_fds++, f);
}
```

References

Understanding the Linux Kernel

Professional Linux Kernel Architecture

Linux kernel source code, version 3.6.7
