Analyzing How the Linux Kernel Creates a New Process

License: CC 4.0 BY-SA

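This article walks through how the Linux kernel creates a new process. A child process is created with fork, and the roles of the key functions do_fork, copy_process, and wake_up_new_task are examined. It closes with a summary of the overall process-creation flow.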

徐晨 + Original work; please credit the source when reposting + "Linux Kernel Analysis" (《Linux内核分析》) MOOC course http://mooc.study.163.com/course/USTC-1000029000

Today we look at how the Linux kernel creates a new process. We use fork to create a child process; the code is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>   /* declares wait() */
#include <unistd.h>
int main(int argc, char * argv[])
{
    pid_t pid;
    /* fork another process */
    pid = fork();
    if (pid < 0) 
    { 
        /* error occurred */
        fprintf(stderr,"Fork Failed!\n");
        exit(EXIT_FAILURE);
    } 
    else if (pid == 0) 
    {
        /* child process: fork() returned 0 */
        printf("This is Child Process!\n");
    } 
    else 
    {  
        /* parent process: fork() returned the child's pid */
        printf("This is Parent Process!\n");
        /* parent will wait for the child to complete */
        wait(NULL);
        printf("Child Complete!\n");
    }
    return 0;
}

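The fork() we call in the C library is really a wrapper that ends up in the kernel function sys_clone(), and sys_clone() in turn calls the kernel routine do_fork(). This gives the following call chain: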

libc fork() -> system_call -> sys_clone() -> do_fork()

In the kernel source, the comment on do_fork() reads: "Ok, this is the main fork-routine." Let us see how this main routine works. Reduced to its essentials, the function calls two others, copy_process and wake_up_new_task:

long do_fork() 
{      :
  p = copy_process();       /* build the child; it is not started yet */
       :
  wake_up_new_task(p);      /* takes only the task, as the gdb trace shows */
       :
}

The comment on copy_process reads: "This creates a new process as a copy of the old one, but does not actually start it yet."

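The comment on wake_up_new_task reads: "wake up a newly created task for the first time."

From this we get the rough picture: do_fork uses copy_process to copy out a new process control block (PCB), then calls wake_up_new_task to put the new process on the run queue and wake it up.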

Now let us look at copy_process():

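copy_process()
{
    p = dup_task_struct(current); // allocates a task_struct and a new kernel stack for the child, and copies thread_info
    copy_*   // this family of copy functions fills in the PCB's resource pointers: tsk->files, tsk->fs, and so on
    copy_thread(); // copies the architecture-specific execution state: registers, instruction pointer ip, and so on
}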

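copy_thread initializes the child's kernel stack and also records the child's sp and ip. In the first experiment, my_kernel, we set the ip of process 0 to the entry address of the function it was about to run. Here, ip is set to the assembly routine ret_from_fork, which is where the child starts executing. The same function also overwrites the saved eax with 0, which is why fork() returns 0 in the child.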

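Note what the new process's kernel stack holds at the moment it starts executing: essentially just the pt_regs frame below, which covers only the bottommost part of the kernel stack.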

struct pt_regs {
    unsigned long bx;
    unsigned long cx;
    unsigned long dx;
    unsigned long si;
    unsigned long di;
    unsigned long bp;
    unsigned long ax;      
    unsigned long ds;
    unsigned long es;
    unsigned long fs;
    unsigned long gs;
    unsigned long orig_ax; 
    unsigned long ip; 
    unsigned long cs;
    unsigned long flags;
    unsigned long sp;
    unsigned long ss;
};

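One thing to keep in mind is that the new process must be scheduled before it can return to user mode. wake_up_new_task marks the new process as runnable and puts it on the run queue, but the process starts executing from ret_from_fork only once it is actually given the CPU. Here is that assembly routine: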

ENTRY(ret_from_fork)
    CFI_STARTPROC
    pushl_cfi %eax
    call schedule_tail
    GET_THREAD_INFO(%ebp)
    popl_cfi %eax
    pushl_cfi $0x0202      # Reset kernel eflags
    popfl_cfi
    jmp syscall_exit
    CFI_ENDPROC
END(ret_from_fork)

This code calls schedule_tail() and then jumps to syscall_exit, which restores the saved context and returns to user space.

Let us trace this with GDB and look at the result:

Breakpoint 2, do_fork (clone_flags=18874385, stack_start=0, stack_size=0, 
    parent_tidptr=0x0, child_tidptr=0x96cb8a8) at kernel/fork.c:1628
1628    {
(gdb) 
Continuing.

Breakpoint 3, do_fork (clone_flags=18874385, stack_start=0, stack_size=0, 
    parent_tidptr=0x0, child_tidptr=0x96cb8a8) at kernel/fork.c:1651
1651        p = copy_process(clone_flags, stack_start, stack_size,
(gdb) 
Continuing.

Breakpoint 3, copy_process (clone_flags=18874385, stack_start=0, stack_size=0, 
    child_tidptr=0x96cb8a8, pid=0x0, trace=0) at kernel/fork.c:1182
1182    static struct task_struct *copy_process(unsigned long clone_flags,
(gdb) 
Continuing.

Breakpoint 4, copy_process (clone_flags=<optimized out>, 
    stack_start=<optimized out>, stack_size=<optimized out>, 
    child_tidptr=0x96cb8a8, pid=0x0, trace=0) at kernel/fork.c:1240
1240        p = dup_task_struct(current);
(gdb) 
Continuing.

Breakpoint 5, copy_thread (clone_flags=18874385, sp=0, arg=0, p=0xc70c0bc0)
    at arch/x86/kernel/process_32.c:134
134 {
(gdb) 
Continuing.

Breakpoint 6, wake_up_new_task (p=0xc70c0bc0) at kernel/sched/core.c:2117
2117    {
(gdb) 
Continuing.

Breakpoint 7, ret_from_fork () at arch/x86/kernel/entry_32.S:292
292     pushl_cfi %eax
(gdb) s
293     call schedule_tail
(gdb) n
294     GET_THREAD_INFO(%ebp)
(gdb) s
295     popl_cfi %eax
(gdb) 
296     pushl_cfi $0x0202      # Reset kernel eflags
(gdb) 
297     popfl_cfi
(gdb) 
298     jmp syscall_exit
(gdb) 

Beyond this point the trace could not be followed.

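One question remains that I have not yet worked out. wake_up_new_task marks the child as runnable and puts it on the run queue; does scheduling actually happen at that point? If it does, is ret_from_fork then a second round of scheduling, given that returning from a system call is itself a scheduling opportunity? I will set this aside for now and come back with an answer later.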

To summarize how Linux creates a new process:
libc fork() -> system_call -> sys_clone() -> do_fork() -> copy_process() {dup_task_struct; copy_thread } -> wake_up_new_task() -> ret_from_fork

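That is all for this analysis.
Note: this post was written in Markdown, so there are no screenshots; the gdb output was copied in as text instead.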