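| Traditional Unix systems create a child by copying the entire address space of the parent process, which is slow and often wasted work because the child typically loads a new program right away | | | | |
| Modern Unix kernels solve this problem with three different mechanisms: | | | | |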
| Copy On Write | | | | | | | | |
| Lightweight processes | | | | | | | | |
| vfork( ) system call creates a process that shares the memory address space of its parent | | | |
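The visible effect of Copy On Write can be sketched from user space: after fork( ), a write made by the child must not show up in the parent. The helper name parent_view_after_child_write is hypothetical, and the program only observes the semantics; it cannot prove that the kernel deferred the page copy until the write.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that overwrites `value` in its own
 * address space, then returns what the parent still sees afterwards. */
int parent_view_after_child_write(int initial, int child_value)
{
    volatile int value = initial;
    pid_t pid = fork();
    assert(pid >= 0);

    if (pid == 0) {
        /* Child: the write lands in a private copy of the page. */
        value = child_value;
        _exit(value == child_value ? 0 : 1);
    }

    /* Parent: wait for the child, then read its own copy of `value`. */
    int status = 0;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
    return value;
}
```

Calling parent_view_after_child_write(1, 42) should hand back the parent's original value, since parent and child no longer share the written page.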
| | | | | | | | | | |
| The clone( ), fork( ), and vfork( ) System Calls | | | | | | | |
| | | | | | | | | | |
| clone( ) | fork( ) | vfork( ) | | | | | |
| sys_clone( ) | sys_fork( ) | sys_vfork( ) | | | | | |
| All three service routines invoke do_fork( ), each passing a different set of clone flags | | | |
| The value returned by the system call is contained in eax: | | | | | |
| it is 0 in the child and the child's PID in the parent | | | | |
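A minimal user-space sketch of the two return values follows. The struct and function names are hypothetical; the C library wrapper is assumed to surface the register value as the return value of fork( ).

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical record of what the parent learns about one fork( ) call. */
struct fork_result {
    pid_t parent_ret;            /* value fork( ) returned in the parent */
    pid_t reaped;                /* PID reported by waitpid( ) */
    int child_took_zero_branch;  /* 1 if the child ran the `pid == 0` branch */
};

struct fork_result observe_fork(void)
{
    struct fork_result r = {0, 0, 0};
    pid_t pid = fork();
    assert(pid >= 0);

    if (pid == 0)
        _exit(17);  /* child: fork( ) returned 0, signal that via exit code */

    int status = 0;
    r.parent_ret = pid;
    r.reaped = waitpid(pid, &status, 0);
    r.child_took_zero_branch = WIFEXITED(status) && WEXITSTATUS(status) == 17;
    return r;
}
```

In the parent, parent_ret is positive and equals the PID that waitpid( ) reaps; the exit code 17 is only an arbitrary marker chosen for this sketch.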
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| clone(fn, arg, flags, child_stack, tls, ptid, ctid) | | | | | | |
| Clone flags | | | | | | | | | |
| /* | | | | | | | | | | |
| * cloning flags: | | | | | | | | | |
| */ | | | | | | | | | | |
| #define CSIGNAL | 0x000000ff | /* signal mask to be sent at exit */ | | | | | |
| #define CLONE_VM | 0x00000100 | /* set if VM shared between processes */ | | | | | |
| #define CLONE_FS | 0x00000200 | /* set if fs info shared between processes */ | | | | | |
| #define CLONE_FILES | 0x00000400 | /* set if open files shared between processes */ | | | | |
| #define CLONE_SIGHAND | 0x00000800 | /* set if signal handlers and blocked signals shared */ | | | | |
| #define CLONE_IDLETASK | 0x00001000 | /* set if new pid should be 0 (kernel only)*/ | | | | | |
| #define CLONE_PTRACE | 0x00002000 | /* set if we want to let tracing continue on the child too */ | | | | |
| #define CLONE_VFORK | 0x00004000 | /* set if the parent wants the child to wake it up on mm_release */ | | | |
| #define CLONE_PARENT | 0x00008000 | /* set if we want to have the same parent as the cloner */ | | | | |
| #define CLONE_THREAD | 0x00010000 | /* Same thread group? */ | | | | | | |
| #define CLONE_NEWNS | 0x00020000 | /* New namespace group? */ | | | | | | |
| #define CLONE_SYSVSEM | 0x00040000 | /* share system V SEM_UNDO semantics */ | | | | | |
| #define CLONE_SETTLS | 0x00080000 | /* create a new TLS for the child */ | | | | | |
| #define CLONE_PARENT_SETTID | 0x00100000 | /* set the TID in the parent */ | | | | | | |
| #define CLONE_CHILD_CLEARTID | 0x00200000 | /* clear the TID in the child */ | | | | | | |
| #define CLONE_DETACHED | 0x00400000 | /* Not used - CLONE_THREAD implies detached uniquely */ | | | |
| #define CLONE_UNTRACED | 0x00800000 | /* set if the tracing process can't force CLONE_PTRACE on this clone */ | | |
| #define CLONE_CHILD_SETTID | 0x01000000 | /* set the TID in the child */ | | | | | | |
| #define CLONE_STOPPED | 0x02000000 | /* Start in stopped state */ | | | | | |
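The effect of CLONE_VM can be tried from user space with the glibc clone( ) wrapper, whose argument order (fn, stack, flags, arg) differs from the list above. This is a sketch for Linux only; run_clone and child_fn are hypothetical names, and the 64 KB stack size is an arbitrary choice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Child body: the write reaches the parent only if memory is shared. */
static int child_fn(void *arg)
{
    *(volatile int *)arg = 42;
    return 0;
}

/* Hypothetical helper: runs child_fn through clone( ) with `extra_flags`
 * and returns the value the parent reads once the child has exited. */
int run_clone(int extra_flags)
{
    enum { STACK_SIZE = 64 * 1024 };
    char *stack = malloc(STACK_SIZE);
    assert(stack != NULL);

    volatile int shared = 0;
    /* The stack grows downward, so pass the top of the buffer.
     * SIGCHLD in the low byte is the signal sent to the parent on exit. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      extra_flags | SIGCHLD, (void *)&shared);
    assert(pid > 0);

    int status = 0;
    waitpid(pid, &status, 0);
    free(stack);
    return shared;
}
```

With run_clone(CLONE_VM) the parent sees the child's write; with run_clone(0) the child works on its own copy and the parent still reads 0.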
| | | | | | | | | | |
| do_fork( ) | | | | | | | | | | |
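| 1,Allocates a new PID for the child | | | | | | | | |
| 2,Checks the ptrace field of the parent (current->ptrace) to see whether the child must be traced as well | | | | | |
| 3,Invokes copy_process() to make a copy of the process descriptor | | | | | | | | |
| 4,Checks the clone flags (for example CLONE_STOPPED and CLONE_VFORK) to decide whether the child starts stopped and whether the parent must wait for it | | | | | | | | |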
| | | | | | | | | | |
| copy_process( ) | | | | | | | | | |
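| 1,Checks whether the flags passed in clone_flags are compatible | | | | | | | | |
| 2,Invokes security_task_create() to perform any additional security checks | | | | | | | | |
| 3,Invokes dup_task_struct( ) to get the process descriptor for the child | | | | | | | | |
| 4,Checks the per-user process limit in current->signal->rlim[RLIMIT_NPROC] | | | | | | | |
| 5,Updates the user_struct usage counters | | | | | | | | |
| 6,Checks that the total number of processes in the system does not exceed the limit (max_threads) | | | | | | |
| 7,Updates the module usage counters (execution domain, executable format) | | | | | | | | |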
| 8,Sets a few crucial fields related to the process state | | | | | |
| 9,Stores the PID of the new process in the tsk->pid field | | | | | |
| 10,If the CLONE_PARENT_SETTID flag in the clone_flags parameter is set, | | | | |
| copies the child's PID into the User Mode variable addressed by the parent_tidptr parameter. | | |
| 11,Initializes the list_head data structures and the spin locks included in the child's process descriptor | | |
| 12,Invokes copy_semundo( ), copy_files( ), copy_fs( ), copy_sighand( ), copy_signal( ), copy_mm( ), and copy_namespace( ) |
| 13,Invokes copy_thread( ) to initialize the Kernel Mode stack of the child process | | | |
| with the values contained in the CPU registers when the clone( ) system call was issued | | |
| 14,Copies the value of the child_tidptr parameter into the tsk->set_child_tid or tsk->clear_child_tid field | | |
| 15,Turns off the TIF_SYSCALL_TRACE flag in the thread_info structure of the child | | | |
| 16,Initializes the tsk->exit_signal field with the signal number encoded in the low bits of the clone_flags parameter |
| 17,Invokes sched_fork( ) to complete the initialization of the scheduler data structure of the new process | |
| 18,Sets the cpu field in the thread_info structure of the new process | | | | |
| 19,Initializes the fields that specify the parenthood relationships | | | | | |
| 20,If the child does not need to be traced (CLONE_PTRACE flag not set), it sets the tsk->ptrace field to 0 | |
| 21,Executes the SET_LINKS macro to insert the new process descriptor in the process list | | |
| 22,If the child must be traced (PT_PTRACED flag in the tsk->ptrace field set), | | | | |
| it sets tsk->parent to current->parent and inserts the child into the trace list of the debugger | | |
| 23,Invokes attach_pid( ) to insert the PID of the new process descriptor in the pidhash[PIDTYPE_PID] hash table | |
| 24,If the child is a thread group leader (CLONE_THREAD flag cleared), sets tsk->tgid to tsk->pid | | | | |
| 25,Otherwise, if the child belongs to the thread group of its parent (CLONE_THREAD flag set), sets tsk->tgid to current->tgid | | |
| 26,Increases the value of the nr_threads variable | | | | | | |
| 27,Increases the total_forks variable to keep track of the number of forked processes | | | |
| 28,Terminates by returning the child's process descriptor pointer (tsk). | | | | |
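Two of the steps above, the exit signal taken from the low bits of clone_flags and the thread group handling, can be modelled in a few lines of user-space C. This is a toy model, not kernel code: the toy_ and NOTE_ names are hypothetical, the flag values are copied from the listing in these notes, and the -1 exit signal for CLONE_THREAD children is an assumption taken from the 2.6 kernel behaviour rather than from the steps listed here.

```c
#include <assert.h>
#include <signal.h>

/* Flag values copied from the clone flag listing in these notes. */
enum {
    NOTE_CSIGNAL      = 0x000000ff,
    NOTE_CLONE_VM     = 0x00000100,
    NOTE_CLONE_VFORK  = 0x00004000,
    NOTE_CLONE_THREAD = 0x00010000
};

/* Hypothetical, heavily reduced stand-in for a process descriptor. */
struct toy_task {
    int pid;
    int tgid;
    int exit_signal;
};

struct toy_task toy_copy_process(const struct toy_task *parent,
                                 unsigned long clone_flags, int new_pid)
{
    struct toy_task child;
    child.pid = new_pid;

    if (clone_flags & NOTE_CLONE_THREAD) {
        /* Same thread group as the parent; no exit signal (assumption). */
        child.tgid = parent->tgid;
        child.exit_signal = -1;
    } else {
        /* Thread group leader: tgid equals its own PID. */
        child.tgid = new_pid;
        child.exit_signal = (int)(clone_flags & NOTE_CSIGNAL);
    }
    return child;
}
```

For instance, a fork-like call passes only SIGCHLD, a vfork-like call passes NOTE_CLONE_VFORK | NOTE_CLONE_VM | SIGCHLD, and in both cases the child's exit_signal comes out as SIGCHLD.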
| | | | | | | | | | |
| Kernel Threads | | | | | | | | | |
| Kernel threads differ from regular processes in the following ways: | | | | |
| · Kernel threads run only in Kernel Mode | | | | | | |
| · Because kernel threads run only in Kernel Mode, they use only linear addresses greater than PAGE_OFFSET | |
| | | | | | | | | | |
| Creating a kernel thread | | | | | | | | |
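| The kernel_thread( ) function creates a new kernel thread: | | | | |
| int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags) | | | | |
| It essentially invokes do_fork( ) as follows: | | | | |
| do_fork(flags|CLONE_VM|CLONE_UNTRACED, 0, pregs, 0, NULL, NULL); | | | | |
| pregs is the address in the Kernel Mode stack where copy_thread( ) finds the initial values of the CPU registers for the new thread | | | | |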
| | | | | | | | | | |
| The kernel_thread( ) function builds up this stack area so that: | | | | | |
| · The ebx and edx registers will be set by copy_thread() to the values of the parameters fn and arg, respectively | |
| · The eip register will be set to the address of the following assembly language fragment: | | | |
| movl %edx,%eax | | | | | | | |
| pushl %edx | | | | | | | |
| call *%ebx | | | | | | | |
| pushl %eax | | | | | | | |
| call do_exit | | | | | | | |
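Read as C, the assembly fragment amounts to "call fn(arg), then pass its return value to do_exit( )". The sketch below renders that in user space under stated assumptions: every name is hypothetical, and the exit routine is a stub that records the code and returns, whereas the real do_exit( ) never returns.

```c
#include <assert.h>

typedef int (*thread_fn)(void *);
typedef void (*exit_fn)(int);

/* C rendering of the fragment: run the thread function, then exit with
 * whatever it returned. */
void toy_kernel_thread_helper(thread_fn fn, void *arg, exit_fn exit_like)
{
    int ret = fn(arg);   /* pushl %edx ; call *%ebx */
    exit_like(ret);      /* pushl %eax ; call do_exit */
}

/* Stub exit routine for the demo: remembers the last exit code. */
static int last_exit_code = -1;
static void record_exit(int code) { last_exit_code = code; }

/* Example thread function: returns its integer argument plus one. */
static int add_one(void *arg) { return *(int *)arg + 1; }

/* Runs the helper once and reports the code handed to the exit stub. */
int run_helper_demo(int input)
{
    toy_kernel_thread_helper(add_one, &input, record_exit);
    return last_exit_code;
}
```

run_helper_demo(41) feeds 41 to add_one and reports the value that reached the exit stub.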
| | | | | | | | | | |
| Process 0 | | | | | | | | | | |
| The ancestor of all processes, called process 0, the idle process, or, for historical reasons, the swapper process, is a kernel thread created from scratch during kernel initialization. | | | |
| This ancestor process uses the following statically allocated data structures | | | | |
| (data structures for all other processes are dynamically allocated): | | | | |
| · A process descriptor stored in the init_task variable, which is initialized by the INIT_TASK macro | | |
| · A thread_info descriptor and a Kernel Mode stack | | | | | | |
| stored in the init_thread_union variable and initialized by the INIT_THREAD_INFO macro | | | |
| · The following tables, which the process descriptor points to | | | | | |
| o init_mm | | | | | | |
| o init_fs | | | | | | |
| o init_files | | | | | | |
| o init_signals | | | | | | |
| o init_sighand | | | | | | |
| The tables are initialized, respectively, by the following macros: | |
| o INIT_MM | | | | | | |
| o INIT_FS | | | | | | |
| o INIT_FILES | | | | | | |
| o INIT_SIGNALS | | | | | | |
| o INIT_SIGHAND | | | | | | |
| · The master kernel Page Global Directory stored in swapper_pg_dir | | | | |
| Process 0 is selected by the scheduler only when there are no other processes in the TASK_RUNNING state. |
| | | | | | | | | | |
| | | | | | | | | | |
| Process 1 | | | | | | | | | | |
| The start_kernel( ) function initializes all the data structures needed by the kernel, | | | |
| enables interrupts, and creates another kernel thread, named process 1 (the init process): | | | | |
| kernel_thread(init, NULL, CLONE_FS|CLONE_SIGHAND); | | | | | |
| | | | | | | | | | |
| The init( ) function then invokes the execve( ) system call to load the executable program init. | | | |
| As a result, the init kernel thread becomes a regular process with its own per-process kernel data structures. | |
| | | | | | | | | | |
| The init process stays alive until the system is shut down, | | | | | |
| because it creates and monitors the activity of all processes | | | | | |
| that implement the outer layers of the operating system. | | | | | |
| | | | | | | | | | |
| Other kernel threads | | | | | | | | | |
| A few examples of kernel threads (besides process 0 and process 1) are: | | | | |
| keventd (also called events) | | | | | | | |
| Executes the functions in the keventd_wq workqueue | | | | | |
| kapmd | | | | | | | | | |
| Handles the events related to the Advanced Power Management | | | | |
| kswapd | | | | | | | | | |
| Reclaims memory, as described in the section "Periodic Reclaiming" in Chapter 17 | | | |
| pdflush | | | | | | | | | |
| Flushes "dirty" buffers to disk to reclaim memory, as described in the section "The pdflush Kernel Threads" in Chapter 15 |
| kblockd | | | | | | | | | |
| Executes the functions in the kblockd_workqueue workqueue. Essentially, it periodically activates the block device drivers, as described in the section "Activating the Block Device Driver" in Chapter 14. |
| ksoftirqd | | | | | | | | | |
| Runs the tasklets (see section "Softirqs and Tasklets" in Chapter 4); there is one of these kernel threads for each CPU in the system |