Preface
This article analyzes the kernel actions involved in process creation; I hope it is useful to readers interested in this area.
The Main Story
In FreeBSD, the system calls related to process creation are (see kern_fork.c):
fork, rfork and vfork. Their kernel implementations are sys_fork, sys_rfork and sys_vfork respectively; all three are thin wrappers that call fork1 with different flags. One of the more important memory-related functions on this path is vmspace_fork.
vfork is a very special fork: the child shares the parent's address space completely, and the parent is suspended until the child finishes its work (normally by calling exec or _exit). A structural weakness is that the child can modify the parent's address space at will; a mistake there can corrupt or even crash the parent.
sys_fork, sys_vfork, sys_rfork ----------> fork1 ------> vmspace_fork; fork1 ------> do_fork ------> vm_forkproc;
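To make the distinction concrete, here is a small user-space sketch (my own illustration, not taken from the kernel sources): after fork() the child runs in its own vmspace, so writes made by the child are invisible to the parent, whereas after vfork() both processes share a single vmspace, which is exactly why the child is only supposed to call _exit() or one of the exec functions.

#include <sys/types.h>
#include <sys/wait.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

static int value = 1;

int
main(void)
{
	pid_t pid;

	if ((pid = fork()) == -1)
		err(1, "fork");
	if (pid == 0) {
		value = 42;		/* touches only the child's copy-on-write pages */
		_exit(0);
	}
	waitpid(pid, NULL, 0);
	printf("after fork: value = %d\n", value);	/* prints 1, not 42 */
	return (0);
}

Running the same experiment with vfork() would print 42 on FreeBSD, precisely because p2->p_vmspace == p1->p_vmspace in that case; doing so is formally undefined behaviour, though, and is mentioned here only as a thought experiment.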
Inside fork1, if we got here via vfork, RFMEM (share address space) is set in the flags, which is why vm_forkproc contains the following:
if (flags & RFMEM) {
	p2->p_vmspace = p1->p_vmspace; -------------------> the child p2 shares one and the same address space (vmspace) with the parent p1
	atomic_add_int(&p1->p_vmspace->vm_refcnt, 1); ---------> vm_refcnt is 1 when the vmspace is allocated; each additional sharer adds 1
}
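For symmetry it is worth noting what happens on the way out: when a process exits, the reference taken here is dropped again, and the vmspace is only destroyed once its last user is gone. A simplified sketch of that release step (condensed and renamed by me; the real logic lives in vmspace_exit()/vmspace_free() in vm_map.c):

/*
 * Simplified sketch, not verbatim kernel code: vm_refcnt starts at 1 when the
 * vmspace is allocated and is bumped once per RFMEM sharer, so only the final
 * release actually frees the pmap, the map entries and the vmspace itself.
 */
static void
vmspace_release(struct vmspace *vm)		/* hypothetical helper name */
{
	if (atomic_fetchadd_int(&vm->vm_refcnt, -1) == 1)
		vmspace_dofree(vm);		/* tear down pmap + vm_map + vmspace */
}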
The regular fork path (sys_fork, also in kern_fork.c) instead reaches the following code in fork1, which creates a new vmspace:
if ((flags & RFMEM) == 0) {
	vm2 = vmspace_fork(p1->p_vmspace, &mem_charged); ----------> create the child's vmspace, using the parent p1's vmspace as the template; vm_forkproc later mainly does two things: 1. attach the vmspace to the new process, 2. call cpu_fork
	if (vm2 == NULL) {
		error = ENOMEM;
		goto fail1;
	}
} else { ---------------> vfork: share the parent's address space
	vm2 = NULL;
}
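As the annotation above mentions, the vm2 created here is later consumed by vm_forkproc, which mainly attaches the address space to the new process and then calls cpu_fork. A condensed sketch of that part of vm_forkproc (abridged by me, not a verbatim copy of vm_glue.c; td and td2 stand for the forking thread and the new thread):

if (flags & RFMEM) {
	/* vfork / rfork(RFMEM): share the parent's address space. */
	p2->p_vmspace = p1->p_vmspace;
	atomic_add_int(&p1->p_vmspace->vm_refcnt, 1);
} else {
	/* Regular fork: adopt the vmspace built by vmspace_fork(). */
	p2->p_vmspace = vm2;
	if (p1->p_vmspace->vm_shm)
		shmfork(p1, p2);	/* inherit attached SysV shared memory */
}
cpu_fork(td, p2, td2, flags);		/* MD part: pcb, kernel stack, trapframe */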
So, apart from vfork, which simply reuses the parent's address space, vmspace_fork is the absolute core of address-space creation for a new process:
/*
 * vmspace_fork:
 *
 * Create a new process vmspace structure and vm_map
 * based on those of an existing process.  The new map
 * is based on the old map, according to the inheritance
 * values on the regions in that map.
 *
 * XXX It might be worth coalescing the entries added to the new vmspace.
 *
 * The source map must not be locked.
 */
struct vmspace *
vmspace_fork(struct vmspace *vm1, vm_ooffset_t *fork_charge)
{
struct vmspace *vm2;
vm_map_t new_map, old_map;
vm_map_entry_t new_entry, old_entry;
vm_object_t object;
int locked;

old_map = &vm1->vm_map;
/* Copy immutable fields of vm1 to vm2. */
vm2 = vmspace_alloc(old_map->min_offset, old_map->max_offset, NULL);
if (vm2 == NULL)
return (NULL);
vm2->vm_taddr = vm1->vm_taddr;
vm2->vm_daddr = vm1->vm_daddr;
vm2->vm_maxsaddr = vm1->vm_maxsaddr;
vm_map_lock(old_map);
if (old_map->busy)
vm_map_wait_busy(old_map);
new_map = &vm2->vm_map;
locked = vm_map_trylock(new_map); /* trylock to silence WITNESS */
KASSERT(locked, ("vmspace_fork: lock failed"));

old_entry = old_map->header.next;
while (old_entry != &old_map->header) { --------------------> old_xxx refers to the parent process; this loop walks all of the parent's address-space entries, i.e. its vm_map {entry1, entry2, entry3, entry4}
if (old_entry->eflags & MAP_ENTRY_IS_SUB_MAP)
panic("vm_map_fork: encountered a submap");
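/*
 * Added note: in this version of the code the entries hang off a doubly linked
 * list whose sentinel is old_map->header. Each vm_map_entry describes one
 * contiguous range [start, end), points at its backing vm_object, and carries
 * the per-entry attributes (protection, inheritance, eflags) that drive the
 * switch statement below.
 */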
/*
 * Which rule is applied to each entry during this traversal is described in
 * section 6.6 of "The Design and Implementation of the FreeBSD Operating
 * System", 2nd edition (section 5.6.2 in the 1st edition). Quoting:
 *
 * Using copy-on-write for fork is done by traversing the list of vm_map_entry
 * structures in the parent and creating a corresponding entry in the child.
 * Each entry must be analyzed and the appropriate action taken:
 * • If the entry maps a shared region, the child can take a reference to it.
 * • If the entry maps a privately mapped region (such as the data area or
 *   stack), the child must create a copy-on-write mapping of the region. The
 *   parent must be converted to a copy-on-write mapping of the region. If
 *   either process later tries to write the region, it will create a shadow
 *   object to hold the modified pages.
 *
 * With the virtual-memory resources allocated, the system sets up the kernel-
 * and user-mode state of the new process. It then clears the NEW flag and
 * places the process's thread on the run queue; the new process can then
 * begin execution.
 */
/* ------------------------ vmspace_fork continues ------------------------ */
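/*
 * Added note: the inheritance value tested below is a per-entry attribute that
 * userland can change with minherit(2); INHERIT_SHARE, INHERIT_COPY and
 * INHERIT_NONE correspond to the VM_INHERIT_* cases handled here. As an
 * illustration (assuming a mapping at addr of length len; this is not code
 * from this file):
 *
 *	minherit(addr, len, INHERIT_SHARE);	force this region to be shared,
 *						rather than copied, across fork()
 */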
switch (old_entry->inheritance) {
case VM_INHERIT_NONE:
break;
case VM_INHERIT_SHARE: -----------------> corresponds to the read-only or shared regions described in the quote above;
/*
* Clone the entry, creating the shared object if necessary.
*/
object = old_entry->object.vm_object;
if (object == NULL) {
object = vm_object_allocate(OBJT_DEFAULT,
atop(old_entry->end - old_entry->start));
old_entry->object.vm_object = object;
old_entry->offset = 0;
if (old_entry->cred != NULL) {
object->cred = old_entry->cred;
object->charge = old_entry->end -
old_entry->start;
old_entry->cred = NULL;
}
}
/*
* Add the reference before calling vm_object_shadow
* to insure that a shadow object is created.
*/
vm_object_reference(object);
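/*
 * Added note (my reading of the code): MAP_ENTRY_NEEDS_COPY means this entry
 * is still a pending copy-on-write mapping. Because the child is going to
 * share the entry, the shadow object has to be materialized right now, so that
 * parent and child end up referencing the same shadow object instead of each
 * resolving the copy-on-write lazily on their own later.
 */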
if (old_entry->eflags & MAP_ENTRY_NEEDS_COPY) {
vm_object_shadow(&old_entry->object.vm_object,
&old_entry->offset,
old_entry->end - old_entry->start);
old_entry->eflags &= ~MAP_ENTRY_NEEDS_COPY;
/* Transfer the second reference too. */
vm_object_reference(
old_entry->object.vm_object);
/*
* As in vm_map_simplify_entry(), the
* vnode lock will not be acquired in
* this call to vm_object_deallocate().
*/
vm_object_deallocate(object);
object = old_entry->object.vm_object;
}
VM_OBJECT_WLOCK(object);
vm_object_clear_flag(object, OBJ_ONEMAPPING);
if (old_entry->cred != NULL) {
KASSERT(object->cred == NULL, ("vmspace_fork both cred"));
object->cred = old_entry->cred;
object->charge = old_entry->end - old_entry->start;
old_entry->cred = NULL;
}
/*
* Assert the correct state of the vnode
* v_writecount while the object is locked, to
* not relock it later for the assertion
* correctness.
*/
if (old_entry->eflags & MAP_ENTRY_VN_WRITECNT &&
object->type == OBJT_VNODE) {
KASSERT(((struct vnode *)object->handle)->
v_writecount > 0,
("vmspace_fork: v_writecount %p", object));
KASSERT(object->un_pager.vnp.writemappings > 0,
("vmspace_fork: vnp.writecount %p",
object));
}
VM_OBJECT_WUNLOCK(object);
/*
* Clone the entry, referencing the shared object.
*/
new_entry = vm_map_entry_create(new_map);
*new_entry = *old_entry; ------------------> copy the address-space descriptor (the vm_map_entry itself)
new_entry->eflags &= ~(MAP_ENTRY_USER_WIRED |
MAP_ENTRY_IN_TRANSITION);
new_entry->wiring_thread = NULL;
new_entry->wired_count = 0;
if (new_entry->eflags & MAP_ENTRY_VN_WRITECNT) {
vnode_pager_update_writecount(object,
new_entry->start, new_entry->end);
}
/*
* Insert the entry into the new map -- we know we're
* inserting at the end of the new map.
*/
vm_map_entry_link(new_map, new_map->header.prev,
new_entry);
vmspace_map_entry_forked(vm1, vm2, new_entry);
/*
* Update the physical map
*/
pmap_copy(new_map->pmap, old_map->pmap, ------------------> copy the page tables
new_entry->start,
(old_entry->end - old_entry->start),
old_entry->start);
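/*
 * Added note: pmap_copy() is advisory. It may eagerly copy the parent's valid
 * page-table entries for this range so the child does not have to fault them
 * in again, but a pmap implementation is free to do nothing here, in which
 * case the hardware mappings are simply recreated lazily on first access.
 */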
break;
case VM_INHERIT_COPY: -----------------> corresponds to the privately mapped regions described in the quote above (such as the data segment and the stack);
/*
* Clone the entry and link into the map.
*/
new_entry = vm_map_entry_create(new