Preface
This article analyzes the kernel actions involved in process creation; I hope it is useful to readers interested in this area.
The Main Story
In FreeBSD, the system calls related to process creation are (see kern_fork.c):
fork, rfork and vfork. Their kernel implementations are sys_fork, sys_rfork and sys_vfork respectively; all three are thin wrappers that call fork1 with different flags. One of the more important memory-related functions on this path is vmspace_fork.
vfork is a very special fork: the child shares the parent's address space completely, and the parent is suspended until the child finishes its work (normally by calling exec or _exit). A structural weakness is that the child can modify the parent's address space at will; a mistake there can corrupt or even crash the parent.
sys_fork, sys_vfork, sys_rfork ----------> fork1 ------> vmspace_fork; fork1 ------> do_fork ------> vm_forkproc;
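To make the distinction concrete, here is a small user-space sketch (my own illustration, not taken from the kernel sources): after fork() the child runs in its own vmspace, so writes made by the child are invisible to the parent, whereas after vfork() both processes share a single vmspace, which is exactly why the child is only supposed to call _exit() or one of the exec functions.

#include <sys/types.h>
#include <sys/wait.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

static int value = 1;

int
main(void)
{
	pid_t pid;

	if ((pid = fork()) == -1)
		err(1, "fork");
	if (pid == 0) {
		value = 42;		/* touches only the child's copy-on-write pages */
		_exit(0);
	}
	waitpid(pid, NULL, 0);
	printf("after fork: value = %d\n", value);	/* prints 1, not 42 */
	return (0);
}

Running the same experiment with vfork() would print 42 on FreeBSD, precisely because p2->p_vmspace == p1->p_vmspace in that case; doing so is formally undefined behaviour, though, and is mentioned here only as a thought experiment.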
Inside fork1, if we got here via vfork, RFMEM (share address space) is set in the flags, which is why vm_forkproc contains the following:
if (flags & RFMEM) {
	p2->p_vmspace = p1->p_vmspace; -------------------> the child p2 shares one and the same address space (vmspace) with the parent p1
	atomic_add_int(&p1->p_vmspace->vm_refcnt, 1); ---------> vm_refcnt is 1 when the vmspace is allocated; each additional sharer adds 1
}
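For symmetry it is worth noting what happens on the way out: when a process exits, the reference taken here is dropped again, and the vmspace is only destroyed once its last user is gone. A simplified sketch of that release step (condensed and renamed by me; the real logic lives in vmspace_exit()/vmspace_free() in vm_map.c):

/*
 * Simplified sketch, not verbatim kernel code: vm_refcnt starts at 1 when the
 * vmspace is allocated and is bumped once per RFMEM sharer, so only the final
 * release actually frees the pmap, the map entries and the vmspace itself.
 */
static void
vmspace_release(struct vmspace *vm)		/* hypothetical helper name */
{
	if (atomic_fetchadd_int(&vm->vm_refcnt, -1) == 1)
		vmspace_dofree(vm);		/* tear down pmap + vm_map + vmspace */
}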
The regular fork path (sys_fork, also in kern_fork.c) instead reaches the following code in fork1, which creates a new vmspace:
if ((flags & RFMEM) == 0) {
	vm2 = vmspace_fork(p1->p_vmspace, &mem_charged); ----------> create the child's vmspace, using the parent p1's vmspace as the template; vm_forkproc later mainly does two things: 1. attach the vmspace to the new process, 2. call cpu_fork
	if (vm2 == NULL) {
		error = ENOMEM;
		goto fail1;
	}
} else { ---------------> vfork: share the parent's address space
	vm2 = NULL;
}
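As the annotation above mentions, the vm2 created here is later consumed by vm_forkproc, which mainly attaches the address space to the new process and then calls cpu_fork. A condensed sketch of that part of vm_forkproc (abridged by me, not a verbatim copy of vm_glue.c; td and td2 stand for the forking thread and the new thread):

if (flags & RFMEM) {
	/* vfork / rfork(RFMEM): share the parent's address space. */
	p2->p_vmspace = p1->p_vmspace;
	atomic_add_int(&p1->p_vmspace->vm_refcnt, 1);
} else {
	/* Regular fork: adopt the vmspace built by vmspace_fork(). */
	p2->p_vmspace = vm2;
	if (p1->p_vmspace->vm_shm)
		shmfork(p1, p2);	/* inherit attached SysV shared memory */
}
cpu_fork(td, p2, td2, flags);		/* MD part: pcb, kernel stack, trapframe */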
So, apart from vfork, which simply reuses the parent's address space, vmspace_fork is the absolute core of address-space creation for a new process:
/*
 * vmspace_fork:
 *
 * Create a new process vmspace structure and vm_map
 * based on those of an existing process.  The new map
 * is based on the old map, according to the inheritance
 * values on the regions in that map.
 *
 * XXX It might be worth coalescing the entries added to the new vmspace.
 *
 * The source map must not be locked.
 */
struct vmspace *
vmspace_fork(struct vmspace *vm1, vm_ooffset_t *fork_charge)
{
struct vmspace *vm2;
vm_map_t new_map, old_map;
vm_map_entry_t new_entry, old_entry;
vm_object_t object;
int locked;

old_map = &vm1->vm_map;
/* Copy immutable fields of vm1 to vm2. */
vm2 = vmspace_alloc(old_map->min_offset, old_map->max_offset, NULL);
if (vm2 == NULL)
return (NULL);
vm2->vm_taddr = vm1->vm_taddr;
vm2->vm_daddr = vm1->vm_daddr;
vm2->vm_maxsaddr = vm1->vm_maxsaddr;
vm_map_lock(old_map);
if (old_map->busy)
vm_map_wait_busy(old_map);
new_map = &vm2->vm_map;
locked = vm_map_trylock(new_map); /* trylock to silence WITNESS */
KASSERT(locked, ("vmspace_fork: lock failed"));

old_entry = old_map->header.next;
while (old_entry != &old_map->header) { --------------------> old_xxx refers to the parent process; this loop walks all of the parent's address-space entries, i.e. its vm_map {entry1, entry2, entry3, entry4}
if (old_entry->eflags & MAP_ENTRY_IS_SUB_MAP)
panic("vm_map_fork: encountered a submap");
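/*
 * Added note: in this version of the code the entries hang off a doubly linked
 * list whose sentinel is old_map->header. Each vm_map_entry describes one
 * contiguous range [start, end), points at its backing vm_object, and carries
 * the per-entry attributes (protection, inheritance, eflags) that drive the
 * switch statement below.
 */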
/*
 * Which rule is applied to each entry during this traversal is described in
 * section 6.6 of "The Design and Implementation of the FreeBSD Operating
 * System", 2nd edition (section 5.6.2 in the 1st edition). Quoting:
 *
 * Using copy-on-write for fork is done by traversing the list of vm_map_entry
 * structures in the parent and creating a corresponding entry in the child.
 * Each entry must be analyzed and the appropriate action taken:
 * • If the entry maps a shared region, the child can take a reference to it.
 * • If the entry maps a privately mapped region (such as the data area or
 *   stack), the child must create a copy-on-write mapping of the region. The
 *   parent must be converted to a copy-on-write mapping of the region. If
 *   either process later tries to write the region, it will create a shadow
 *   object to hold the modified pages.
 *
 * With the virtual-memory resources allocated, the system sets up the kernel-
 * and user-mode state of the new process. It then clears the NEW flag and
 * places the process's thread on the run queue; the new process can then
 * begin execution.
 */
/* ------------------------ vmspace_fork continues ------------------------ */
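/*
 * Added note: the inheritance value tested below is a per-entry attribute that
 * userland can change with minherit(2); INHERIT_SHARE, INHERIT_COPY and
 * INHERIT_NONE correspond to the VM_INHERIT_* cases handled here. As an
 * illustration (assuming a mapping at addr of length len; this is not code
 * from this file):
 *
 *	minherit(addr, len, INHERIT_SHARE);	force this region to be shared,
 *						rather than copied, across fork()
 */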
switch (old_entry->inheritance) {
case VM_INHERIT_NONE:
break;
case VM_INHERIT_SHARE: -----------------> corresponds to the read-only or shared regions described in the quote above;
/*
* Clone the entry, creating the shared object if necessary.
*/
object = old_entry->object.vm_object;
if (object == NULL) {
object = vm_object_allocate(OBJT_DEFAULT,
atop(old_entry->end - old_entry->start));
old_entry->object.vm_object = object;
old_entry->offset = 0;
if (old_entry->cred != NULL) {
object->cred = old_entry->cred;
object->charge = old_entry->end -
old_entry->start;
old_entry->cred = NULL;
}
}
/*
* Add the reference before calling vm_object_shadow
* to insure that a shadow object is created.
*/
vm_object_reference(object);
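/*
 * Added note (my reading of the code): MAP_ENTRY_NEEDS_COPY means this entry
 * is still a pending copy-on-write mapping. Because the child is going to
 * share the entry, the shadow object has to be materialized right now, so that
 * parent and child end up referencing the same shadow object instead of each
 * resolving the copy-on-write lazily on their own later.
 */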
if (old_entry->eflags & MAP_ENTRY_NEEDS_COPY) {
vm_object_shadow(&old_entry->object.vm_object,
&old_entry->offset,
old_entry->end - old_entry->start);
old_entry->eflags &= ~MAP_ENTRY_NEEDS_COPY;
/* Transfer the second reference too. */
vm_object_reference(
old_entry->object.vm_object);
/*
* As in vm_map_simplify_entry(), the
* vnode lock will not be acquired in
* this call to vm_object_deallocate().
*/
vm_object_deallocate(object);
object = old_entry->object.vm_object;
}
VM_OBJECT_WLOCK(object);
vm_object_clear_flag(object, OBJ_ONEMAPPING);
if (old_entry->cred != NULL) {
KASSERT(object->cred == NULL, ("vmspace_fork both cred"));
object->cred = old_entry->cred;
object->charge = old_entry->end - old_entry->start;
old_entry->cred = NULL;
}
/*
* Assert the correct state of the vnode
* v_writecount while the object is locked, to
* not relock it later for the assertion
* correctness.
*/
if (old_entry->eflags & MAP_ENTRY_VN_WRITECNT &&
object->type == OBJT_VNODE) {
KASSERT(((struct vnode *)object->handle)->
v_writecount > 0,
("vmspace_fork: v_writecount %p", object));
KASSERT(object->un_pager.vnp.writemappings > 0,
("vmspace_fork: vnp.writecount %p",
object));
}
VM_OBJECT_WUNLOCK(object);
/*
* Clone the entry, referencing the shared object.
*/
new_entry = vm_map_entry_create(new_map);
*new_entry = *old_entry; ------------------> copy the address-space descriptor (the vm_map_entry itself)
new_entry->eflags &= ~(MAP_ENTRY_USER_WIRED |
MAP_ENTRY_IN_TRANSITION);
new_entry->wiring_thread = NULL;
new_entry->wired_count = 0;
if (new_entry->eflags & MAP_ENTRY_VN_WRITECNT) {
vnode_pager_update_writecount(object,
new_entry->start, new_entry->end);
}
/*
* Insert the entry into the new map -- we know we're
* inserting at the end of the new map.
*/
vm_map_entry_link(new_map, new_map->header.prev,
new_entry);
vmspace_map_entry_forked(vm1, vm2, new_entry);
/*
* Update the physical map
*/
pmap_copy(new_map->pmap, old_map->pmap, ------------------> copy the page tables
new_entry->start,
(old_entry->end - old_entry->start),
old_entry->start);
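/*
 * Added note: pmap_copy() is advisory. It may eagerly copy the parent's valid
 * page-table entries for this range so the child does not have to fault them
 * in again, but a pmap implementation is free to do nothing here, in which
 * case the hardware mappings are simply recreated lazily on first access.
 */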
break;
case VM_INHERIT_COPY: -----------------> corresponds to the privately mapped regions described in the quote above (such as the data segment and the stack);
/*
* Clone the entry and link into the map.
*/
new_entry = vm_map_entry_create(new