FreeBSD Process Creation, the ELF Format, and Processes 0 and 1: A Kernel Source Analysis

Preface

This article analyzes what the kernel does when a process is created. I hope it is useful to anyone interested in the topic.

Main Text

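FreeBSD provides three system calls for creating a process (see kern_fork.c): fork, rfork, and vfork. Their kernel implementations are sys_fork, sys_rfork, and sys_vfork. All three are thin wrappers that call fork1 with different flags. On the memory side, one of the most important functions involved is vmspace_fork.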

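vfork is a special kind of fork: the child shares the parent's address space completely, and the parent is suspended until the child has finished its work (that is, until the child calls exec or exits). The design flaw is that the child can modify the parent's address space at will, so a mistake in the child can crash the parent.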
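For contrast, a plain fork gives the child its own copy of the address space, so a write made by the child is not visible to the parent. The following is a minimal user-space sketch of that behavior, assuming a POSIX system; the function name parent_value_after_child_write is made up for this illustration and does not appear in the kernel.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that overwrites its copy of `value`, then report what the
 * parent still sees. Returns the parent's value, or -1 on error.
 * (Hypothetical helper, written only for this illustration.)
 */
int
parent_value_after_child_write(void)
{
    volatile int value = 1;
    int status;
    pid_t pid;

    pid = fork();
    if (pid < 0)
        return (-1);
    if (pid == 0) {
        /* Child: this write lands in the child's private copy. */
        value = 2;
        assert(value == 2);
        _exit(0);
    }
    /* Parent: wait for the child, then read our own copy. */
    if (waitpid(pid, &status, 0) != pid)
        return (-1);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return (-1);
    return (value);
}
```

Calling the function from a test program should return 1: the child's store of 2 never reaches the parent. Doing the same experiment with vfork is deliberately not shown, because modifying the parent's data from a vfork child is exactly the hazard described above.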

Call chain: sys_fork, sys_vfork, sys_rfork ----> fork1 ----> vmspace_fork and do_fork; do_fork ----> vm_forkproc.
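The "one worker, different flags" structure can be sketched as a toy model. Everything prefixed with toy_ or TOY_ below is hypothetical, and the bit values are invented; the real RF* constants are defined in sys/unistd.h. The flag combinations used for fork and vfork reflect my reading of kern_fork.c and may differ between FreeBSD versions, and the real sys_rfork also validates its flags, which this sketch skips.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; only the names mirror the real RF* flags. */
enum {
    TOY_RFFDG    = 1 << 0,  /* copy the file descriptor table */
    TOY_RFPROC   = 1 << 1,  /* create a new process */
    TOY_RFMEM    = 1 << 2,  /* share the address space */
    TOY_RFPPWAIT = 1 << 3   /* parent sleeps until the child execs or exits */
};

static_assert((TOY_RFMEM & TOY_RFPPWAIT) == 0, "flags must be distinct bits");

/* What the common worker decides from the flags (toy summary only). */
struct toy_fork_plan {
    bool share_vmspace;
    bool parent_waits;
};

/* Stand-in for fork1: all policy comes from the flags argument. */
struct toy_fork_plan
toy_fork1(int flags)
{
    struct toy_fork_plan plan;

    plan.share_vmspace = (flags & TOY_RFMEM) != 0;
    plan.parent_waits = (flags & TOY_RFPPWAIT) != 0;
    return (plan);
}

/* The three entry points differ only in the flags they pass down. */
struct toy_fork_plan
toy_sys_fork(void)
{
    return (toy_fork1(TOY_RFFDG | TOY_RFPROC));
}

struct toy_fork_plan
toy_sys_vfork(void)
{
    return (toy_fork1(TOY_RFFDG | TOY_RFPROC | TOY_RFPPWAIT | TOY_RFMEM));
}

struct toy_fork_plan
toy_sys_rfork(int flags)
{
    return (toy_fork1(flags));
}
```

In this model, toy_sys_vfork yields a plan that shares the vmspace and makes the parent wait, while toy_sys_fork yields neither; rfork lets the caller pick the combination.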

Inside fork1, a call that arrives via vfork has RFMEM (share address space) set, which is why vm_forkproc contains the following statements:
if (flags & RFMEM) {
    /* The child p2 and the parent p1 share one address space (vmspace). */
    p2->p_vmspace = p1->p_vmspace;
    /* A vmspace starts with a reference count of 1; each sharer adds 1. */
    atomic_add_int(&p1->p_vmspace->vm_refcnt, 1);
}
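The reference-counting idea in that snippet can be tried out in user space. This is a toy sketch using C11 atomics, not kernel code: the toy_vmspace and toy_proc types and all three functions are hypothetical stand-ins for struct vmspace, struct proc, and the kernel's allocation and release paths.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Toy stand-ins for struct vmspace and struct proc (hypothetical). */
struct toy_vmspace {
    atomic_int refcnt;
};

struct toy_proc {
    struct toy_vmspace *vm;
};

/* A freshly allocated vmspace starts with one reference. */
struct toy_vmspace *
toy_vmspace_alloc(void)
{
    struct toy_vmspace *vm = malloc(sizeof(*vm));

    if (vm != NULL)
        atomic_init(&vm->refcnt, 1);
    return (vm);
}

/* RFMEM path: the child p2 points at the parent's vmspace and adds a reference. */
void
toy_share_vmspace(struct toy_proc *p1, struct toy_proc *p2)
{
    p2->vm = p1->vm;
    atomic_fetch_add(&p1->vm->refcnt, 1);
}

/* Drop one reference; returns 1 if this call freed the vmspace, else 0. */
int
toy_vmspace_release(struct toy_proc *p)
{
    struct toy_vmspace *vm = p->vm;

    assert(vm != NULL);
    p->vm = NULL;
    if (atomic_fetch_sub(&vm->refcnt, 1) == 1) {
        free(vm);
        return (1);
    }
    return (0);
}
```

After one toy_share_vmspace call the count is 2, and the vmspace is only freed when both the child and the parent have released it, which is why a shared address space survives the exit of either process.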

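A regular fork (sys_fork, also in kern_fork.c) instead reaches the following statements in fork1, which create a new vmspace:
if ((flags & RFMEM) == 0) {
    /*
     * Create the child's vmspace using the parent p1's vmspace as the
     * template. vm_forkproc then does two main things: (1) attach the
     * vmspace to the process, and (2) call cpu_fork.
     */
    vm2 = vmspace_fork(p1->p_vmspace, &mem_charged);
    if (vm2 == NULL) {
        error = ENOMEM;
        goto fail1;
    }
} else {
    /* vfork (RFMEM set): share the parent's address space. */
    vm2 = NULL;
}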
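To make "using the parent's vmspace as the template" more concrete, here is a toy model of the per-entry decision that vmspace_fork makes from each entry's inheritance value. All toy_ and TOY_ names are hypothetical. The model deliberately leaves out shadow objects, wiring, credential accounting, and the pmap, and it assumes that a shared entry has no copy pending.

```c
#include <assert.h>
#include <stddef.h>

/* Toy counterparts of VM_INHERIT_NONE / _SHARE / _COPY (hypothetical). */
enum toy_inherit { TOY_INHERIT_NONE, TOY_INHERIT_SHARE, TOY_INHERIT_COPY };

struct toy_object {
    int refcnt;                 /* number of map entries using this object */
};

struct toy_entry {
    enum toy_inherit inheritance;
    struct toy_object *object;  /* backing object of the region */
    int needs_copy;             /* 1 while a copy-on-write copy is pending */
};

/*
 * Walk the parent's n entries and fill child[] (which must have room for
 * n entries). Returns the number of entries the child received.
 */
size_t
toy_vmspace_fork(struct toy_entry *parent, size_t n, struct toy_entry *child)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_entry *old = &parent[i];

        switch (old->inheritance) {
        case TOY_INHERIT_NONE:
            /* The region is not passed on to the child at all. */
            break;
        case TOY_INHERIT_SHARE:
            /* Simplification: assume no copy is pending on the entry. */
            assert(!old->needs_copy);
            /* The child takes another reference to the same object. */
            old->object->refcnt++;
            child[out++] = *old;
            break;
        case TOY_INHERIT_COPY:
            /* Both sides reference the object and become copy-on-write. */
            old->object->refcnt++;
            old->needs_copy = 1;
            child[out++] = *old;
            break;
        }
    }
    return (out);
}
```

With a parent that has one entry of each kind, the child ends up with two entries: the shared one points at the same object with no copy pending, and the private one is marked copy-on-write in both parent and child.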

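So, apart from vfork, which simply reuses the parent's address space, most process-creation paths build the new address space through vmspace_fork, which makes it the central function: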
/*
 * vmspace_fork:
 * Create a new process vmspace structure and vm_map
 * based on those of an existing process.  The new map
 * is based on the old map, according to the inheritance
 * values on the regions in that map.
 *
 * XXX It might be worth coalescing the entries added to the new vmspace.
 *
 * The source map must not be locked.
 */
struct vmspace *
vmspace_fork(struct vmspace *vm1, vm_ooffset_t *fork_charge)
{
    struct vmspace *vm2;
    vm_map_t new_map, old_map;
    vm_map_entry_t new_entry, old_entry;
    vm_object_t object;
    int locked;

    old_map = &vm1->vm_map;
    /* Copy immutable fields of vm1 to vm2. */
    vm2 = vmspace_alloc(old_map->min_offset, old_map->max_offset, NULL);
    if (vm2 == NULL)
        return (NULL);
    vm2->vm_taddr = vm1->vm_taddr;
    vm2->vm_daddr = vm1->vm_daddr;
    vm2->vm_maxsaddr = vm1->vm_maxsaddr;
    vm_map_lock(old_map);
    if (old_map->busy)
        vm_map_wait_busy(old_map);
    new_map = &vm2->vm_map;
    locked = vm_map_trylock(new_map); /* trylock to silence WITNESS */
    KASSERT(locked, ("vmspace_fork: lock failed"));

    old_entry = old_map->header.next;

    /*
     * old_* refers to the parent process. This loop walks every entry in
     * the parent's vm_map {entry1, entry2, entry3, entry4}.
     */
    while (old_entry != &old_map->header) {
        if (old_entry->eflags & MAP_ENTRY_IS_SUB_MAP)
            panic("vm_map_fork: encountered a submap");
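    /*
    The rules followed while traversing the parent are described in section 6.6 of The Design and Implementation of the FreeBSD Operating System, 2nd edition (section 5.6.2 in the 1st edition). Excerpt: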
    Using copy-on-write for fork is done by traversing the list of vm_map_entry structures in the
    parent and creating a corresponding entry in the child. Each entry must be analyzed and the
    appropriate action taken:
    • If the entry maps a shared region, the child can take a reference to it.
    • If the entry maps a privately mapped region (such as the data area or stack), the child must
    create a copy-on-write mapping of the region. The parent must be converted to a copy-on-write
    mapping of the region. If either process later tries to write the region, it will create a shadow
    object to hold the modified pages.
    With the virtual-memory resources allocated, the system sets up the kernel-and user-mode state
    of the new process. It then clears the NEW flag and places the process’s thread on the run queue;
    the new process can then begin execution.

    */
    /* vmspace_fork, continued */
    switch (old_entry->inheritance) {
    case VM_INHERIT_NONE:
        break;

    case VM_INHERIT_SHARE:    /* the read-only or shared regions described in the excerpt above */
        /*
         * Clone the entry, creating the shared object if necessary.
         */
        object = old_entry->object.vm_object;
        if (object == NULL) {
            object = vm_object_allocate(OBJT_DEFAULT,
                atop(old_entry->end - old_entry->start));
            old_entry->object.vm_object = object;
            old_entry->offset = 0;
            if (old_entry->cred != NULL) {
                object->cred = old_entry->cred;
                object->charge = old_entry->end -
                    old_entry->start;
                old_entry->cred = NULL;
            }
        }

        /*
         * Add the reference before calling vm_object_shadow
         * to insure that a shadow object is created.
         */
        vm_object_reference(object);
        if (old_entry->eflags & MAP_ENTRY_NEEDS_COPY) {
            vm_object_shadow(&old_entry->object.vm_object,
                &old_entry->offset,
                old_entry->end - old_entry->start);
            old_entry->eflags &= ~MAP_ENTRY_NEEDS_COPY;
            /* Transfer the second reference too. */
            vm_object_reference(
                old_entry->object.vm_object);

            /*
             * As in vm_map_simplify_entry(), the
             * vnode lock will not be acquired in
             * this call to vm_object_deallocate().
             */
            vm_object_deallocate(object);
            object = old_entry->object.vm_object;
        }
        VM_OBJECT_WLOCK(object);
        vm_object_clear_flag(object, OBJ_ONEMAPPING);
        if (old_entry->cred != NULL) {
            KASSERT(object->cred == NULL, ("vmspace_fork both cred"));
            object->cred = old_entry->cred;
            object->charge = old_entry->end - old_entry->start;
            old_entry->cred = NULL;
        }

        /*
         * Assert the correct state of the vnode
         * v_writecount while the object is locked, to
         * not relock it later for the assertion
         * correctness.
         */
        if (old_entry->eflags & MAP_ENTRY_VN_WRITECNT &&
            object->type == OBJT_VNODE) {
            KASSERT(((struct vnode *)object->handle)->
                v_writecount > 0,
                ("vmspace_fork: v_writecount %p", object));
            KASSERT(object->un_pager.vnp.writemappings > 0,
                ("vmspace_fork: vnp.writecount %p",
                object));
        }
        VM_OBJECT_WUNLOCK(object);

        /*
         * Clone the entry, referencing the shared object.
         */
        new_entry = vm_map_entry_create(new_map);
        *new_entry = *old_entry;       /* copy the virtual address range (map entry) */
        new_entry->eflags &= ~(MAP_ENTRY_USER_WIRED |
            MAP_ENTRY_IN_TRANSITION);
        new_entry->wiring_thread = NULL;
        new_entry->wired_count = 0;
        if (new_entry->eflags & MAP_ENTRY_VN_WRITECNT) {
            vnode_pager_update_writecount(object,
                new_entry->start, new_entry->end);
        }

        /*
         * Insert the entry into the new map -- we know we're
         * inserting at the end of the new map.
         */
        vm_map_entry_link(new_map, new_map->header.prev,
            new_entry);
        vmspace_map_entry_forked(vm1, vm2, new_entry);

        /*
         * Update the physical map
         */
        pmap_copy(new_map->pmap, old_map->pmap,    /* copy the page tables */
            new_entry->start,
            (old_entry->end - old_entry->start),
            old_entry->start);
        break;

    case VM_INHERIT_COPY:    /* the privately mapped regions described in the excerpt above (such as the data segment and stack) */
        /*
         * Clone the entry and link into the map.
         */
        new_entry = vm_map_entry_create(new