The Kernel Boot Process

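This article walks in detail through the process from the BIOS to a fully booted operating system, covering the loading of the real-mode kernel, the switch into protected mode, and the decompression and initialization of the kernel, and it compares Linux with Windows.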


http://duartes.org/gustavo/blog/post/kernel-boot-process


The previous post explained how computers boot up right up to the point where the boot loader, after stuffing the kernel image into memory, is about to jump into the kernel entry point. This last post about booting takes a look at the guts of the kernel to see how an operating system starts life. Since I have an empirical bent I’ll link heavily to the sources for Linux kernel 2.6.25.6 at the Linux Cross Reference. The sources are very readable if you are familiar with C-like syntax; even if you miss some details you can get the gist of what’s happening. The main obstacle is the lack of context around some of the code, such as when or why it runs or the underlying features of the machine. I hope to provide a bit of that context. Due to brevity (hah!) a lot of fun stuff – like interrupts and memory – gets only a nod for now. The post ends with the highlights for the Windows boot.

At this point in the Intel x86 boot story the processor is running in real-mode, is able to address 1 MB of memory, and RAM looks like this for a modern Linux system:
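
In real mode an address is built from a 16-bit segment and a 16-bit offset as segment * 16 + offset, which is where the roughly 1 MB limit comes from. A minimal Python sketch of that translation (the function name is illustrative, not taken from the kernel sources):

```python
ONE_MB = 1 << 20  # 20 address bits in real mode


def real_mode_linear(segment: int, offset: int) -> int:
    """Translate a real-mode segment:offset pair into a linear address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    # The segment is shifted left by 4 bits (multiplied by 16), then the offset is added.
    return (segment << 4) + offset


if __name__ == "__main__":
    print(hex(real_mode_linear(0x9000, 0x0000)))  # 0x90000
    print(hex(real_mode_linear(0xFFFF, 0xFFFF)))  # 0x10ffef, just past 1 MB
```

The largest possible pair, 0xFFFF:0xFFFF, lands only a few bytes past the first megabyte, so for practical purposes real-mode code lives in 2^20 bytes.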

[Figure: RAM contents after boot loader is done]

The kernel image has been loaded into memory by the boot loader using the BIOS disk I/O services. This image is an exact copy of the file on your hard drive that contains the kernel, e.g. /boot/vmlinuz-2.6.22-14-server. The image is split into two pieces: a small part containing the real-mode kernel code is loaded below the 640K barrier; the bulk of the kernel, which runs in protected mode, is loaded after the first megabyte of memory.
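
The size of the real-mode piece is recorded in the image itself. Under the x86 Linux boot protocol, the byte at offset 0x1F1 (setup_sects) counts the 512-byte setup sectors that follow the boot sector, and a value of 0 is treated as 4. Below is a sketch of the split a boot loader performs, assuming the whole image is already in memory as bytes; split_kernel_image is a hypothetical helper, and placing each piece in RAM is left to the caller.

```python
from typing import Tuple

SECTOR_SIZE = 512
SETUP_SECTS_OFFSET = 0x1F1  # boot protocol field: number of setup sectors


def split_kernel_image(image: bytes) -> Tuple[bytes, bytes]:
    """Split a kernel image into (real-mode part, protected-mode part).

    Hypothetical helper: a boot loader does this arithmetic before loading the
    real-mode part below 640K and the protected-mode part at 1 MB.
    """
    setup_sects = image[SETUP_SECTS_OFFSET]
    if setup_sects == 0:
        setup_sects = 4  # the boot protocol treats 0 as 4 for backward compatibility
    # The boot sector plus the setup sectors form the real-mode kernel.
    real_mode_size = (setup_sects + 1) * SECTOR_SIZE
    return image[:real_mode_size], image[real_mode_size:]


if __name__ == "__main__":
    fake = bytearray(8 * SECTOR_SIZE)  # stand-in image, not a real kernel
    fake[SETUP_SECTS_OFFSET] = 3
    real_mode, protected_mode = split_kernel_image(bytes(fake))
    print(len(real_mode), len(protected_mode))  # 2048 2048
```

With setup_sects = 3, the first (3 + 1) * 512 = 2048 bytes form the real-mode kernel and everything after them is the protected-mode kernel.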

The action starts in the real-mode kernel header pictured above. This region of memory is used to implement the Linux boot protocol between the boot loader and the kernel. Some of the values there are read by the boot loader while doing its work. These include amenities such as a human-readable string containing the kernel version, but also crucial information like the size of the real-mode kernel piece. The boot loader also writes values to this region, such as the memory address for the command-line parameters given by the user in the boot menu. Once the boot loader is finished it has filled in all of the parameters required by the kernel header. It’s then time to jump into the kernel entry point. The diagram below shows the code sequence for the kernel initialization, along with source directories, files, and line numbers:
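
A sketch of both directions of that exchange, assuming field offsets from the x86 Linux boot protocol documentation in the kernel tree. The loader checks the "HdrS" magic, reads the protocol version, setup_sects, and the human-readable kernel version string, then writes back type_of_loader and cmd_line_ptr. Both helper names are hypothetical, and a real loader checks the protocol version before touching newer fields such as cmd_line_ptr (protocol 2.02 and later).

```python
import struct


def read_setup_header(image: bytes) -> dict:
    """Read fields a boot loader inspects (offsets from the x86 boot protocol)."""
    if image[0x202:0x206] != b"HdrS":
        raise ValueError("no boot protocol header (missing 'HdrS' magic)")
    (boot_flag,) = struct.unpack_from("<H", image, 0x1FE)
    (version,) = struct.unpack_from("<H", image, 0x206)
    (kver_ptr,) = struct.unpack_from("<H", image, 0x20E)
    kernel_version = ""
    if kver_ptr:
        # The field stores the string's offset minus 0x200; the string is NUL-terminated.
        start = kver_ptr + 0x200
        end = image.index(b"\0", start)
        kernel_version = image[start:end].decode("ascii")
    return {
        "boot_flag": boot_flag,
        "protocol": f"{version >> 8}.{version & 0xFF:02d}",
        "setup_sects": image[0x1F1],
        "kernel_version": kernel_version,
    }


def write_loader_fields(image: bytearray, cmdline_addr: int, loader_id: int = 0xFF) -> None:
    """Fill in fields the boot loader writes back into the header."""
    image[0x210] = loader_id  # type_of_loader (0xFF: loader without an assigned ID)
    struct.pack_into("<I", image, 0x228, cmdline_addr)  # cmd_line_ptr: where the command line lives
```

The command-line address a loader passes in is whatever location it chose for the string; any value used when trying this out is arbitrary.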

[Figure: Architecture-specific Linux Kernel Initialization]

The early kernel start-up for the Intel architecture is in file arch/x86/boot/header.S. It’s in assembly language, which is rare for the kernel at large but common for boot code. The start of this file actually contains boot sector code, a leftover from the days when Linux could work without a boot loader. Nowadays this boot sector, if executed, only prints a “bugger_off_msg” to the user and reboots. Modern boot loaders ignore this legacy code. After the boot sector code we have the first 15 bytes of the real-mode kernel header; these two pieces together add up to 512 bytes, the size of a typical disk sector on Intel hardware.
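
The 512-byte figure checks out arithmetically: the header starts at offset 0x1F1 (497), so its first 15 bytes run through 0x1FF and complete the sector. The last two of those bytes are boot_flag, which holds the 0xAA55 signature also found at the end of ordinary boot sectors. A small check, assuming offsets from the boot protocol documentation; first_sector_ok is a hypothetical helper.

```python
import struct

SECTOR_SIZE = 512
HEADER_START = 0x1F1           # setup_sects, the first header field
HEADER_BYTES_IN_SECTOR = 15    # 0x1F1 through 0x1FF
BOOT_FLAG_OFFSET = 0x1FE       # 2-byte boot_flag, must read 0xAA55

# Boot sector code (0x000-0x1F0) plus the first 15 header bytes fill one sector.
assert HEADER_START + HEADER_BYTES_IN_SECTOR == SECTOR_SIZE


def first_sector_ok(sector: bytes) -> bool:
    """Return True if a 512-byte first sector ends with the 0xAA55 boot_flag."""
    if len(sector) != SECTOR_SIZE:
        return False
    (boot_flag,) = struct.unpack_from("<H", sector, BOOT_FLAG_OFFSET)
    return boot_flag == 0xAA55
```

Because x86 is little-endian, the signature appears on disk as the byte 0x55 followed by 0xAA.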

After these 512 bytes, at offset 0x200, we find the very first instruction that runs as part of the Linux kernel: the real-mode entry point. It’s in header.S:110 and it is a 2-byte jump written directly in machine code as 0x3aeb. You can verify this by running hexdump on your kernel image and seeing the bytes at that offset – just a sanity check to make sure it’s not all a dream. The boot loader jumps into this location when it is finished, which in turn jumps to header.S:229 where we have a regular assembly routine called start_of_setup. This short routine sets up a stack, zeroes the bss segment (the area that contains static variables, so they start with zero values) for the real-mode kernel and then jumps to good old C code at arch/x86/boot/main.c:122.
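
Why the dump shows 0x3aeb: hexdump's default output groups bytes into 16-bit little-endian words, so the bytes EB 3A at offset 0x200 print as 3aeb. EB is the x86 opcode for a short jump whose signed 8-bit displacement is counted from the end of the 2-byte instruction. A sketch that decodes it; the helper names are hypothetical, and the exact target offset depends on how large the header is in a given kernel version.

```python
import struct

JMP_SHORT_OPCODE = 0xEB  # x86 "jmp rel8"


def decode_entry_jump(image: bytes, offset: int = 0x200) -> int:
    """Return the file offset targeted by the 2-byte short jump at `offset`."""
    opcode, disp = image[offset], image[offset + 1]
    if opcode != JMP_SHORT_OPCODE:
        raise ValueError(f"expected 0xeb at {offset:#x}, found {opcode:#04x}")
    if disp >= 0x80:
        disp -= 0x100  # rel8 is a signed displacement
    # The displacement is relative to the next instruction, 2 bytes further on.
    return offset + 2 + disp


def hexdump_word(image: bytes, offset: int) -> str:
    """Show two bytes the way hexdump does by default: as a little-endian word."""
    (word,) = struct.unpack_from("<H", image, offset)
    return f"{word:04x}"
```

With EB 3A at 0x200, the jump lands at 0x200 + 2 + 0x3A = 0x23C, just past the header fields that follow the jump in that image.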

main() does some housekeeping like detecting the memory layout, setting a video mode, etc. It then calls go_to_protected_mode(). Before the CPU can be set to protected mode, however, a few tasks must be done. There are two main issues: interrupts and memory. In real-mode the interrupt vector table for the processor is always at memory address 0, whereas in protected mode the location of the interrupt vector table is stored in a CPU register called IDTR. Meanwhile, the translation of logical memory addresses (the ones programs manipulate) to linear memory addresses (a raw number from 0 to the top of the memory) is different between real-mode and protected mode. Protected mode requires a register called GDTR to be loaded with the address of a Global Descriptor Table for memory. So go_to_protected_mode() calls setup_idt() and setup_gdt() to install a temporary interrupt descriptor table and global descriptor table.
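
Both tables are located through pseudo-descriptors: the operand loaded into GDTR or IDTR holds a 16-bit limit (table size in bytes minus one) and a 32-bit base address. Each GDT entry is an 8-byte descriptor that scatters a segment's base and limit across several fields. A sketch of that encoding, assuming the textbook flat 4 GB code segment (base 0, limit 0xFFFFF in 4 KB units); the values are illustrative and not copied from setup_gdt().

```python
import struct


def gdt_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack an x86 segment descriptor into its 64-bit form.

    base: 32-bit segment base; limit: 20-bit segment limit;
    access: access byte (present bit, privilege level, type);
    flags: 4-bit granularity/default-size nibble.
    """
    desc = limit & 0xFFFF                  # bits 0-15: limit 15:0
    desc |= (base & 0xFFFFFF) << 16        # bits 16-39: base 23:0
    desc |= (access & 0xFF) << 40          # bits 40-47: access byte
    desc |= ((limit >> 16) & 0xF) << 48    # bits 48-51: limit 19:16
    desc |= (flags & 0xF) << 52            # bits 52-55: flags
    desc |= ((base >> 24) & 0xFF) << 56    # bits 56-63: base 31:24
    return desc


def pseudo_descriptor(table: bytes, base_addr: int) -> bytes:
    """Build the 6-byte LGDT/LIDT operand: 16-bit limit, then 32-bit base."""
    return struct.pack("<HI", len(table) - 1, base_addr)


# Flat 32-bit code segment: present, ring 0, execute/read (0x9A);
# 4 KB granularity and 32-bit default operand size (0xC).
flat_code = gdt_descriptor(base=0, limit=0xFFFFF, access=0x9A, flags=0xC)
```

The flat code descriptor packs to 0x00CF9A000000FFFF, and a three-entry table (24 bytes) gets a limit of 23 in its pseudo-descriptor.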

We’re now ready for the plunge into protected mode, which is done by protected_mode_jump, another assembly routine. This routine enables protected mode by setting the PE bit in the CR0 CPU register. At this point we’re running with paging disabled; paging is an optional feature of the processor, even in protected mode, and there’s no need for it yet. What’s important is that we’re no longer confined to the 640K barrier and can now address up to 4GB of RAM. The routine then calls the 32-bit kernel entry point, which is startup_32 for compressed kernels. This routine does some basic register initializations and calls decompress_kernel(), a C function to do the actual decompression.
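
The switch itself is a single bit. Bit 0 of CR0 is PE (Protection Enable) and bit 31 is PG (paging), so ORing in PE while PG stays clear gives protected mode without paging, and 32-bit addresses then reach 2^32 bytes, i.e. 4 GB. A Python model of those bit operations; the function names are hypothetical, and the real routine does this in assembly on the live register.

```python
CR0_PE = 1 << 0   # Protection Enable
CR0_PG = 1 << 31  # Paging
FOUR_GB = 1 << 32  # 32-bit addresses, versus 1 << 20 (1 MB) in real mode


def set_protection_enable(cr0: int) -> int:
    """OR in the PE bit, leaving every other CR0 bit (including PG) untouched."""
    return cr0 | CR0_PE


def describe_mode(cr0: int) -> str:
    """Name the mode implied by the PE and PG bits."""
    if not cr0 & CR0_PE:
        return "real mode"
    if cr0 & CR0_PG:
        return "protected mode, paging on"
    return "protected mode, paging off"
```

Using OR rather than a plain assignment preserves whatever other CR0 bits the firmware left set.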

decompress_kernel() prints the familiar “Decompressing Linux…” message. Decompression happens in-place and once it’s finished the uncompressed kernel image has overwritten the compressed one pictured in the first diagram. Hence the uncompressed contents also start at 1MB. decompress_kernel() then prints “done.” and the comforting “Booting the kernel.” By “Booting” it means a jump to the final entry point in this whole story, given to Linus by God himself atop Mountain Halti, which is the protected-mode kernel entry point at the start of the second megabyte of RAM (0x100000). That sacred location contains a routine called, uh, startup_32. But this one is in a different directory, you see.
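
A toy model of the end result, assuming a gzip-compressed payload (kernels of this era used gzip's inflate) and a simulated physical memory buffer: the decompressed kernel is written starting at 0x100000, where the compressed image began. The names and the memory model are hypothetical. This sketch reads the whole payload before writing any output, whereas true in-place decompression must arrange the buffers so output never overwrites compressed input that has not been consumed yet.

```python
import zlib

KERNEL_LOAD_ADDR = 0x100000  # 1 MB, where the protected-mode kernel sits


def decompress_kernel(memory: bytearray, compressed_len: int) -> int:
    """Toy model: inflate the gzip payload at 1 MB and write the result back at 1 MB.

    Returns the size of the decompressed kernel.
    """
    print("Decompressing Linux... ", end="")
    payload = bytes(memory[KERNEL_LOAD_ADDR:KERNEL_LOAD_ADDR + compressed_len])
    kernel = zlib.decompress(payload, wbits=16 + zlib.MAX_WBITS)  # gzip framing
    memory[KERNEL_LOAD_ADDR:KERNEL_LOAD_ADDR + len(kernel)] = kernel
    print("done.")
    print("Booting the kernel.")
    return len(kernel)


if __name__ == "__main__":
    fake_kernel = b"\x90" * 4096  # stand-in payload, not a real kernel
    gz = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)
    compressed = gz.compress(fake_kernel) + gz.flush()
    memory = bytearray(KERNEL_LOAD_ADDR + len(fake_kernel))
    memory[KERNEL_LOAD_ADDR:KERNEL_LOAD_ADDR + len(compressed)] = compressed
    decompress_kernel(memory, len(compressed))
```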

The second incarnation of startup_32 is also an assembly routine, but it contains 32-bit mode initializations. It clears the bss segment for the protected-mode kernel (which is the true kernel that will now run until the machine reboots or shuts down), sets up the final global descriptor table for memory, builds page tables so that paging can be turned on, enables paging, initializes a stack, creates the final interrupt descriptor table, and finally jumps to the architecture-independent kernel start-up, start_kernel(). The diagram below shows the code flow for the last leg of the boot:
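
With classic two-level 32-bit paging and 4 KB pages (an assumption: the post does not say which page-table format startup_32 builds, and PAE kernels differ), each virtual address splits into a page-directory index, a page-table index, and an offset within the page. A sketch of that split; split_linear_address is a hypothetical name.

```python
from typing import Tuple


def split_linear_address(addr: int) -> Tuple[int, int, int]:
    """Split a 32-bit address for two-level x86 paging with 4 KB pages."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit address")
    dir_index = addr >> 22               # bits 31-22: one of 1024 page-directory entries
    table_index = (addr >> 12) & 0x3FF   # bits 21-12: one of 1024 page-table entries
    offset = addr & 0xFFF                # bits 11-0: byte within the 4 KB page
    return dir_index, table_index, offset


# 1024 directory entries x 1024 table entries x 4096 bytes = the full 4 GB space.
assert 1024 * 1024 * 4096 == 1 << 32
```

For example, an i386 kernel built with the default 3G/1G split maps physical 0x100000 at virtual 0xC0100000, which splits into (768, 256, 0); that 0xC0000000 base is configuration-dependent.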

[Figure: Architecture-independent Linux Kernel Initialization]

start_kernel() looks more like typical kernel code, which is nearly all C and machine independent. The function is a long list of calls to initializations of the various kernel subsystems and data structures. These include the scheduler, memory zones, time keeping, and so on. start_kernel() then calls rest_init(), at which point things are almost all working. rest_init() creates a kernel thread passing another function, kernel_init(), as the entry point. rest_init() then calls schedule() to kickstart task scheduling and goes to sleep by calling cpu_idle(), which is the idle thread for the Linux kernel. cpu_idle() runs forever and so does process zero, which hosts it. Whenever there is work to do – a runnable process – process zero gets booted out of the CPU, only to return when no runnable processes are available.
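
A toy scheduler makes the role of process zero concrete: on every tick the CPU runs a runnable task if one exists and falls back to the idle task only when nothing else is runnable. The task names and the round-robin policy are illustrative assumptions; this is not how the Linux scheduler actually picks tasks.

```python
from collections import deque
from typing import Dict, List, Tuple


def run_cpu(ticks: int, arrivals: Dict[int, List[Tuple[str, int]]]) -> List[str]:
    """Simulate `ticks` scheduling decisions on one CPU.

    arrivals maps a tick number to (name, work_units) tasks that become
    runnable at that tick. Returns what ran on each tick.
    """
    runqueue = deque()
    timeline = []
    for tick in range(ticks):
        for name, work in arrivals.get(tick, []):
            runqueue.append([name, work])  # task becomes runnable
        if runqueue:
            task = runqueue.popleft()  # a runnable task displaces the idle task
            timeline.append(task[0])
            task[1] -= 1
            if task[1] > 0:
                runqueue.append(task)  # unfinished: back of the queue (round robin)
        else:
            timeline.append("idle")  # nothing runnable: process zero runs cpu_idle()
    return timeline
```

Calling run_cpu(6, {1: [("kernel_init", 2)], 2: [("kthread", 1)]}) shows the idle task at tick 0, the two tasks taking over while they have work, and the idle task returning once the run queue drains.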

But here’s the kicker for us. This idle loop is the end of the long thread we followed since boot; it’s the final descendant of the very first jump executed by the processor after power up. All of this mess, from reset vector to BIOS to MBR to boot loader to real-mode kernel to protected-mode kernel, all of it leads right here, jump by jump by jump it ends in the idle loop for the boot processor, cpu_idle(). Which is really kind of cool. However, this can’t be the whole story, otherwise the computer would do no work.

At this point, the kernel thread started previously is ready to kick in, displacing process 0 and its idle thread. And so it does, at which point kernel_init() starts running since it was given as the thread entry point. kernel_init() is responsible for initializing the remaining CPUs in the system, which have been halted since boot. All of the code we’ve seen so far has been executed in a single CPU, called the boot processor. As the other CPUs, called application processors, are started they come up in real-mode and must run through several initializations as well. Many of the code paths are common, as you can see in the code for startup_32, but there are slight forks taken by the late-coming application processors. Finally, kernel_init() calls init_post(), which tries to execute a user-mode process in the following order: /sbin/init, /etc/init, /bin/init, and /bin/sh. If all fail, the kernel will panic. Luckily init is usually there, and starts running as PID 1. It checks its configuration file to figure out which processes to launch, which might include X11 Windows, programs for logging in on the console, network daemons, and so on. Thus ends the boot process as yet another Linux box starts running somewhere. May your uptime be long and untroubled.
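
The fallback order in init_post() amounts to a loop over candidates. In this sketch try_exec is a hypothetical callback standing in for the kernel's attempt to execute a program (in the kernel, a successful exec never returns), and KernelPanic is a stand-in for panic().

```python
from typing import Callable, Sequence

INIT_CANDIDATES = ("/sbin/init", "/etc/init", "/bin/init", "/bin/sh")


class KernelPanic(RuntimeError):
    """Stand-in for panic(): in the kernel this halts the machine."""


def start_first_process(try_exec: Callable[[str], bool],
                        candidates: Sequence[str] = INIT_CANDIDATES) -> str:
    """Try each candidate in order; the first one that executes becomes PID 1."""
    for path in candidates:
        if try_exec(path):  # modeled as returning True on success
            return path
    raise KernelPanic("no init found")
```

On a system with only /bin/init and /bin/sh, start_first_process(lambda p: p in {"/bin/init", "/bin/sh"}) returns "/bin/init", since it is tried before the shell.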

The process for Windows is similar in many ways, given the common architecture. Many of the same problems are faced and similar initializations must be done. When it comes to booting, one of the biggest differences is that Windows packs all of the real-mode kernel code, and some of the initial protected mode code, into the boot loader itself (C:\NTLDR). So instead of having two regions in the same kernel image, Windows uses different binary images. Plus Linux completely separates boot loader and kernel; in a way this automatically falls out of the open source process. The diagram below shows the main bits for the Windows kernel:

[Figure: Windows Kernel Initialization]

The Windows user-mode start-up is naturally very different. There’s no /sbin/init, but rather Csrss.exe and Winlogon.exe. Winlogon spawns Services.exe, which starts all of the Windows Services, and Lsass.exe, the Local Security Authority Subsystem Service. The classic Windows login dialog runs in the context of Winlogon.

This is the end of this boot series. Thanks everyone for reading and for feedback. I’m sorry some things got superficial treatment; I’ve gotta start somewhere and only so much fits into blog-sized bites. But nothing like a day after the next; my plan is to do regular “Software Illustrated” posts like this series along with other topics. Meanwhile, here are some resources:

  • The best and most important resource is the source code for real kernels, either Linux or one of the BSDs.
  • Intel publishes excellent Software Developer’s Manuals, which you can download for free.
  • Understanding the Linux Kernel is a good book and walks through a lot of the Linux Kernel sources. It’s getting outdated and it’s dry, but I’d still recommend it to anyone who wants to grok the kernel. Linux Device Drivers is more fun, teaches well, but is limited in scope. Finally, Patrick Moroney suggested Linux Kernel Development by Robert Love in the comments for this post. I’ve heard other positive reviews for that book, so it sounds worth checking out.
  • For Windows, the best reference by far is Windows Internals by David Solomon and Mark Russinovich, the latter of Sysinternals fame. This is a great book, well-written and thorough. The main downside is the lack of source code.

[Update: In a comment below, Nix covered a lot of ground on the initial root file system that I glossed over. Thanks to Marius Barbu for catching a mistake where I wrote "CR3" instead of GDTR]
