OpenSolaris System Call Implementation on x86 Systems

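This article describes the different system call methods used on the Solaris x86 and x64 platforms and how they work. These include the traditional lcall $0x27, the newer int $0x91, and the sysenter and syscall instructions optimized for Intel and AMD processors. By tracing an actual system call, it shows the transition from user space to kernel space.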
x86 syscall primer

Getting started on a project as complex as an operating system can be quite a daunting task. To help OpenSolaris newcomers sort out their head from their tail, here's a look at the system call infrastructure on Solaris x86 and Solaris x64.

I'll go over the different system call methods used, their departure points in userland and entry points in the kernel, and then we'll actually follow one into the kernel with the debugger to see it all in action.

Background

Processors in the x86 world support a number of different system call methods, and some are faster than others. In Solaris, unoptimized system calls take one of three possible paths into the kernel:

lcall $0x27
Used for years as the standard Solaris syscall method.
int $0x91
Long used by Linux (with vector 0x80). Solaris finally adopted int as the base syscall method in Solaris 11 (under development), earning a significant performance increase as a result. It will be available soon in a Solaris 10 update.
lcall $0x7
Used by some (very old) statically linked binaries.

Fast Syscalls and Hardware Capability Libraries

When a well-behaved application makes a system call, it jumps through a wrapper function in libc. Changing the instruction used to enter the kernel is therefore a matter of changing the wrappers in libc. Recently I integrated support for faster, chip-proprietary system calls into Solaris 10: sysenter (from Intel) and syscall (from AMD). Along with new kernel entry points, new hwcap (as in "hardware capability") versions of libc were provided to take advantage of these new, faster instructions. (Tim Marsland has written about the hw capability architecture, and Darren Moffat has written about how the system selects and uses a hwcap libc.)

I often get confused about which system call method is used on which type of system. For the record, the following table shows which methods are supported by each combination of x86 kernel, CPU, and user application type shipping today:

u64 = 64-bit user applications        u32 = 32-bit user applications

                             syscall               sysenter
64-bit kernel  Intel Xeon    u64 (64-bit libc)     u32 (hwcap1)
               AMD Opteron   u64 (64-bit libc)     -
                             u32 (hwcap2)
32-bit kernel  Intel Xeon    -                     u32 (hwcap1)
               AMD Opteron   -                     u32 (hwcap1)

(The hwcap libraries referenced live in the /usr/lib/libc directory.)

Note that we only support AMD's syscall instruction in the 64-bit kernel. Using syscall/sysret in the 32-bit kernel is too complicated and not worth the trouble.
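The table is easy to misread, so here is a small sketch that encodes it as a lookup. The C function takes the kernel width, the CPU vendor, and the application width, and returns the fast trap instruction and the libc flavor the table lists for that combination. Every name in it (fast_path, cpu_vendor_t, INSN_*, LIBC_*) is hypothetical and is not a Solaris identifier. It simply restates the table above.

```c
/*
 * Restates the support table above as code.  All names here are
 * hypothetical illustrations, not identifiers from the Solaris source.
 */
typedef enum { CPU_INTEL_XEON, CPU_AMD_OPTERON } cpu_vendor_t;
typedef enum { INSN_NONE, INSN_SYSCALL, INSN_SYSENTER } fast_insn_t;
typedef enum { LIBC_NONE, LIBC_64BIT, LIBC_HWCAP1, LIBC_HWCAP2 } libc_t;

typedef struct {
	fast_insn_t insn;	/* fast trap instruction used by the wrappers */
	libc_t libc;		/* libc flavor that contains those wrappers */
} fast_path_t;

fast_path_t
fast_path(int kernel_bits, cpu_vendor_t cpu, int app_bits)
{
	if (kernel_bits == 64) {
		if (app_bits == 64)		/* u64: syscall on both vendors */
			return (fast_path_t){ INSN_SYSCALL, LIBC_64BIT };
		if (cpu == CPU_AMD_OPTERON)	/* u32 on Opteron */
			return (fast_path_t){ INSN_SYSCALL, LIBC_HWCAP2 };
		/* u32 on Xeon */
		return (fast_path_t){ INSN_SYSENTER, LIBC_HWCAP1 };
	}
	if (kernel_bits == 32 && app_bits == 32)
		/* syscall is not supported in the 32-bit kernel: sysenter everywhere */
		return (fast_path_t){ INSN_SYSENTER, LIBC_HWCAP1 };
	/* anything else (e.g. a 64-bit app on a 32-bit kernel): no entry */
	return (fast_path_t){ INSN_NONE, LIBC_NONE };
}
```

For example, fast_path(64, CPU_AMD_OPTERON, 32) yields syscall via hwcap2. That is the combination examined in the rest of this article.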

Digging In

To illustrate this, let's take a look at the libc source code. It lives under the usr/src/lib/libc directory. The important entries here are:

  • i386/ - 32-bit source code and unoptimized binary
  • amd64/ - 64-bit source code and binary
  • i386_hwcap1/ - Intel CPU-specific source code and binary
  • i386_hwcap2/ - AMD CPU-specific source code and binary

A simple system call to use for this example is mkdir(2). We can use mdb to disassemble the text bits and see how libc jumps into the kernel:

rab> mdb /lib/libc.so.1
Loading modules: [ libc.so.1 ]
> mkdir::dis
mkdir: movl $0x50,%eax
mkdir+5: syscall
mkdir+7: jb -0x82847 <__cerror>
mkdir+0xd: ret

We can see that the system call number is stashed in register %eax so the kernel can find it later. (See Eric Schrock's post for more information on system call numbers.) Then the syscall instruction transfers control to the kernel.
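The jb <__cerror> that follows the trap branches only when the carry flag is set. The sketch below assumes the Solaris x86 convention: on failure the kernel sets the carry flag and returns the error number in %eax, and on success %eax already holds the result. Under that assumption, the userland half of the contract looks roughly like this in C. wrapper_result is a hypothetical name, and the real error path in libc (__cerror) is not reproduced here, only its general effect.

```c
#include <errno.h>

/*
 * Hypothetical model of what happens right after the trap returns.
 * `carry` stands in for the carry flag tested by `jb`, `eax` for %eax.
 */
long
wrapper_result(int carry, long eax)
{
	if (carry) {
		/* error path, roughly like __cerror: publish errno, return -1 */
		errno = (int)eax;
		return -1;
	}
	/* success: %eax already holds the return value, so `ret` hands it back */
	return eax;
}
```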

This example was run on an AMD Opteron system. On any other system we'd expect to find either lcall $0x27 or sysenter as the control transfer instruction. We can get at the unoptimized libc by unmounting the hwcap library:

rab> su
Password:
# umount /lib/libc.so.1
rab> mdb /lib/libc.so.1
Loading modules: [ libc.so.1 ]
> mkdir::dis
mkdir: movl $0x50,%eax
mkdir+5: lcall $0x27,$0x0
mkdir+0xc: jb -0x82b2c <__cerror>
mkdir+0x12: ret
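The offsets in the two listings follow directly from x86 instruction lengths. movl $imm32,%eax is 5 bytes (b8 plus a 32-bit immediate), and syscall is 2 bytes (0f 05). A far lcall with an immediate selector:offset is 7 bytes (9a plus a 32-bit offset and a 16-bit selector), jb with a 32-bit displacement is 6 bytes (0f 82 plus rel32), and ret is 1 byte. The illustrative C sketch below adds the lengths up. The layout function and LEN_* names are made up for this example.

```c
#include <stddef.h>

/* Encoded lengths (bytes) of the instructions in the mkdir listings. */
enum {
	LEN_MOVL_IMM32 = 5,	/* b8 + imm32       : movl $0x50,%eax   */
	LEN_SYSCALL    = 2,	/* 0f 05            : syscall           */
	LEN_LCALL_FAR  = 7,	/* 9a + ptr16:32    : lcall $0x27,$0x0  */
	LEN_JB_REL32   = 6,	/* 0f 82 + rel32    : jb <__cerror>     */
	LEN_RET        = 1	/* c3               : ret               */
};

/*
 * Store the starting offset of each instruction in offsets[], given the
 * instruction lengths in order; return the total length of the sequence.
 */
size_t
layout(const int *lens, size_t n, size_t *offsets)
{
	size_t off = 0;

	for (size_t i = 0; i < n; i++) {
		offsets[i] = off;
		off += (size_t)lens[i];
	}
	return off;
}
```

With the syscall sequence, the offsets come out as 0, 5, 0x7, and 0xd. With the lcall sequence, they are 0, 5, 0xc, and 0x12, matching the mkdir+N labels in both listings.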

Tracing it back to the source

Aha! Now let's look at the source for the libc mkdir(2) wrapper to complete the userland picture:

rab> pwd
.../usr/src/lib/libc/common/sys
rab> cat mkdir.s
[ snip ]
#include "SYS.h"

SYSCALL_RVAL1(mkdir)
RET
SET_SIZE(mkdir)

To keep the source portable and avoid reproducing the same code in more than one place, many portions of libc are implemented as preprocessor macros. mkdir(2) is so simple that it needs nothing but the SYSCALL_RVAL1 macro, found in SYS.h. For reasons too boring to repeat here, SYSCALL_RVAL1 eventually expands into the corresponding SYSTRAP_RVAL1 macro. All 32-bit variants of libc share one SYS.h. Preprocessor macros defined via the Makefiles in the binary directories determine which instructions go into the SYSTRAP macro:

rab> pwd
.../usr/src/lib/libc/i386/inc
rab> grep SYSTRAP_RVAL1 SYS.h
#define SYSTRAP_RVAL1(name) __SYSCALL(name)
#define SYSTRAP_RVAL1(name) __SYSENTER(name)
#define SYSTRAP_RVAL1(name) __SYSLCALL(name)

Which of the above macros is used depends on which libc is being built. __SYSCALL() is used for hwcap2, __SYSENTER() for hwcap1, and __SYSLCALL() for the unoptimized base libc at /lib/libc.so.1.
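The same one-source, three-builds pattern can be sketched in plain C. A preprocessor symbol supplied by the build picks the trap instruction at compile time. HWCAP_USE_SYSCALL and HWCAP_USE_SYSENTER are hypothetical stand-ins for whatever the hwcap Makefiles actually define, and the strings only name the instruction each build would emit.

```c
/*
 * Compile-time selection in the spirit of SYS.h.  The HWCAP_USE_* symbols
 * are hypothetical; a real build would pass its own -D flags.
 */
#if defined(HWCAP_USE_SYSCALL)
#define	TRAP_INSN	"syscall"		/* hwcap2 build (AMD) */
#elif defined(HWCAP_USE_SYSENTER)
#define	TRAP_INSN	"sysenter"		/* hwcap1 build (Intel) */
#else
#define	TRAP_INSN	"lcall $0x27,$0x0"	/* base /lib/libc.so.1 */
#endif

/* Report which trap instruction this build was configured for. */
const char *
trap_insn(void)
{
	return TRAP_INSN;
}
```

Compiled with no extra flags, this falls through to the lcall variant. Compiling with -DHWCAP_USE_SYSCALL instead selects syscall, without a single change to the source.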

rab> cat SYS.h
[ snip ]
#define __SYSLCALL(name) \
/* CSTYLED */ \
movl $SYS_/**/name, %eax; \
lcall $SYSCALL_TRAPNUM, $0
[ snip ]
#define __SYSCALL(name) \
/* CSTYLED */ \
movl $SYS_/**/name, %eax; \
.byte 0xf, 0x5 /* syscall */

We added support for AMD's syscall instruction to Solaris, but we were using a slightly older version of our assembler. That assembler (embarrassingly enough) didn't yet recognize the instruction, so its opcode bytes had to be hard-coded into libc.
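Two details in SYS.h deserve a closer look. First, $SYS_/**/name glues the SYS_ prefix onto the macro argument using the old pre-ANSI empty-comment trick. ANSI C writes the same paste with ##. Second, .byte 0xf, 0x5 is just the two-byte syscall opcode. In the sketch below, SYS_mkdir is set from the movl $0x50,%eax in the disassembly, and SYSNUM and mkdir_sysnum are hypothetical names.

```c
/* mkdir's system call number, as loaded by `movl $0x50,%eax` in libc. */
#define	SYS_mkdir	0x50

/* ANSI C equivalent of the empty-comment token paste used in SYS.h. */
#define	SYSNUM(name)	SYS_##name

/* The bytes that `.byte 0xf, 0x5` hand-assembles: the syscall opcode. */
const unsigned char syscall_opcode[2] = { 0x0f, 0x05 };

int
mkdir_sysnum(void)
{
	return SYSNUM(mkdir);	/* expands to SYS_mkdir, i.e. 0x50 (80) */
}
```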

Jumping Over the Fence

That's all for userland; the easy part is over. Because the different system call instructions work in very different ways, the kernel uses a separate code path for each. The kernel entry points are listed below (only those for 32-bit applications making system calls are shown):

                Entry Instruction    Kernel Entry Point
64-bit kernel   lcall*               trap()
                syscall              sys_syscall32()
                sysenter             sys_sysenter()
32-bit kernel   lcall                sys_call()
                sysenter             sys_sysenter()

* In the 64-bit kernel, 32-bit system calls made via lcall enter the system through a segment-not-present trap (#np), a matter which is beyond the scope of this document. Trust me, you don't want to get into segmentation now...
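The entry-point table can be restated as a lookup from (kernel width, instruction) to the kernel function that first receives control. This is an illustrative C sketch of the table above, not kernel code, and struct entry_row and kernel_entry are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* One row of the entry-point table (32-bit applications only). */
struct entry_row {
	int kernel_bits;	/* 32 or 64 */
	const char *insn;	/* instruction libc used to trap */
	const char *entry;	/* kernel function that receives control */
};

static const struct entry_row entry_table[] = {
	{ 64, "lcall",    "trap" },		/* arrives as a #np trap */
	{ 64, "syscall",  "sys_syscall32" },
	{ 64, "sysenter", "sys_sysenter" },
	{ 32, "lcall",    "sys_call" },
	{ 32, "sysenter", "sys_sysenter" },
};

/* Return the entry point for this combination, or NULL if there is none. */
const char *
kernel_entry(int kernel_bits, const char *insn)
{
	for (size_t i = 0; i < sizeof (entry_table) / sizeof (entry_table[0]); i++) {
		if (entry_table[i].kernel_bits == kernel_bits &&
		    strcmp(entry_table[i].insn, insn) == 0)
			return entry_table[i].entry;
	}
	return NULL;	/* e.g. syscall is not supported in the 32-bit kernel */
}
```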

Seeing it in Action

Using the kernel debugger, we can step out of the classroom and watch these creatures in their native habitat. Boot a machine and, from the system console, get the kernel debugger loaded and ready. Enter the debugger, and then set a breakpoint on the 32-bit syscall entry point, sys_syscall32(). I'm still using the same Opteron machine as above (running the 64-bit kernel), so I first need to re-mount the hwcap library:

root> mount -O -F lofs /usr/lib/libc/libc_hwcap2.so.1 /lib/libc.so.1
root> mdb -K

Welcome to kmdb
Loaded modules: [ cpc ptm ufs unix krtld sppp nca lofs genunix ip logindmux usba
specfs nfs random sctp ]
[0]> sys_syscall32:b
[0]> :c
kmdb: stop at sys_syscall32
kmdb: target stopped at:
sys_syscall32: swapgs
[1]> ::cpuinfo
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc230a0  1b    0    0  60   no    no t-0    ffffffff82b38520 fsflush
  1 ffffffff8bdd1800  1b    0    0  49   no    no t-0    ffffffff8cc991e0 ksh

We set a breakpoint and tripped over it immediately after continuing, because system calls are a very common occurrence even on an idle machine. The [1] in the kmdb prompt shows that CPU1 tripped the breakpoint first, and ::cpuinfo shows that ksh is the process running there. Which system call is the shell making? Remember that the libc wrapper function stashed the system call number in register %eax. In the 64-bit kernel, %eax is the lower 32 bits of register %rax:

[1]> <rax=D
98

System call 98 is, according to the sysent table (see sysent.c), sigaction(2). That makes sense, because shells are always messing around with signals.
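The decoding step can be sketched in C. Truncating %rax to 32 bits yields %eax, and the number is then looked up. The lookup below is a hypothetical stand-in for the sysent table and holds only the system call numbers that appear in this article. The real table in sysent.c is indexed by number and holds handler information, not names.

```c
#include <stddef.h>
#include <stdint.h>

/* %eax is the low 32 bits of %rax; converting to uint32_t keeps exactly those. */
uint32_t
eax_of(uint64_t rax)
{
	return (uint32_t)rax;
}

/* Hypothetical mini-sysent: only the numbers seen in this article. */
const char *
syscall_name(uint32_t num)
{
	switch (num) {
	case 80:	return "mkdir";		/* movl $0x50,%eax in libc */
	case 98:	return "sigaction";	/* ksh, via sys_syscall32 */
	case 115:	return "mmap";		/* 64-bit ls, via sys_syscall */
	default:	return NULL;		/* not in this sketch */
	}
}
```

For example, syscall_name(eax_of(98)) returns "sigaction" no matter what garbage sits in the upper half of %rax.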

Clear the breakpoint and try the same thing with the 64-bit entry point, sys_syscall(). This time, enter the debugger by sending a break over the console (how to do this varies with the terminal being used to access the console):

[1]> :z
[1]> sys_syscall:b
[1]> :c
root>
root>
root>

Because this is an otherwise idle machine, nothing trips the 64-bit syscall breakpoint just yet. There just aren't very many 64-bit processes running. We can run one manually to trigger the breakpoint:

root> /usr/bin/amd64/ls 
kmdb: stop at sys_syscall
kmdb: target stopped at:
sys_syscall: swapgs
[1]> <rax=D
115

We see that the first 64-bit system call made by the 64-bit ls is mmap(2), which makes sense because the 64-bit dynamic linker needs to begin setting up the new process's address space.
