Chapter 3: Char Drivers - 4: Some Important Data Structures

Continuing from the previous post, Chapter 3: Char Drivers - 2: The Internal Representation of Device Numbers, let's go ahead.

Some Important Data Structures

As you can imagine, device number registration is just the first of many tasks that driver code must carry out. We will soon look at other important driver components, but one other digression is needed first. Most of the fundamental driver operations involve three important kernel data structures, called file_operations, file, and inode. A basic familiarity with these structures is required to be able to do much of anything interesting, so we will now take a quick look at each of them before getting into the details of how to implement the fundamental driver operations.

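Supplementary notes:

  1. What each of the three structures is for
    The three structures correspond to "driver capabilities," "file metadata," and "a user session," and together they form the bridge between user space and the driver:

    • file_operations: the driver's "capability list." It defines the operations the driver supports (such as read, write, and open) and is the interface through which the driver exposes its functionality to the kernel.

    • inode: the carrier of the device file's metadata. It stores the static attributes of the device file (such as the device number and file permissions); each device file corresponds to one inode.

    • file: a "session" between a user program and the device. It represents one open instance of the device file and records the session's dynamic state (such as the current read/write position and whether non-blocking mode is set).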
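The split between static identity and per-open state can be sketched in ordinary user-space C. This is only an illustration under stated assumptions: the toy_ types and toy_open() are hypothetical simplifications, not the kernel's definitions, and only the field names i_rdev, f_pos, f_flags, and f_op are borrowed from the real structures.

```c
#include <stddef.h>
#include <fcntl.h>   /* O_NONBLOCK, used as an example session flag */

struct toy_file;

/* "Capability list": the methods a driver chooses to support. */
struct toy_file_operations {
    int (*open)(struct toy_file *filp);
};

/* Static identity of the device file: one per device node. */
struct toy_inode {
    unsigned int i_rdev;                       /* device number */
    const struct toy_file_operations *i_fop;   /* methods for this device */
};

/* One per open(): the dynamic state of a single session. */
struct toy_file {
    long long f_pos;                           /* current read/write position */
    unsigned int f_flags;                      /* e.g. O_NONBLOCK */
    const struct toy_file_operations *f_op;    /* taken from the inode at open */
    const struct toy_inode *f_inode;           /* which device file this is */
};

/* Hypothetical helper: sets up a new session for an inode. */
int toy_open(const struct toy_inode *inode, unsigned int flags,
             struct toy_file *filp)
{
    filp->f_pos = 0;
    filp->f_flags = flags;
    filp->f_op = inode->i_fop;
    filp->f_inode = inode;
    if (filp->f_op && filp->f_op->open)
        return filp->f_op->open(filp);
    return 0;   /* no open method: the open still succeeds */
}
```

Opening the same toy_inode twice yields two toy_file objects that share the inode but keep independent positions and flags, which is the relationship the notes above describe.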

File Operations

So far, we have reserved some device numbers for our use, but we have not yet connected any of our driver’s operations to those numbers. The file_operations structure is how a char driver sets up this connection. The structure, defined in <linux/fs.h>, is a collection of function pointers. Each open file (represented internally by a file structure, which we will examine shortly) is associated with its own set of functions (by including a field called f_op that points to a file_operations structure). The operations are mostly in charge of implementing the system calls and are, therefore, named open, read, and so on. We can consider the file to be an “object” and the functions operating on it to be its “methods,” using object-oriented programming terminology to denote actions declared by an object to act on itself. This is the first sign of object-oriented programming we see in the Linux kernel, and we’ll see more in later chapters.
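The "object and methods" idea can be shown with a small user-space sketch. Everything here is hypothetical (the toy_ types, the two pretend drivers, and toy_sys_read()); it only illustrates how one calling path can reach different implementations through a per-file table of function pointers.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

struct toy_file;

/* Method table: here reduced to a single read method. */
struct toy_fops {
    ssize_t (*read)(struct toy_file *filp, char *buf, size_t count);
};

/* The "object": it carries a pointer to its own method table. */
struct toy_file {
    const struct toy_fops *f_op;
};

/* Two hypothetical drivers, each with its own read method. */
ssize_t zeros_read(struct toy_file *filp, char *buf, size_t count)
{
    (void)filp;
    memset(buf, 0, count);
    return (ssize_t)count;
}

ssize_t dashes_read(struct toy_file *filp, char *buf, size_t count)
{
    (void)filp;
    memset(buf, '-', count);
    return (ssize_t)count;
}

const struct toy_fops zeros_fops  = { .read = zeros_read };
const struct toy_fops dashes_fops = { .read = dashes_read };

/* The caller's side: one code path, dispatching through f_op. */
ssize_t toy_sys_read(struct toy_file *filp, char *buf, size_t count)
{
    return filp->f_op->read(filp, buf, count);
}
```

The caller never names a driver; which read runs depends only on the f_op pointer stored in the file it was handed.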

Conventionally, a file_operations structure or a pointer to one is called fops (or some variation thereof). Each field in the structure must point to the function in the driver that implements a specific operation, or be left NULL for unsupported operations. The exact behavior of the kernel when a NULL pointer is specified is different for each function, as the list later in this section shows.
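As a rough illustration of why the NULL behavior differs per operation, the following user-space sketch models three of the defaults described in this section (read fails with -EINVAL, open succeeds silently, mmap fails with -ENODEV). The toy_ names are hypothetical, and the real kernel dispatch code is more involved than this.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

struct toy_file;

struct toy_fops {
    ssize_t (*read)(struct toy_file *filp, char *buf, size_t count);
    int (*open)(struct toy_file *filp);
    int (*mmap)(struct toy_file *filp);
};

struct toy_file {
    const struct toy_fops *f_op;
};

/* NULL read method: the call fails with -EINVAL. */
ssize_t toy_sys_read(struct toy_file *filp, char *buf, size_t count)
{
    if (!filp->f_op->read)
        return -EINVAL;
    return filp->f_op->read(filp, buf, count);
}

/* NULL open method: opening succeeds, the driver is just not notified. */
int toy_sys_open(struct toy_file *filp)
{
    if (!filp->f_op->open)
        return 0;
    return filp->f_op->open(filp);
}

/* NULL mmap method: the call fails with -ENODEV. */
int toy_sys_mmap(struct toy_file *filp)
{
    if (!filp->f_op->mmap)
        return -ENODEV;
    return filp->f_op->mmap(filp);
}
```

Each wrapper decides for itself what a missing method means, which is why a single rule for NULL pointers does not exist.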

The following list introduces all the operations that an application can invoke on a device. We’ve tried to keep the list brief so it can be used as a reference, merely summarizing each operation and the default kernel behavior when a NULL pointer is used.

As you read through the list of file_operations methods, you will note that a number of parameters include the string __user. This annotation is a form of documentation, noting that a pointer is a user-space address that cannot be directly dereferenced. For normal compilation, __user has no effect, but it can be used by external checking software to find misuse of user-space addresses.
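A sketch of how such an annotation can compile away is shown below. The conditional definition follows the 2.6-era kernel headers as best I recall them, with sparse as the checker that defines __CHECKER__; treat the exact attribute spelling as an assumption. toy_fill() is a hypothetical function that exists only to show an annotated prototype.

```c
#include <stddef.h>
#include <string.h>

/*
 * With a static checker that defines __CHECKER__, the annotation marks
 * the pointer as living in a separate address space that must not be
 * dereferenced. In a normal build it expands to nothing.
 */
#ifndef __user
# ifdef __CHECKER__
#  define __user __attribute__((noderef, address_space(1)))
# else
#  define __user
# endif
#endif

/*
 * Hypothetical function: the prototype documents that buf is meant to be
 * a user-space pointer. In this user-space demo the memset is harmless;
 * real driver code would go through copy_to_user() instead, and a checker
 * would flag direct access like this.
 */
size_t toy_fill(char __user *buf, size_t count)
{
    memset(buf, 'x', count);
    return count;
}
```

In a normal build char __user * is exactly char *, so the annotation costs nothing at run time and serves purely as machine-checkable documentation.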

The rest of the chapter, after describing some other important data structures, explains the role of the most important operations and offers hints, caveats, and real code examples. We defer discussion of the more complex operations to later chapters, because we aren’t ready to dig into topics such as memory management, blocking operations, and asynchronous notification quite yet.

| Field | Prototype | Description | Behavior when NULL |
| --- | --- | --- | --- |
| owner | struct module * | Not an operation; a pointer to the module that "owns" the structure. It prevents the module from being unloaded while its operations are in use. Almost always initialized to THIS_MODULE. | N/A (should always be set to THIS_MODULE) |
| llseek | loff_t (*)(struct file *, loff_t, int) | Changes the current read/write position in a file. Returns the new position as a positive value; errors are negative. loff_t is at least 64 bits wide. | Seek calls modify the position counter in the file structure in potentially unpredictable ways |
| read | ssize_t (*)(struct file *, char __user *, size_t, loff_t *) | Retrieves data from the device. A nonnegative return value is the number of bytes successfully read. | read system call fails with -EINVAL ("Invalid argument") |
| aio_read | ssize_t (*)(struct kiocb *, char __user *, size_t, loff_t) | Initiates an asynchronous read, which may not complete before the function returns. | All operations are processed synchronously by read instead |
| write | ssize_t (*)(struct file *, const char __user *, size_t, loff_t *) | Sends data to the device. A nonnegative return value is the number of bytes successfully written. | write system call fails with -EINVAL ("Invalid argument") |
| aio_write | ssize_t (*)(struct kiocb *, const char __user *, size_t, loff_t) | Initiates an asynchronous write, which may not complete before the function returns. | All operations are processed synchronously by write instead |
| readdir | int (*)(struct file *, void *, filldir_t) | Used for reading directories; useful only for filesystems. | Should be NULL for device files |
| poll | unsigned int (*)(struct file *, struct poll_table_struct *) | Back end of the poll, epoll, and select system calls. Returns a bit mask indicating whether non-blocking reads or writes are possible. | Device is assumed to be both readable and writable without blocking |
| ioctl | int (*)(struct inode *, struct file *, unsigned int, unsigned long) | Implements device-specific commands (formatting a floppy track, for example, which is neither reading nor writing). A few commands are recognized by the kernel without referring to the fops table. | System call returns -ENOTTY ("No such ioctl for device") for any request that isn't predefined |
| mmap | int (*)(struct file *, struct vm_area_struct *) | Requests a mapping of device memory to the process's address space. | mmap system call returns -ENODEV |
| open | int (*)(struct inode *, struct file *) | First operation performed on the device file; typically initializes resources. | Opening always succeeds, but the driver isn't notified |
| flush | int (*)(struct file *) | Invoked when a process closes its copy of a file descriptor; should execute (and wait for) any outstanding operations. Not to be confused with fsync. Used by very few drivers, the SCSI tape driver being one example. | Kernel ignores the request |
| release | int (*)(struct inode *, struct file *) | Invoked when the file structure is being released; handles cleanup. | No operation performed (implement it if open allocates resources, or they will leak) |
| fsync | int (*)(struct file *, struct dentry *, int) | Back end of the fsync system call, which flushes any pending data. | System call returns -EINVAL |
| aio_fsync | int (*)(struct kiocb *, int) | Asynchronous version of the fsync method. | N/A |
| fasync | int (*)(int, struct file *, int) | Notifies the device of a change in its FASYNC flag. Related to asynchronous notification (see Chapter 6). | Driver doesn't support asynchronous notification |
| lock | int (*)(struct file *, int, struct file_lock *) | Implements file locking; essential for regular files but rarely implemented by device drivers. | No file locking implemented |
| readv | ssize_t (*)(struct file *, const struct iovec *, unsigned long, loff_t *) | Implements scatter/gather reads involving multiple memory areas. | read method is called instead (possibly more than once) |
| writev | ssize_t (*)(struct file *, const struct iovec *, unsigned long, loff_t *) | Implements scatter/gather writes involving multiple memory areas. | write method is called instead (possibly more than once) |
| sendfile | ssize_t (*)(struct file *, loff_t *, size_t, read_actor_t, void *) | Implements the read side of the sendfile system call, which moves data between file descriptors with minimal copying (used by web servers, for example). | Usually left NULL by device drivers |
| sendpage | ssize_t (*)(struct file *, struct page *, int, size_t, loff_t *, int) | The other half of sendfile; called by the kernel to send data, one page at a time, to the corresponding file. | Rarely implemented by device drivers |
| get_unmapped_area | unsigned long (*)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long) | Finds a suitable location in the process's address space for a memory mapping, enforcing any alignment requirements of the device. | Memory management code handles it by default |
| check_flags | int (*)(int) | Lets a module check the flags passed to an fcntl(F_SETFL...) call. | No flag checking performed |
| dir_notify | int (*)(struct file *, unsigned long) | Invoked when an application uses fcntl to request directory change notifications. Useful only for filesystems. | Not needed by device drivers; should be NULL |

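Supplementary notes:

  1. Core versus non-core fields

    When writing a driver, implement the core fields first so that basic functionality works; the non-core fields can be added as needed:

    1. Core fields: owner (always set), open (resource initialization), read/write (data transfer), and release (resource cleanup).

    2. Non-core fields: aio_read/aio_write (asynchronous I/O), fasync (asynchronous notification), mmap (memory mapping), and so on, which are needed only in specific scenarios.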

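  2. How to handle __user pointers

    Every pointer marked __user (such as char __user *) refers to a user-space address. The driver must move data with functions such as copy_from_user (user to kernel) and copy_to_user (kernel to user) and must never dereference such a pointer directly, since doing so can crash the kernel or open a security hole.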

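  3. Return type conventions

    • ssize_t: a "signed size," used by read, write, and similar methods; a nonnegative value is the number of bytes transferred, and a negative value is an error code.

    • loff_t: a "long offset," at least 64 bits wide; llseek returns the new read/write position in this type.

    • unsigned int: used by poll to return a bit mask describing read/write readiness.

    • int: the return type of most control-style methods (such as open and ioctl); 0 means success and a negative value is an error code.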
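The notes on __user pointers and return types can be tied together in a user-space sketch of a read-style method. This is not driver code: mock_copy_to_user() is a hypothetical stand-in that only imitates the return convention of copy_to_user() (number of bytes not copied, 0 on success), toy_loff_t stands in for loff_t, and toy_read() is a simplified illustration, not scull's implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

#ifndef __user
# define __user            /* expands to nothing outside a checker */
#endif

typedef long long toy_loff_t;   /* stand-in for loff_t (at least 64 bits) */

/* Mock of copy_to_user(): returns the number of bytes NOT copied. */
unsigned long mock_copy_to_user(void __user *to, const void *from,
                                unsigned long n)
{
    if (!to)
        return n;          /* pretend NULL is a faulting user address */
    memcpy(to, from, n);
    return 0;
}

static const char toy_data[] = "hello, scull";
#define TOY_SIZE ((toy_loff_t)(sizeof(toy_data) - 1))

/* Read-style method: returns bytes read (>= 0) or a negative errno. */
ssize_t toy_read(char __user *buf, size_t count, toy_loff_t *f_pos)
{
    if (*f_pos < 0)
        return -EINVAL;
    if (*f_pos >= TOY_SIZE)
        return 0;                              /* end of device data */
    if (count > (size_t)(TOY_SIZE - *f_pos))
        count = (size_t)(TOY_SIZE - *f_pos);   /* clamp to what is left */
    if (mock_copy_to_user(buf, toy_data + *f_pos, count))
        return -EFAULT;                        /* bad user pointer */
    *f_pos += (toy_loff_t)count;               /* advance the position */
    return (ssize_t)count;
}
```

Note that the position is advanced only after the copy succeeds, and that a return of 0 signals end of data rather than an error.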

The scull device driver implements only the most important device methods. Its file_operations structure is initialized as follows:

struct file_operations scull_fops = {
    .owner   = THIS_MODULE,
    .llseek  = scull_llseek,
    .read    = scull_read,
    .write   = scull_write,
    .ioctl   = scull_ioctl,
    .open    = scull_open,
    .release = scull_release,
};

This declaration uses the standard C tagged structure initialization syntax. This syntax is preferred because it makes drivers more portable across changes in the definitions of the structures and, arguably, makes the code more compact and readable. Tagged initialization allows the reordering of structure members; in some cases, substantial performance improvements have been realized by placing pointers to frequently accessed members in the same hardware cache line.

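To summarize, tagged initialization is the recommended style for the following reasons:

  1. Compatibility across kernel versions: if the kernel adds fields to file_operations or reorders them, a driver that uses tagged initialization keeps compiling, as long as the members it names still exist.

  2. Compactness: there is no need to initialize every member in declaration order; the driver names only the methods it supports, and every member it does not mention is automatically set to NULL.

  3. Readability: the .member = function form shows at a glance which driver function is bound to which operation.

  4. Room for performance tuning: because drivers do not depend on member order, the structure definition can be rearranged so that frequently accessed members (such as read and write) sit next to each other in the same hardware cache line.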
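A small user-space sketch makes the portability argument concrete. The two toy_fops_v1/v2 layouts are hypothetical; they stand for "before" and "after" versions of a structure whose members were reordered and extended.

```c
#include <stddef.h>

/* Original layout. */
struct toy_fops_v1 {
    int (*open)(void);
    int (*release)(void);
};

/* "Newer" layout: members reordered and a new one added in the middle. */
struct toy_fops_v2 {
    int (*release)(void);
    int (*flush)(void);
    int (*open)(void);
};

int toy_open(void)    { return 1; }
int toy_release(void) { return 2; }

/* The same initializer text works against either layout. */
const struct toy_fops_v1 fops_v1 = {
    .open    = toy_open,
    .release = toy_release,
};

const struct toy_fops_v2 fops_v2 = {
    .open    = toy_open,
    .release = toy_release,
    /* .flush is not mentioned, so it is initialized to NULL */
};
```

A positional initializer such as { toy_open, toy_release } would still compile against the second layout here, but it would place toy_open in the release slot, which is exactly the kind of breakage the tagged form avoids.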

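Supplementary notes:

  1. The main advantage of tagged initialization

    Take file_operations as an example: a kernel upgrade may add members (new asynchronous I/O methods, say). With traditional positional initialization (values listed in declaration order), a member inserted or moved in the structure leaves the initializer list out of step with the definition, so the driver either fails to compile or binds functions to the wrong slots. Tagged initialization touches only the members it names and leaves any new ones NULL, which greatly reduces maintenance cost.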

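  2. The methods scull chooses to implement

    The scull driver implements only the most basic operations, which cover the core functionality of a char device:

    1. owner: required; ties the structure to the current module so that the module cannot be unloaded while its operations are in use.

    2. llseek: supports changing the read/write position, so the device can be accessed randomly.

    3. read/write: transfer data to and from the device; this is the core function of a char device.

    4. ioctl: provides a device-control interface (for example, adjusting the size of the memory area).

    5. open/release: handle initialization when the device is opened (such as allocating memory) and cleanup when it is closed (such as freeing resources).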

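  3. Default behavior of the unimplemented methods

    The methods scull does not implement (poll, mmap, aio_read, and so on) are automatically set to NULL, and the kernel applies its default rules:

    1. Asynchronous I/O requests fall back to the synchronous read/write methods.

    2. Calling mmap returns -ENODEV.

    3. Calling poll reports the device as readable and writable without blocking.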

