Chapter 3: Character Device Drivers (12): read and write

Continuing from the previous post, "Chapter 3: Character Device Drivers (11): scull's Memory Usage-1", let's go ahead.

read and write

The read and write methods both perform a similar task, that is, copying data from and to application code. Therefore, their prototypes are pretty similar, and it’s worth introducing them at the same time:

ssize_t read(struct file *filp, char __user *buff,
             size_t count, loff_t *offp);

ssize_t write(struct file *filp, const char __user *buff,
              size_t count, loff_t *offp);

For both methods, filp is the file pointer and count is the size of the requested data transfer. The buff argument points to the user buffer holding the data to be written or the empty buffer where the newly read data should be placed. Finally, offp is a pointer to a “long offset type” object that indicates the file position the user is accessing. The return value is a “signed size type”; its use is discussed later.

Let us repeat that the buff argument to the read and write methods is a user-space pointer. Therefore, it cannot be directly dereferenced by kernel code. There are a few reasons for this restriction:

• Depending on which architecture your driver is running on, and how the kernel was configured, the user-space pointer may not be valid while running in kernel mode at all. There may be no mapping for that address, or it could point to some other, random data.

• Even if the pointer does mean the same thing in kernel space, user-space memory is paged, and the memory in question might not be resident in RAM when the system call is made. Attempting to reference the user-space memory directly could generate a page fault, which is something that kernel code is not allowed to do. The result would be an “oops,” which would result in the death of the process that made the system call.

• The pointer in question has been supplied by a user program, which could be buggy or malicious. If your driver ever blindly dereferences a user-supplied pointer, it provides an open doorway allowing a user-space program to access or overwrite memory anywhere in the system. If you do not wish to be responsible for compromising the security of your users’ systems, you cannot ever dereference a user-space pointer directly.

Obviously, your driver must be able to access the user-space buffer in order to get its job done. This access must always be performed by special, kernel-supplied functions, however, in order to be safe. We introduce some of those functions (which are defined in <asm/uaccess.h>) here, and the rest in the section “Using the ioctl Argument” in Chapter 6; they use some special, architecture-dependent magic to ensure that data transfers between kernel and user space happen in a safe and correct way.

The code for read and write in scull needs to copy a whole segment of data to or from the user address space. This capability is offered by the following kernel functions, which copy an arbitrary array of bytes and sit at the heart of most read and write implementations:

unsigned long copy_to_user(void __user *to,
                           const void *from,
                           unsigned long count);
unsigned long copy_from_user(void *to,
                             const void __user *from,
                             unsigned long count);

Although these functions behave like normal memcpy functions, a little extra care must be used when accessing user space from kernel code. The user pages being addressed might not be currently present in memory, and the virtual memory subsystem can put the process to sleep while the page is being transferred into place. This happens, for example, when the page must be retrieved from swap space. The net result for the driver writer is that any function that accesses user space must be reentrant, must be able to execute concurrently with other driver functions, and, in particular, must be in a position where it can legally sleep. We return to this subject in Chapter 5.

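Supplementary notes:

The role of the __user qualifier

This qualifier is a type-checking annotation. It has no effect on a normal compile, but it tells static checkers such as sparse that the pointer refers to user-space memory and must be accessed through functions such as copy_to_user and copy_from_user. If code dereferences the pointer directly, the checker emits a warning, which helps developers avoid security holes.

Return value of copy_to_user/copy_from_user

Both functions return the number of bytes that could not be copied:
1. A return value of 0 means all the data was copied.
2. A nonzero return value means the copy failed, at least in part (for example, because the user-space pointer is invalid); the driver should then return the -EFAULT error code.

Alternatives for transferring data to and from user space

Besides these two functions, the kernel provides other helpers:
1. get_user(x, ptr)/put_user(x, ptr): transfer a single value of a simple type (such as int or char); for that case they are more efficient than the bulk copy functions.
2. copy_in_user(to, from, count): copies data between two user-space pointers (rarely used).

Restrictions imposed by sleeping

Because both functions may sleep, they cannot be used in interrupt context (interrupt handlers, softirqs). Call them only from process context, such as the code paths that handle the read and write system calls.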

The role of the two functions is not limited to copying data to and from user-space: they also check whether the user space pointer is valid. If the pointer is invalid, no copy is performed; if an invalid address is encountered during the copy, on the other hand, only part of the data is copied. In both cases, the return value is the amount of memory still to be copied. The scull code looks for this error return, and returns -EFAULT to the user if it’s not 0.
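
The check-and-return pattern just described can be tried out in user space. The sketch below is only a simulation: mock_copy_to_user, struct mock_dev, mock_read, and the valid parameter are hypothetical names invented for illustration (they are not kernel or scull code), and off_t stands in for the kernel's loff_t. It assumes a device that is a flat in-memory buffer and shows the clamp, copy, check, advance sequence.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for copy_to_user(): copies at most `valid` bytes
 * (pretending the rest of the user buffer is unmapped) and returns the
 * number of bytes that could NOT be copied, as the real function does. */
static unsigned long mock_copy_to_user(void *to, const void *from,
                                       unsigned long count,
                                       unsigned long valid)
{
    unsigned long n = count < valid ? count : valid;

    memcpy(to, from, n);
    return count - n;
}

/* Hypothetical device: a flat in-memory buffer. */
struct mock_dev {
    const char *data;
    size_t size;
};

/* Read method in the style described above. `valid` exists only so the
 * simulation can inject a fault; a real read method has no such argument. */
static ssize_t mock_read(struct mock_dev *dev, char *buf, size_t count,
                         off_t *offp, unsigned long valid)
{
    assert(dev != NULL && offp != NULL);

    if (*offp < 0)
        return -EINVAL;
    if ((size_t)*offp >= dev->size)
        return 0;                            /* end of device */
    if (count > dev->size - (size_t)*offp)
        count = dev->size - (size_t)*offp;   /* transfer less than requested */

    if (mock_copy_to_user(buf, dev->data + *offp, count, valid) != 0)
        return -EFAULT;                      /* some bytes were not copied */

    *offp += (off_t)count;                   /* advance only on success */
    return (ssize_t)count;
}
```

Calling mock_read with valid smaller than count makes the mock copy come up short, so the function returns -EFAULT and leaves *offp untouched.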

The topic of user-space access and invalid user space pointers is somewhat advanced and is discussed in Chapter 6. However, it’s worth noting that if you don’t need to check the user-space pointer you can invoke __copy_to_user and __copy_from_user instead. This is useful, for example, if you know you already checked the argument. Be careful, however; if, in fact, you do not check a user-space pointer that you pass to these functions, then you can create kernel crashes and/or security holes.

As far as the actual device methods are concerned, the task of the read method is to copy data from the device to user space (using copy_to_user), while the write method must copy data from user space to the device (using copy_from_user).

Each read or write system call requests transfer of a specific number of bytes, but the driver is free to transfer less data; the exact rules are slightly different for reading and writing and are described later in this chapter. Whatever the amount of data the methods transfer, they should generally update the file position at *offp to represent the current file position after successful completion of the system call. The kernel then propagates the file position change back into the file structure when appropriate. The pread and pwrite system calls have different semantics, however; they operate from a given file offset and do not change the file position as seen by any other system calls. These calls pass in a pointer to the user-supplied position, and discard the changes that your driver makes.
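
The difference between read and pread can be observed from user space. The following is a minimal sketch using an ordinary temporary file rather than a scull device, which is an assumption made so the example runs anywhere; pread_keeps_position is a hypothetical helper name.

```c
#define _XOPEN_SOURCE 700   /* for pread() and fileno() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 if pread() left the file position untouched while read()
 * advanced it, and -1 on any failure or unexpected result. */
static int pread_keeps_position(void)
{
    char buf[4] = {0};
    int ok = 0;
    int fd;
    FILE *tmp = tmpfile();

    if (tmp == NULL)
        return -1;
    fd = fileno(tmp);

    if (write(fd, "0123456789", 10) != 10)
        goto out;
    if (lseek(fd, 2, SEEK_SET) != 2)
        goto out;

    /* pread: explicit offset 5; the file position should stay at 2. */
    if (pread(fd, buf, 3, 5) != 3 || memcmp(buf, "567", 3) != 0)
        goto out;
    if (lseek(fd, 0, SEEK_CUR) != 2)
        goto out;

    /* read: starts at position 2 and advances it to 5. */
    if (read(fd, buf, 3) != 3 || memcmp(buf, "234", 3) != 0)
        goto out;
    if (lseek(fd, 0, SEEK_CUR) != 5)
        goto out;

    ok = 1;
out:
    fclose(tmp);
    return ok ? 0 : -1;
}
```

On the driver side nothing changes between the two cases: the method updates *offp as usual, and the kernel decides whether that update reaches the file structure.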

Figure 3-2 represents how a typical read implementation uses its arguments.

Both the read and write methods return a negative value if an error occurs. A return value greater than or equal to 0, instead, tells the calling program how many bytes have been successfully transferred. If some data is transferred correctly and then an error happens, the return value must be the count of bytes successfully transferred, and the error does not get reported until the next time the function is called. Implementing this convention requires, of course, that your driver remember that the error has occurred so that it can return the error status in the future.
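
One way to remember an error for the next call is to store it in the per-device state. The sketch below is a user-space simulation under stated assumptions: struct err_dev, err_dev_write, the room field, and the choice of -ENOSPC are all hypothetical and chosen for illustration only. No data is actually copied; the code models only the return-value convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical device state that remembers an error which interrupted a
 * partially successful transfer. */
struct err_dev {
    int pending_err;   /* 0, or a negative errno value to report next time */
    size_t room;       /* bytes the device can still accept */
};

/* Write method following the convention above: if some bytes went through
 * before the failure, return the byte count now and the error next call. */
static ssize_t err_dev_write(struct err_dev *dev, size_t count)
{
    size_t done;

    assert(dev != NULL);

    if (dev->pending_err) {             /* report the remembered error */
        int err = dev->pending_err;

        dev->pending_err = 0;
        return err;
    }

    if (count <= dev->room) {           /* everything fits */
        dev->room -= count;
        return (ssize_t)count;
    }

    done = dev->room;                   /* device filled up part-way */
    dev->room = 0;
    if (done == 0)
        return -ENOSPC;                 /* nothing transferred: fail now */

    dev->pending_err = -ENOSPC;         /* partial success: defer the error */
    return (ssize_t)done;
}
```

With 8 bytes of room, two 5-byte writes return 5 and then 3, and the third call returns -ENOSPC.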

Although kernel functions return a negative number to signal an error, and the value of the number indicates the kind of error that occurred (as introduced in Chapter 2), programs that run in user space always see -1 as the error return value. They need to access the errno variable to find out what happened. The user-space behavior is dictated by the POSIX standard, but that standard does not make requirements on how the kernel operates internally.
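
The user-space side of this convention can be seen by handing read a buffer the kernel cannot write to. The sketch below assumes a Linux system and uses a pipe and a PROT_NONE mapping as a deliberately invalid buffer; errno_from_bad_buffer is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS */
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reads from a pipe into an inaccessible buffer. Returns the errno value
 * seen by user space, 0 if read() unexpectedly succeeded, -1 on setup
 * failure. */
static int errno_from_bad_buffer(void)
{
    int fds[2];
    int result = 0;
    void *bad;

    if (pipe(fds) != 0)
        return -1;

    /* A page with no access rights: an invalid destination buffer. */
    bad = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (bad == MAP_FAILED) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (write(fds[1], "data", 4) == 4) {
        errno = 0;
        if (read(fds[0], bad, 4) == -1)   /* user space only ever sees -1 */
            result = errno;               /* the real cause is in errno */
    } else {
        result = -1;
    }

    munmap(bad, 4096);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

On Linux the function is expected to return EFAULT: the kernel-side -EFAULT reaches the program as a return value of -1 plus errno.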

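Supplementary notes:

Key differences between copy_*_user and __copy_*_user

| Function | Safety check | When to use | Risk |
| --- | --- | --- | --- |
| `copy_to_user`/`copy_from_user` | Checks the user pointer automatically (for example, that the address lies in user space and is accessible) | Most cases, especially when using a pointer passed in directly by the user | None (a failed check returns nonzero, which the driver can handle) |
| `__copy_to_user`/`__copy_from_user` | Performs no check; copies directly | When the pointer has already been validated some other way (for example, by an earlier `access_ok` call) | Kernel crashes and/or security holes if the pointer was not actually checked |

Rules for updating the file position
- Ordinary read/write: the method must update *offp so that the next call starts from the new position (for example, after reading 100 bytes, *offp increases by 100).
- pread/pwrite: *offp points to a temporary copy of the position supplied by the user. The driver uses and updates it as usual, but the kernel discards the change.
- Example: after reading n bytes, scull_read needs to execute *offp += n.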
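
Handling partial transfers

A driver may transfer fewer bytes than requested (count). Common reasons include:
- The device has less data than requested (a read has reached the end).
- The data is not yet ready in non-blocking mode.
- Kernel memory limits (for example, a large enough buffer cannot be allocated).

In these cases the driver should return the number of bytes actually transferred rather than an error, and the user program uses the return value to decide whether to keep reading.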
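
From the caller's side, handling partial transfers usually means looping until the request is satisfied. The following is a minimal user-space sketch of such a loop; read_full is a hypothetical helper name, not a standard library function.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Keeps calling read() until `count` bytes have arrived, end-of-file is
 * reached, or an error occurs. Returns the number of bytes read, or -1
 * (with errno set) if the very first read fails. */
static ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL || count == 0);

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);

        if (n == 0)
            break;                      /* end of file: stop early */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry */
            return done ? (ssize_t)done : -1;
        }
        done += (size_t)n;              /* short read: keep going */
    }
    return (ssize_t)done;
}
```

If an error occurs after some bytes have already arrived, the helper returns the partial count, mirroring the driver-side convention of reporting the bytes transferred first.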
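
How error codes are passed on

Error codes defined in the kernel (such as -EFAULT and -ENOMEM) are converted by the system-call layer into user-space errno values:
- For example, when a driver returns -EFAULT, the user program's read/write call returns -1 and errno is set to EFAULT (numeric value 14).
- A user program can turn errno into a readable message (such as "Bad address") with perror or strerror.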