Java IO and NIO

Latest recommended article published on 2025-03-11 08:33:11


License: CC 4.0 BY-SA

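This article examines the differences between IO and NIO in Java. It covers file IO, standard IO, non-blocking IO, MMAP, and IO multiplexing, and compares the characteristics and use cases of the different IO models.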

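To understand the difference between IO and NIO in Java, it helps to first understand IO at the operating-system level. Unix is used as the example here.

Reference book: 《Unix高级环境编程卷1》 (Advanced Programming in the UNIX Environment)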

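File IO

Unix provides file IO. The main functions are:

open: opens a file and returns an FD (file descriptor)

read: reads one or more bytes

write: writes one or more bytes

In file IO, read and write are both system calls. On a read, the data is transferred from the IO device into the kernel and then copied into user space.

open is declared in fcntl.h, while read and write are declared in unistd.h, the header for the Unix system API.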
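The open/read/write sequence has a rough counterpart in Java's FileChannel. The following is a minimal sketch, not a statement about how the JDK maps onto system calls: the class name FileIoSketch, the method roundTrip, and the temporary file are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FileIoSketch {
    // Writes text to a scratch file and reads it back, one explicit step at a time.
    static String roundTrip(String text) throws IOException {
        Path path = Files.createTempFile("file-io-sketch", ".txt"); // hypothetical scratch file
        try {
            byte[] payload = text.getBytes(StandardCharsets.UTF_8);
            // "open": obtain a channel, the Java-side handle for an open file
            try (FileChannel ch = FileChannel.open(path, StandardOpenOption.WRITE)) {
                ByteBuffer out = ByteBuffer.wrap(payload);
                while (out.hasRemaining()) {
                    ch.write(out); // "write": may transfer fewer bytes than requested, so loop
                }
            }
            try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
                ByteBuffer in = ByteBuffer.allocate(payload.length);
                // "read": keep reading until the buffer is full or end of file (-1)
                while (in.hasRemaining() && ch.read(in) != -1) {
                    // nothing else to do per iteration
                }
                return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
            }
        } finally {
            Files.deleteIfExists(path);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello file IO"));
    }
}
```

As with the C functions, neither read nor write promises to move the whole buffer in one call, which is why both are wrapped in loops.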

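Standard IO

File IO uses no user-space buffer, whereas standard IO does, which makes standard IO friendlier to performance.

Standard IO is also called stream IO, and file IO is called unbuffered IO.

The common standard IO functions are declared in stdio.h, including:

fopen, getc/putc, fgetc/fputc, fflush, and so on. I looked at a stdio.c source file found through Google (a glibc implementation).

fopen, for example, first mallocs a block of memory, and ordinary reads and writes then operate on that block.

The file IO read/write calls happen only later: write is called when the buffer fills up or when a flush function such as fflush is called, and read is called when the buffer needs to be refilled.

Compared with file IO, the first thing to note is that standard IO is implemented on top of file IO.

Each file IO function call is one system call, whereas a standard IO function call is not necessarily a system call. An ordinary function call costs less CPU time than a system call.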

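Question: is the write method of Java's FileOutputStream file IO or standard IO?

After searching the OpenJDK source for the native method behind FileOutputStream's write(int), my conclusion is that this write method calls the file IO write (a system call) directly. This also explains why buffered IO is generally recommended in Java.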
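The effect of buffering can be made visible without touching a real file. The sketch below uses a hypothetical CountingStream as a stand-in for an unbuffered stream such as FileOutputStream, and counts how many times the underlying layer is reached with and without a BufferedOutputStream in front of it. The class and method names are assumptions made for this illustration.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedWriteSketch {
    // Stand-in for an unbuffered stream: every call here represents
    // one trip to the expensive layer (a system call, in the real case).
    static final class CountingStream extends OutputStream {
        int calls = 0;
        @Override public void write(int b) { calls++; }
        @Override public void write(byte[] b, int off, int len) { calls++; }
    }

    // Writes n single bytes straight to the underlying stream.
    static int unbufferedCalls(int n) {
        CountingStream sink = new CountingStream();
        for (int i = 0; i < n; i++) {
            sink.write(i);
        }
        return sink.calls;
    }

    // Writes n single bytes through a buffer of the given size.
    static int bufferedCalls(int n, int bufferSize) throws IOException {
        CountingStream sink = new CountingStream();
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < n; i++) {
                out.write(i); // usually just a store into the in-memory buffer
            }
        } // close() flushes whatever is still buffered
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered: " + unbufferedCalls(1000));
        System.out.println("buffered:   " + bufferedCalls(1000, 256));
    }
}
```

The unbuffered path reaches the underlying stream once per byte, while the buffered path reaches it only when the buffer is full or flushed, the same trade that standard IO makes over file IO.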

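Standard IO's flush functions versus the sync/fsync system calls

flush/fflush belongs to the standard IO library. It hands the stream's buffered data to the operating system through the file IO functions, and leaves it to the operating system to get the data to the IO device.

After receiving the system call, the operating system does not necessarily send the data over the bus to the IO device right away. It has its own IO cache and sends the data to the device when it sees fit. To force the data out to the IO device immediately, use the sync / fsync system calls.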
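The same two-stage distinction shows up in Java. In the sketch below, flush() on the buffered stream plays the role of fflush, and FileDescriptor.sync() asks the operating system to push its cached data to the device, in the spirit of fsync. The class name FlushVsSyncSketch, the method writeDurably, and the temporary file are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlushVsSyncSketch {
    // Writes data, flushes the user-space buffer, then syncs to the device.
    // Returns the file size observed after the flush.
    static long writeDurably(byte[] data) throws IOException {
        Path path = Files.createTempFile("flush-vs-sync", ".bin"); // hypothetical scratch file
        try (FileOutputStream fos = new FileOutputStream(path.toFile());
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write(data);     // a small write stays in the user-space buffer
            out.flush();         // stage 1: hand the bytes to the operating system
            fos.getFD().sync();  // stage 2: ask the OS to write its cache to the device
            return Files.size(path);
        } finally {
            Files.deleteIfExists(path);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeDurably(new byte[] {1, 2, 3}));
    }
}
```

Skipping stage 2 is normally fine. It matters when the data must survive a crash or power loss, since after flush() alone the bytes may still sit only in the operating system's cache.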

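Advanced IO

Advanced IO is built on file IO. Its main topics are non-blocking IO, MMAP, and IO multiplexing (select/poll).

Non-blocking IO

In file IO, a file is opened as follows (note that this is not standard IO):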

#include <sys/stat.h> 
#include <fcntl.h>

int open(const char *path, int oflag, ... );

// O_NONBLOCK is one of the flags accepted in oflag

O_NONBLOCK
When opening a FIFO with O_RDONLY or O_WRONLY set:
* If O_NONBLOCK is set, an open() for reading-only shall return without delay. An open() for writing-only shall return an error if no process currently has the file open for reading.
* If O_NONBLOCK is clear, an open() for reading-only shall block the calling thread until a thread opens the file for writing. An open() for writing-only shall block the calling thread until a thread opens the file for reading.

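After a file is opened in NON-BLOCK mode, read/write operations are non-blocking: they return immediately, reporting an error such as EAGAIN instead of waiting when no data is available.

Non-block mode is therefore usually combined with a for loop that keeps retrying, but too much looping wastes CPU. IO multiplexing addresses this problem.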
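Java NIO exposes the same idea through configureBlocking(false). The sketch below uses an in-process Pipe so it needs no external file or socket. The first read happens before anything is written and returns 0 immediately instead of blocking, and the loop afterwards is the polling pattern described above. The class name NonBlockingSketch, the method pollPipe, and the retry limit are assumptions made for this illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class NonBlockingSketch {
    // Returns {result of the first read on the empty pipe,
    //          number of polling reads issued after the write,
    //          number of bytes finally received}.
    static int[] pollPipe() throws IOException {
        Pipe pipe = Pipe.open();
        try (Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {
            source.configureBlocking(false); // Java counterpart of O_NONBLOCK
            ByteBuffer buf = ByteBuffer.allocate(16);

            // Nothing has been written yet: a non-blocking read returns 0 at once.
            int first = source.read(buf);

            sink.write(ByteBuffer.wrap(new byte[] {42}));

            // Polling loop: keep asking until data shows up (bounded for safety).
            int attempts = 0;
            while (buf.position() == 0 && attempts < 1_000_000) {
                source.read(buf);
                attempts++;
                Thread.onSpinWait(); // hint that this is a busy-wait
            }
            return new int[] {first, attempts, buf.position()};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = pollPipe();
        System.out.println("first read = " + r[0] + ", polls = " + r[1] + ", bytes = " + r[2]);
    }
}
```

Every pass through the loop costs a call whether or not data has arrived, which is the CPU waste that multiplexing avoids.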

As for standard IO, looking at

https://android.googlesource.com/platform/bionic/+/ics-mr0/libc/stdio/flags.c

I did not see a NON-BLOCK mode anywhere in the standard IO API.

MMAP

Memory-mapped IO.

Detailed documentation is at:

http://man7.org/linux/man-pages/man2/mmap.2.html

       #include <sys/mman.h>

       void *mmap(void *addr, size_t length, int prot, int flags,
                  int fd, off_t offset);
       int munmap(void *addr, size_t length);

       See NOTES for information on feature test macro requirements.

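Memory-mapped IO is explained in "Modern Operating Systems". Its key benefit is that the two copies needed by read() are reduced to one.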

Another way of looking at it is that when you use read(), the kernel first reads the data into the filesystem cache, and then copies it from there into userspace. That means the data physically exists in two places. When you mmap() a file, you're sharing the same pages as the filesystem cache (i.e. those same physical pages are mapped into two different places), so there's only one copy.
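In Java, the counterpart of mmap() is FileChannel.map, which returns a MappedByteBuffer. The sketch below writes through a mapping and then reads the file back through the ordinary file API to show that both paths see the same bytes. The class name MmapSketch, the method writeThroughMapping, and the temporary file are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    // Writes text into a file through a memory mapping, then reads the
    // file back with a normal read to confirm the bytes arrived.
    static String writeThroughMapping(String text) throws IOException {
        Path path = Files.createTempFile("mmap-sketch", ".bin"); // hypothetical scratch file
        path.toFile().deleteOnExit();
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);

        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map the first payload.length bytes; the file grows to fit the mapping.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, payload.length);
            map.put(payload); // a plain memory store, no write() call in user code
            map.force();      // ask for the mapped changes to be written to storage
        }
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeThroughMapping("mapped bytes"));
    }
}
```

The put call is the point of the technique: user code stores into memory and never passes a buffer to a write call, so there is no separate user-space copy of the data.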

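IO multiplexing (select/poll)

IO multiplexing means the application has to handle multiple fds. In TCP, for example, an endpoint has both an input buffer and an output buffer at the same time, because TCP is full duplex. Blocking IO does not work here: if the application is blocked on one fd and data arrives on another, the application cannot handle it because it is blocked.

Non-blocking IO is not ideal either, since it requires calling system calls in a loop to check whether each fd has data.

IO multiplexing is a third model, distinct from both blocking IO and non-blocking IO.

Taking select as an example: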

int select (int maxfdp1, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval * tvptr)

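Parameters:

maxfdp1: the highest-numbered fd in any of the three sets readfds/writefds/exceptfds, plus 1 (it bounds the range of fds to check, which makes it easy to loop over the three sets)

readfds: the set of fds to watch for readability; on return, it holds the fds that are ready to read

writefds: the set of fds to watch for writability; on return, it holds the fds that are ready to write

exceptfds: the set of fds to watch for exceptional conditions; on return, it holds the fds that have one

tvptr: the maximum time to block waiting. It can be set so that select waits indefinitely until at least one fd is ready.

select returns when the maximum wait time has elapsed or when at least one fd is ready, depending on how tvptr is set.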
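Java NIO offers the same model through Selector. The sketch below registers the read end of an in-process Pipe and calls select twice: once with nothing written, where it returns 0 after the timeout, and once after a write, where it reports one ready channel. The class name SelectSketch, the method selectOnPipe, and the timeout values are assumptions made for this illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class SelectSketch {
    // Returns {ready count before any write, ready count after one write,
    //          number of selected keys that are readable}.
    static int[] selectOnPipe() throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {
            source.configureBlocking(false);                 // required before register()
            source.register(selector, SelectionKey.OP_READ); // similar to adding an fd to readfds

            // Nothing to read yet: select blocks up to 50 ms, then returns 0.
            int before = selector.select(50);

            sink.write(ByteBuffer.wrap(new byte[] {1}));

            // Data is pending now: select returns as soon as the channel is ready.
            int after = selector.select(1000);

            int readable = 0;
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isReadable()) {
                    readable++;
                }
            }
            selector.selectedKeys().clear(); // the caller resets the ready set
            return new int[] {before, after, readable};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = selectOnPipe();
        System.out.println("before = " + r[0] + ", after = " + r[1] + ", readable = " + r[2]);
    }
}
```

A single thread blocked in select can watch many channels at once, so it neither stalls on one fd nor spins in a polling loop.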

For details, see:

https://notes.shichao.io/unp/ch6/#chapter-6-io-multiplexing-the-select-and-poll-functions

Differences and similarities between select and poll: both are system calls for IO multiplexing; select originated in BSD, and poll originated in System V.