[Linux] Basic I/O: file descriptors (fd), redirection with dup2, and "everything is a file" on Linux

Article link: https://blog.youkuaiyun.com/2301_80224556/article/details/139689037

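1. File descriptors (fd): essentially array indices

Disk files vs. in-memory files?

A file stored on disk is called a disk file. Once it has been loaded into memory, we call the loaded copy an in-memory file. The relationship between the two is like that between a program and a process: a program becomes a process when it runs, and a disk file becomes an in-memory file when it is loaded.

A process must open a file before it can access it. One process can open many files, and a system runs many processes, so at any moment there may be a large number of open files. Open files are loaded into memory (these are the in-memory files), while files that have not been opened remain disk files. The operating system therefore has to manage all of these open files.

How? Describe first, then organize. The OS creates a struct file for every open file and links these structures into a doubly linked list, so managing open files turns into inserting, deleting, updating, and searching nodes in that list. Besides the list pointers, each node also has to carry the file's attributes and give access to its contents. Most of this information is already stored with the file on disk and is loaded into memory when the file is opened.

To tell which open files belong to which process, we also need a mapping between processes and files.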

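How is the mapping between processes and files established?

When a program runs, the OS loads its code and data into memory and creates the corresponding task_struct, mm_struct, page tables, and so on.

As we just said, open files are first described and then organized.

Each open file is described by a struct file.

Files are opened by processes, and some processes open many files, so a process has to manage its own open files. The process's PCB contains a pointer, struct files_struct *files, which points to a struct files_struct. That structure holds an array, struct file *fd_array[], and each element of the array points to a struct file.

The index into this array is the file descriptor, fd.

read and write take a file descriptor as an argument. The descriptor is used to find the pointer in this array, and through that pointer the file is accessed.

When a process opens log.txt, the file is loaded from disk into memory and a struct file is created for it. That struct file is linked into the doubly linked list of open files, and its address is stored in fd_array at index 3 (0, 1, and 2 are already taken), so fd_array[3] now points to it. Finally, the descriptor 3 is returned to the calling process.

So once we hold a file's descriptor, we can find the information about that file and perform input and output on it.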
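The relationship between a descriptor and the fd_array can be sketched in user space. This is only a toy model, not kernel code: the names toy_file, toy_files, toy_install, toy_lookup, toy_close, and TOY_MAX_FD are all made up for illustration, and the lowest-free-slot policy in toy_install anticipates the allocation rule discussed in section 1.1.

```c
#include <stddef.h>

#define TOY_MAX_FD 8            /* hypothetical, tiny table size */

/* Stand-in for the kernel's struct file: one object per open file. */
struct toy_file {
    const char *name;
};

/* Stand-in for struct files_struct: the per-process descriptor table. */
struct toy_files {
    struct toy_file *fd_array[TOY_MAX_FD];   /* NULL means "slot unused" */
};

/* Store f in the lowest unused slot and return its index (the fd), or -1 if full. */
int toy_install(struct toy_files *files, struct toy_file *f)
{
    for (int fd = 0; fd < TOY_MAX_FD; fd++) {
        if (files->fd_array[fd] == NULL) {
            files->fd_array[fd] = f;
            return fd;
        }
    }
    return -1;
}

/* Translate an fd back into the file object, as read/write must do first. */
struct toy_file *toy_lookup(const struct toy_files *files, int fd)
{
    if (fd < 0 || fd >= TOY_MAX_FD)
        return NULL;
    return files->fd_array[fd];
}

/* Empty the slot; afterwards the fd is free for reuse. */
void toy_close(struct toy_files *files, int fd)
{
    if (fd >= 0 && fd < TOY_MAX_FD)
        files->fd_array[fd] = NULL;
}
```

Installing three objects for keyboard, display, and display fills slots 0, 1, and 2, so the next object installed (say, one for log.txt) lands in slot 3, which matches the walk-through above.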

Note: when data is written to a file, it first goes into the buffer associated with that file, and the buffer is flushed to disk periodically.

1.1. Allocation rule for file descriptors

Let's look at a piece of code:

  #include<stdio.h>  
  #include<string.h>                                                                                                                              
  #include<unistd.h>
  #include<sys/types.h>
  #include<sys/stat.h>
  #include<fcntl.h>
  int main()
  {              
       int fd1=open("./log1.txt",O_WRONLY|O_CREAT,0644);
       int fd2=open("./log2.txt",O_WRONLY|O_CREAT,0644);
       int fd3=open("./log3.txt",O_WRONLY|O_CREAT,0644);
       int fd4=open("./log4.txt",O_WRONLY|O_CREAT,0644);
       printf("%d\n",fd1);
       printf("%d\n",fd2);
       printf("%d\n",fd3);
       printf("%d\n",fd4);
       close(fd1);
       close(fd2);
       close(fd3);
       close(fd4);
  
     return 0;
 }

We see that the fds start at 3. Now let's close 0 and 2 and see what happens.

  #include<stdio.h>  
  #include<string.h>                                                                                                                              
  #include<unistd.h>
  #include<sys/types.h>
  #include<sys/stat.h>
  #include<fcntl.h>
  int main()
  {              
           close(0);
           close(2);
       int fd1=open("./log1.txt",O_WRONLY|O_CREAT,0644);
       int fd2=open("./log2.txt",O_WRONLY|O_CREAT,0644);
       int fd3=open("./log3.txt",O_WRONLY|O_CREAT,0644);
       int fd4=open("./log4.txt",O_WRONLY|O_CREAT,0644);
       printf("%d\n",fd1);
       printf("%d\n",fd2);
       printf("%d\n",fd3);
       printf("%d\n",fd4);
       close(fd1);
       close(fd2);
       close(fd3);
       close(fd4);
  
     return 0;
 }

We see that 0 and 2 get used as well.

Now let's try closing 1:

  #include<stdio.h>  
  #include<string.h>                                                                                                                              
  #include<unistd.h>
  #include<sys/types.h>
  #include<sys/stat.h>
  #include<fcntl.h>
  int main()
  {              
          close(1);
       int fd1=open("./log1.txt",O_WRONLY|O_CREAT,0644);
       int fd2=open("./log2.txt",O_WRONLY|O_CREAT,0644);
       int fd3=open("./log3.txt",O_WRONLY|O_CREAT,0644);
       int fd4=open("./log4.txt",O_WRONLY|O_CREAT,0644);
       printf("%d\n",fd1);
       printf("%d\n",fd2);
       printf("%d\n",fd3);
       printf("%d\n",fd4);
       close(fd1);
       close(fd2);
       close(fd3);
       close(fd4);
  
     return 0;
 }

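Nothing shows up on the screen? That is because log1.txt received descriptor 1, and printf always writes to descriptor 1, so its output no longer goes to the display.

Now the allocation rule is clear: a new file gets the smallest index that is not in use.

In fact:

By default, every Linux process starts with three open file descriptors: 0 is standard input, 1 is standard output, and 2 is standard error.

The physical devices behind 0, 1, and 2 are usually the keyboard, the display, and the display.

We verified above that new descriptors start at 3, which means 0, 1, and 2 of every process are already open.

0 is the standard input stream; its hardware device is the keyboard.
1 is the standard output stream; its hardware device is the display.
2 is the standard error stream; its hardware device is the display.

Conceptually, when a process is created there is a struct file for each of keyboard, display, and display. These three struct file objects are linked into the doubly linked list of open files, and their addresses are stored in fd_array at indices 0, 1, and 2. That is how standard input, standard output, and standard error come to be open by default. (In practice a process normally inherits these three descriptors from its parent, for example the shell.)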

We can verify this with the write system call:

#include <stdio.h>
#include<unistd.h>
#include<string.h>
int main()
{
	const char* msg = "hello linux";
	write(1, msg, strlen(msg));
}

It prints straight to the screen!

Now let's try 2:

#include <stdio.h>
#include<unistd.h>
#include<string.h>
int main()
{
	const char* msg = "hello linux";
	write(2, msg, strlen(msg));
}

We can also verify with read:

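#include <stdio.h>
#include<unistd.h>
#include<string.h>
int main()
{
	char buffer[1024];
	// read does not append '\0', so leave room for it and add it ourselves
	ssize_t n = read(0, buffer, sizeof(buffer) - 1);
	if (n > 0)
	{
		buffer[n] = '\0';
		printf("echo : %s\n", buffer);
	}
}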

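The program blocks after it starts. Why? Descriptor 0 is standard input, the keyboard, and until you type something there is nothing to read and nothing to echo. Let's type something and see.

Opening three streams by default is not a feature of the C language; it is a feature of the operating system!

Because the operating system provides these three streams by default, every programming language has to follow the same design, and every file operation ultimately goes through an fd.

File descriptor allocation rule: scan from index 0 and pick the smallest descriptor that is not in use; that index becomes the new file's descriptor.

If I close descriptor 0, the next file I open is given the smallest free one, 0.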
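The rule can also be checked without touching 0, 1, or 2. The following is a minimal sketch, assuming /dev/null exists and the program is single-threaded; the function name reuses_lowest_free_fd is made up for this example.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Opens /dev/null three times, closes the middle descriptor, then opens once
 * more. Returns 1 if the new descriptor reuses the freed (lowest unused)
 * number, 0 if not, and -1 if any of the first three opens fails.
 */
int reuses_lowest_free_fd(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    int c = open("/dev/null", O_RDONLY);
    if (a < 0 || b < 0 || c < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        if (c >= 0) close(c);
        return -1;
    }

    close(b);                                 /* b is now the lowest free slot */
    int again = open("/dev/null", O_RDONLY);  /* should land in that slot */
    int reused = (again == b);

    close(a);
    close(c);
    if (again >= 0)
        close(again);
    return reused;
}
```

Because every descriptor below b was in use when b was handed out, closing b makes it the smallest free index, and the next open gets exactly that number back.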

2. Redirection

2.1. Output redirection

Let's look at some code:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define filename "log.txt"

int main()
{
	int fd = open(filename,O_CREAT|O_WRONLY|O_TRUNC,0666);//must be opened for writing
	if (fd < 0)//open returns -1 on failure
	{
		perror("open");
		return 1;
	}

	
	const char* msg = "hello linux\n";
	int cnt = 5;
	while(cnt)
	{
		write(1, msg, strlen(msg));//1 is the display; no +1 on strlen, the '\0' is not written
		cnt--;
	}

	close(fd);
	return 0;
}

Now let's close descriptor 1 before opening the file:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define filename "log.txt"

int main()
{
	close(1);//close the display
	int fd = open(filename,O_CREAT|O_WRONLY|O_TRUNC,0666);
	if (fd < 0)//open returns -1 on failure
	{
		perror("open");
		return 1;
	}

	
	const char* msg = "hello linux\n";
	int cnt = 5;
	while(cnt)
	{
		write(1, msg, strlen(msg));//1 now refers to log.txt; no +1 on strlen
		cnt--;
	}

	close(fd);
	return 0;
}

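Surprising, isn't it? Nothing is printed on the screen!

That is because we closed descriptor 1, and by the allocation rule descriptor 1 then became the descriptor of log.txt, so everything was written into log.txt.

This is called output redirection.

So how does it work?

As said above, 0, 1, and 2 are open by default, and 1 corresponds to the display, so stdout's file descriptor is 1. The write call only knows the number it was given, here 1. We first closed descriptor 1, which sets the slot at index 1 to NULL. We then opened log.txt; since slot 1 was free, the address of log.txt's struct file was stored there and log.txt's descriptor became 1. The code above still writes to 1, so the output no longer reaches the screen and is redirected into the file.

The essence of redirection is that the operating system changes what an fd refers to. What we just demonstrated is output redirection.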

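2.2. Input redirection

Input redirection means that data we would normally read from the keyboard is instead read from a file.

Let's look at an example.

First we put some text into log.txt.

Then we run this code: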

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main()
{
    close(0);
    // open the file
    int fd = open("log.txt", O_RDONLY);//must be opened for reading
    if (fd < 0)
    {
        perror("open");
        exit(1);
    }
 
    char buffer[14];
    ssize_t s=read(0, buffer, sizeof(buffer)-1);
    if (s > 0)
    {
        buffer[s] = '\0';
        printf("echo : %s\n", buffer);
    }
 
 
    // close the file
    close(fd);
 
    return 0;
}

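read(0, ...) normally reads from standard input. Here we make it read from log.txt instead: we call close(0) before reading, which closes the keyboard file, so log.txt is given descriptor 0.

Because descriptor 0 was closed, the newly opened log.txt received descriptor 0, and the read call fetched the data in the file.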

2.3. Append redirection

There is also append redirection; we only need to change the open flags.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main()
{
    close(1);
    // open the file (no O_CREAT here, so log.txt must already exist)
    int fd = open("log.txt", O_WRONLY | O_APPEND);//must be opened for writing
    if (fd < 0)
    {
        perror("open");
        exit(1);
    }
    write(1,"hhh\n",strlen("hhh\n")); 
    
    // close the file
    close(fd);
 
    return 0;
}

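[Note] The shell's ">" output redirection changes only descriptor 1, that is, stdout. So if a program has one line that prints to descriptor 1 and another that prints to descriptor 2, output redirection affects only descriptor 1; descriptor 2 still prints to the display.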

2.4.dup2

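So far we could only achieve input and output redirection by closing the corresponding file descriptor with close. Can we do it without closing anything ourselves?

To redirect, all we need to do is copy one element of fd_array over another.

For example, if we copy the contents of fd_array[3] into fd_array[1], then, because stdout in C writes to file descriptor 1, output is now redirected to log.txt. Linux provides a system call for exactly this:

#include <unistd.h>
int dup2(int oldfd, int newfd);

What it does: dup2 copies the contents of fd_array[oldfd] into fd_array[newfd]. If newfd was already open, it is closed first.
Return value: newfd on success, -1 on failure.

Things to watch for when using it:

If oldfd is not a valid file descriptor, dup2 fails and the file referred to by newfd is not closed.
If oldfd is a valid file descriptor and newfd has the same value as oldfd, dup2 does nothing and returns newfd.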
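These behaviors can be checked with a few small probes. This is a sketch only, assuming /dev/null can be opened; the three function names are invented for this example, and each returns 1 when the behavior holds.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if dup2(fd, fd) on a valid fd is a no-op that returns fd. */
int dup2_same_fd_returns_fd(void)
{
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0)
        return -1;
    int r = dup2(fd, fd);
    close(fd);
    return r == fd;
}

/* Returns 1 if dup2 with an invalid oldfd fails with EBADF and leaves newfd open. */
int dup2_bad_oldfd_keeps_newfd(void)
{
    int newfd = open("/dev/null", O_WRONLY);
    if (newfd < 0)
        return -1;
    errno = 0;
    int r = dup2(-1, newfd);                        /* -1 is never a valid fd */
    int failed = (r == -1 && errno == EBADF);
    int still_open = (fcntl(newfd, F_GETFD) != -1); /* would fail if newfd had been closed */
    close(newfd);
    return failed && still_open;
}

/* Returns 1 if a successful dup2 returns newfd. */
int dup2_success_returns_newfd(void)
{
    int oldfd = open("/dev/null", O_WRONLY);
    int target = open("/dev/null", O_WRONLY);       /* an open descriptor to overwrite */
    if (oldfd < 0 || target < 0) {
        if (oldfd >= 0) close(oldfd);
        if (target >= 0) close(target);
        return -1;
    }
    int r = dup2(oldfd, target);                    /* target is silently closed, then reused */
    close(oldfd);
    close(target);
    return r == target;
}
```

The last probe is the one worth remembering when checking for errors: test for a return value of -1 rather than comparing against 0, because a successful call returns the new descriptor number.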

Let's look at an example.

Here is the earlier output redirection, done with dup2:

 #include<stdio.h>
 #include<sys/types.h>
 #include<sys/stat.h>
 #include<unistd.h>
 #include<fcntl.h>                                                                                                                               
 int main()
 {
   int fd=open("./log.txt",O_WRONLY|O_CREAT,0644);
   dup2(fd,1);//descriptor 1 now refers to log.txt
   printf("hello world\n");//would normally be printed to the screen
   printf("hello world\n");
   return 0;
 }

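Does that make sense?

We only need to copy the array entry of the file we want to redirect to. For output redirection to some file: 1 stands for standard output, so we change what 1 points to by copying the address held in slot 3 into slot 1. Now 1 points to the target file.

Input redirection works the same way: 0 is standard input, so to read from another file we copy that file's address into slot 0.

Hmm? By that logic, descriptor 3 also still refers to log.txt?

Let's check: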

 #include<stdio.h>
 #include<sys/types.h>
 #include<sys/stat.h>
 #include<unistd.h>
 #include<fcntl.h>
#include<string.h>
                                                                                                                               
 int main()
 {
   int fd=open("log.txt",O_WRONLY|O_CREAT,0644);
   dup2(fd,1);
   printf("hello world\n");//would normally be printed to the screen
   printf("hello world\n");
 
   write(3,"gun",strlen("gun"));//assuming fd is 3, this also goes to log.txt
   return 0;
 }

Sure enough, it does.

To be safe, after dup2 we usually close the old descriptor:

 int fd=open("log.txt",O_WRONLY|O_CREAT,0644);
    dup2(fd,1);
    close(fd);//close the old descriptor; 1 still refers to the file
  ……
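Why closing the old descriptor is safe can be seen in a small sketch that avoids touching descriptor 1. It uses dup (a sibling of dup2 that picks the new number itself) to get a second descriptor for the same open file; the function name shared_open_file_demo and the path passed to it are assumptions made for this example.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Writes through two descriptors that share one open file, closes the
 * original, writes again through the copy, then reads the file back into
 * `out`. Returns the number of bytes read back, or -1 on error.
 */
ssize_t shared_open_file_demo(const char *path, char *out, size_t outsz)
{
    if (outsz == 0)
        return -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int copy = dup(fd);                 /* plays the role of descriptor 1 after dup2(fd, 1) */
    if (copy < 0) {
        close(fd);
        return -1;
    }

    ssize_t n = -1;
    if (write(fd, "abc", 3) == 3 &&
        write(copy, "def", 3) == 3) {   /* continues at offset 3: the offset is shared */
        close(fd);                      /* drop the old descriptor... */
        fd = -1;
        if (write(copy, "ghi", 3) == 3 &&   /* ...and the copy still works */
            lseek(copy, 0, SEEK_SET) == 0) {
            n = read(copy, out, outsz - 1);
            if (n >= 0)
                out[n] = '\0';
        }
    }
    if (fd >= 0)
        close(fd);
    close(copy);
    return n;
}
```

Both descriptors point at the same struct file, so they share one file offset, and the file stays open until the last descriptor referring to it is closed. That is why the earlier write(3, ...) reached log.txt, and why close(fd) after dup2(fd, 1) does not disturb descriptor 1.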

Next, let's look at append redirection:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define filename "log.txt"

int main()
{
	int fd = open(filename,O_CREAT|O_WRONLY|O_APPEND,0666);//must be opened for writing
	if (fd < 0)//open returns -1 on failure
	{
		perror("open");
		return 1;
	}

	//printf("fd : %d\n", fd);
	dup2(fd,1);
	close(fd);//fd is no longer needed; 1 keeps the file open

	const char* msg = "hello linux\n";
	int cnt = 5;
	while(cnt)
	{
		write(1, msg, strlen(msg));
		cnt--;
	}

	return 0;
}

Pretty powerful, right?

Now let's look at input redirection:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define filename "log.txt"

int main()
{

	int fd = open(filename,O_RDWR);//must be opened for reading
	if (fd < 0)//open returns -1 on failure
	{
		perror("open");
		return 1;
	}

	dup2(fd, 0);
	close(fd);

	char buffer[64];
	ssize_t s=read(0, buffer, sizeof(buffer)-1);
	if (s > 0)
	{
		buffer[s] = '\0';
		printf("echo : %s\n", buffer);
	}
	
	return 0;
}

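With some text already in log.txt, we run the program.

How about that? Isn't it neat?

2.5. The essence of redirection

The essence of redirection is that, inside the OS, what an fd points to is changed!

2.6. Redirection with the C library interfaces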

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define filename "log.txt"

int main()
{
	int fd = open(filename,O_CREAT|O_WRONLY|O_APPEND,0666);
	if (fd < 0)//open returns -1 on failure
	{
		perror("open");
		return 1;
	}
	printf("fd : %d\n", fd);
	printf("hello printf\n");
	fprintf(stdout, "hello fprintf\n");

}

printf and fprintf(stdout, ...) both print to the screen.

Now let's try redirection:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define filename "log.txt"

int main()
{
	int fd = open(filename,O_CREAT|O_WRONLY|O_APPEND,0666);
	if (fd < 0)//open returns -1 on failure
	{
		perror("open");
		return 1;
	}
	dup2(fd, 1);
	close(fd);

	printf("fd : %d\n", fd);
	printf("hello printf\n");
	fprintf(stdout, "hello fprintf\n");

}

Good: the output of both printf and fprintf now ends up in the file.
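One detail matters when the C library is involved: printf first stores text in stdout's own user-space buffer and only later writes it to descriptor 1. The sketch below redirects temporarily and then restores the original target, calling fflush at both switch points so that no text is delivered to the wrong place. The function name printf_into_file is invented here, and using dup to save the old target is an addition that the article's own examples do not use.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Sends one printf line into `path` by redirecting descriptor 1, then
 * restores descriptor 1. Returns 0 on success, -1 on error.
 */
int printf_into_file(const char *path, const char *text)
{
    fflush(stdout);                        /* push out anything buffered for the old target */
    int saved = dup(1);                    /* remember where descriptor 1 pointed */
    if (saved < 0)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(saved);
        return -1;
    }
    if (dup2(fd, 1) < 0) {
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);                             /* descriptor 1 keeps the file open */

    printf("%s\n", text);                  /* lands in stdout's user-space buffer */
    fflush(stdout);                        /* force the buffer through descriptor 1 into the file */

    int rc = dup2(saved, 1) < 0 ? -1 : 0;  /* put the original target back */
    close(saved);
    return rc;
}
```

Without the second fflush, the text could still be sitting in the buffer when descriptor 1 is switched back, and it would then appear on the original target instead of in the file.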

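2.7. >, >>, <, << and redirection

We covered the > redirection operator earlier; if it is unfamiliar, see: http://t.csdnimg.cn/ajI9G

The question now is how the shell's > and >> relate to what we learned today.

Under the hood, >, >>, and < are built on dup2: the shell opens the target file with the matching flags and copies its descriptor over 1 or 0. (<< is a here-document, which feeds inline text to descriptor 0 in a similar way.)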

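3. How to understand "everything is a file" on Linux

By now we have a new view of what a process is.

Does program replacement (exec) interfere with the data structures behind redirection?

No, the two do not affect each other.

In other words, when fork creates a child process, the child is built using most of the parent's data as a template, including its file descriptor table. When the child then performs program replacement, the files that were already open are not affected, so the data structures behind redirection are not affected either.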
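Putting fork, dup2, and program replacement together gives roughly what a shell does for cmd > file. This is a simplified sketch, not how any particular shell is written; the function name run_with_stdout_to, its append parameter, and the exit codes 126 and 127 are choices made for this example.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs argv[0] with its standard output sent to `outfile`, roughly what a
 * shell does for `cmd > outfile` (or `cmd >> outfile` when append is nonzero).
 * Returns the child's exit status, or -1 on error.
 */
int run_with_stdout_to(char *const argv[], const char *outfile, int append)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                        /* child: rearrange its own fd table */
        int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
        int fd = open(outfile, flags, 0644);
        if (fd < 0)
            _exit(126);
        if (dup2(fd, 1) < 0)
            _exit(126);
        if (fd != 1)
            close(fd);
        execvp(argv[0], argv);             /* new program image, same fd table */
        _exit(127);                        /* reached only if exec failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The redirection happens in the child after fork and before exec, so the parent's descriptor 1 is untouched, and the replaced program simply finds descriptor 1 already pointing at the file.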

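How should we understand "everything is a file" on Linux?

To the operating system, peripheral devices are all files. When we want to access a peripheral, from the operating system's point of view we want to access a file, and files are always accessed through a process.

When a user's process wants to open a peripheral, the operating system opens it the way it opens a file, that is, through open() and similar functions.

The operating system creates a data structure, struct file, for every open file.

Reading or writing such a file really means calling that device's own read or write method (function). How does Linux arrange this?

Linux uses a data structure that we will call struct operation_func here (in the kernel source this role is played by struct file_operations). The file object (struct file) normally stores the address of this structure, and the structure stores function pointers to the device's methods.

From then on, whenever a file is opened, besides creating its file object (struct file), the kernel associates it with a method set (struct operation_func) that holds function pointers to that device's methods.

Through the function pointers in struct operation_func we reach the device's methods, which is why we call this structure the method set.

A process then uses the file object addresses stored in its file descriptor table to find the file object of each open file. The file object (struct file) stores a pointer, something like *f_ops, that points to the file's method set, struct operation_func.

With that, all the operations on the peripheral can be carried out.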

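So a process only sees interfaces such as read() and write(), and inside these interfaces the underlying device's method is reached through a pointer call.

The methods of every file object are called through the same kind of interface, so calling through different pointers invokes the methods of different hardware devices and thereby operates those devices.

To the user, a single read() call is all it takes to access a peripheral. Isn't that exactly how files are accessed in code?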

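For example, before a network card can be opened, it must provide the operating system with its driver, that is, its set of methods. The operating system then creates a file object, struct file, for the network card.

The network card likewise has its own set of method pointers. These structures are linked together as described above, and later, when a process wants to open the network card and access it, it only needs to call the wrapped methods, which inside are just calls through pointers.

To the user, it is a file too.

The chain from file object -> method set -> device method calls -> different devices resembles inheritance: the file object is the base class, and the method sets and the various devices behind it can be seen as its derived classes.

Seen from the base class (the file object, or the process), the user is always accessing a file, yet accessing different files reaches different devices and different device methods. Isn't that polymorphism?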
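The method-set idea can be sketched in plain C with function pointers. Everything here is a user-space illustration under made-up names (demo_file, demo_file_ops, mem_ops, null_ops, demo_read, demo_write); the two "devices" are a small memory buffer and a null device, standing in for real drivers.

```c
#include <stddef.h>
#include <string.h>

struct demo_file;

/* Hypothetical method set, in the spirit of the article's struct operation_func. */
struct demo_file_ops {
    long (*read)(struct demo_file *f, char *buf, size_t len);
    long (*write)(struct demo_file *f, const char *buf, size_t len);
};

/* Hypothetical file object: per-device state plus a pointer to its methods. */
struct demo_file {
    const struct demo_file_ops *f_op;
    char data[64];     /* backing store used by the memory device */
    size_t size;       /* bytes currently stored */
};

/* "Driver" 1: a memory device that stores what is written and reads it back. */
static long mem_read(struct demo_file *f, char *buf, size_t len)
{
    size_t n = len < f->size ? len : f->size;
    memcpy(buf, f->data, n);
    return (long)n;
}

static long mem_write(struct demo_file *f, const char *buf, size_t len)
{
    size_t room = sizeof(f->data) - f->size;
    size_t n = len < room ? len : room;
    memcpy(f->data + f->size, buf, n);
    f->size += n;
    return (long)n;
}

/* "Driver" 2: a null device that discards writes and always reads as empty. */
static long null_read(struct demo_file *f, char *buf, size_t len)
{
    (void)f; (void)buf; (void)len;
    return 0;
}

static long null_write(struct demo_file *f, const char *buf, size_t len)
{
    (void)f; (void)buf;
    return (long)len;
}

const struct demo_file_ops mem_ops  = { mem_read,  mem_write  };
const struct demo_file_ops null_ops = { null_read, null_write };

/* The single entry points callers use; the device is chosen by f->f_op. */
long demo_read(struct demo_file *f, char *buf, size_t len)
{
    return f->f_op->read(f, buf, len);
}

long demo_write(struct demo_file *f, const char *buf, size_t len)
{
    return f->f_op->write(f, buf, len);
}
```

A caller only ever uses demo_read and demo_write. Which device code runs depends solely on the f_op pointer stored in the file object, which is the polymorphism described above expressed with C function pointers.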

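3.1. Summary

To sum up, what does "everything is a file" mean?

At the file layer, the operating system wraps everything in a file object like struct file. The file object contains a pointer (something like *f_ops) to another structure, the method set (struct operation_func), which stores the addresses of each device's methods. Through these function pointers, the methods of the underlying hardware devices are gathered behind one uniform interface.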