Linux: Files

thiswherewhatwho

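License: CC 4.0 BY-SA

Tags: file filesystems system unix access descriptor

Article link: https://blog.youkuaiyun.com/thiswherewhatwho/article/details/6430747

This article examines the structure and operation of Unix filesystems, including file links, permission settings, and system calls, and discusses the concept and role of virtual memory.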

A Unix file is an information container structured as a sequence of bytes; the kernel does not interpret the contents of a file.

Many programming libraries implement higher-level abstractions, such as records structured into fields and record addressing based on keys.

However, the programs in these libraries must rely on system calls offered by the kernel.

From the user's point of view, files are organized in a tree-structured namespace, as shown in Figure 1-1.

All the nodes of the tree, except the leaves, denote directory names. A directory node contains information about the files and directories just beneath it.

A file or directory name consists of a sequence of arbitrary ASCII characters, with the exception of / and of the null character \0.

Most filesystems place a limit on the length of a filename, typically no more than 255 characters.

The directory corresponding to the root of the tree is called the root directory.

By convention, its name is a slash (/). Names must be different within the same directory, but the same name may be used in different directories.

Unix associates a current working directory with each process (see the section "The Process/Kernel Model" later in this chapter); it belongs to the process execution context, and it identifies the directory currently used by the process. To identify a specific file, the process uses a pathname, which consists of slashes alternating with a sequence of directory names that lead to the file.

If the first item in the pathname is a slash, the pathname is said to be absolute, because its starting point is the root directory.

Otherwise, if the first item is a directory name or filename, the pathname is said to be relative, because its starting point is the process's current directory.

While specifying filenames, the notations "." and ".." are also used. They denote the current working directory and its parent directory, respectively.

If the current working directory is the root directory, "." and ".." coincide.
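The absolute/relative distinction can be sketched in Python with the standard `os.path` module. This is only an illustration: the function names `describe_pathname` and `resolve` and the sample paths are hypothetical, and a POSIX system is assumed.

```python
import os


def describe_pathname(path: str) -> str:
    """Classify a pathname the way the text does: absolute or relative."""
    # An absolute pathname starts at the root directory, i.e. with a slash.
    return "absolute" if os.path.isabs(path) else "relative"


def resolve(path: str) -> str:
    """Resolve a pathname against the process's current working directory."""
    # abspath() prepends os.getcwd() to relative pathnames and collapses
    # "." and ".." components lexically.
    return os.path.abspath(path)


if __name__ == "__main__":
    print(describe_pathname("/usr/bin/ls"))  # starts at the root directory
    print(describe_pathname("bin/ls"))       # starts at the current directory
    print(resolve("."))                      # the current working directory
    print(os.path.samefile("/.", "/.."))     # at the root, "." and ".." coincide
```

`os.path.samefile()` compares device and inode numbers, so the last line checks the coincidence of "." and ".." at the root against the real filesystem rather than by string manipulation.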

---------------------------------------------------------------------------------------------------------------

HARD AND SOFT LINKS

A filename included in a directory is called a file hard link, or more simply, a link.

The same file may have several links included in the same directory or in different ones, so it may have several filenames.

The Unix command:

$ ln p1 p2

is used to create a new hard link that has the pathname p2 for a file identified by the pathname p1.

Hard links have two limitations:

.It is not possible to create hard links for directories.

Doing so might transform the directory tree into a graph with cycles, thus making it impossible to locate a file according to its name.

.Links can be created only among files included in the same filesystem.

This is a serious limitation, because modern Unix systems may include several filesystems located on different disks and/or partitions, and users may be unaware of the physical divisions between them.

To overcome these limitations, soft links (also called symbolic links) were introduced a long time ago.

Symbolic links are short files that contain an arbitrary pathname of another file.

The pathname may refer to any file or directory located in any filesystem;

it may even refer to a nonexistent file.

The Unix command:

$ ln -s p1 p2

creates a new soft link with pathname p2 that refers to pathname p1.

When this command is executed, the filesystem extracts the directory part of p2 and creates a new entry in that directory of type symbolic link, with the name indicated by p2.

This new file contains the name indicated by pathname p1.

This way, each reference to p2 can be translated automatically into a reference to p1.
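The difference between the two kinds of link can be observed from a program. The following is a minimal Python sketch, assuming a POSIX filesystem that supports both link types; `demo_links` and the file names `p1`, `p2_hard`, and `p2_soft` are hypothetical names used only for this illustration.

```python
import os
import tempfile


def demo_links(workdir: str) -> dict:
    """Create a hard link and a soft link to one file and report what happens."""
    p1 = os.path.join(workdir, "p1")
    hard = os.path.join(workdir, "p2_hard")
    soft = os.path.join(workdir, "p2_soft")

    with open(p1, "w") as f:
        f.write("hello\n")

    os.link(p1, hard)     # like: ln p1 p2_hard
    os.symlink(p1, soft)  # like: ln -s p1 p2_soft

    result = {
        # Both names refer to the same inode, so the link count becomes 2.
        "same_inode": os.stat(p1).st_ino == os.stat(hard).st_ino,
        "link_count": os.stat(p1).st_nlink,
        # The soft link is a separate short file whose content is a pathname.
        "soft_target": os.readlink(soft),
        "soft_is_link": os.path.islink(soft),
    }

    # Removing the name p1 leaves the hard link usable but the soft link dangling.
    os.unlink(p1)
    with open(hard) as f:
        result["hard_still_readable"] = f.read() == "hello\n"
    result["soft_dangling"] = not os.path.exists(soft)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for key, value in demo_links(d).items():
            print(key, value)
```

The last two entries show why a soft link "may even refer to a nonexistent file": it stores only a pathname, while a hard link is another name for the same inode.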

FILE TYPES

Unix files may have one of the following types:

.Regular file

.Directory

.Symbolic link

.Block-oriented device file

.Character-oriented device file

.Pipe and named pipe(also called FIFO)

.Socket

The first three file types are constituents of any Unix filesystem. Their implementation is described in detail in

Device files are related both to I/O devices and to device drivers integrated into the kernel. For example, when a program accesses a device file, it acts directly on the I/O device associated with that file (see Chapter 13).

Pipes and sockets are special files used for interprocess communication (see the section "Synchronization and Critical Regions" later in this chapter; also see Chapter 19).
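A program can ask the kernel which of these types a given name has. The sketch below uses Python's standard `os.lstat()` and the type-test helpers in the `stat` module; the function name `file_type` and the returned strings are hypothetical choices for this illustration.

```python
import os
import stat
import tempfile


def file_type(path: str) -> str:
    """Return the Unix file type of path, without following symbolic links."""
    mode = os.lstat(path).st_mode  # lstat() reports on the link itself
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISBLK(mode):
        return "block-oriented device file"
    if stat.S_ISCHR(mode):
        return "character-oriented device file"
    if stat.S_ISFIFO(mode):
        return "pipe or named pipe (FIFO)"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "unknown"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        regular = os.path.join(d, "data.txt")
        open(regular, "w").close()
        link = os.path.join(d, "link")
        os.symlink(regular, link)
        for p in (regular, d, link):
            print(p, "->", file_type(p))
```

`os.lstat()` is used instead of `os.stat()` so that a symbolic link is reported as a link rather than as the file it refers to.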

FILE DESCRIPTOR AND INODE

Unix makes a clear distinction between the contents of a file and information about a file.

With the exception of device files and files of special filesystems, each file consists of a sequence of bytes.

The file does not include any control information, such as its length or an end-of-file (EOF) delimiter.

All information needed by the filesystem to handle a file is included in a data structure called an inode.

Each file has its own inode, which the filesystem uses to identify the file.

While filesystems and the kernel functions handling them can vary widely from one Unix system to another, they must always provide at least the following attributes, which are specified in the POSIX standard:

.File type (see the previous section)

.Number of hard links associated with the file

.File length in bytes

.Device ID (i.e., an identifier of the device containing the file)

.Inode number that identifies the file within the filesystem

.UID of the file owner

.User group ID of the file

.Several timestamps that specify the inode status change time, the last access time, and the last modify time

.Access rights and file mode (see the next section)
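These attributes are what the `stat()` family of calls returns. As a rough sketch, the Python function below maps each item in the list onto a field of `os.stat_result`; the function name `inode_attributes` and the dictionary keys are hypothetical labels chosen for this example.

```python
import os
import stat
import tempfile


def inode_attributes(path: str) -> dict:
    """Collect the POSIX inode attributes listed above for one file."""
    st = os.stat(path)
    return {
        "is_regular_file": stat.S_ISREG(st.st_mode),  # file type
        "hard_links": st.st_nlink,                    # number of hard links
        "length_bytes": st.st_size,                   # file length in bytes
        "device_id": st.st_dev,                       # device containing the file
        "inode_number": st.st_ino,                    # unique within the filesystem
        "owner_uid": st.st_uid,                       # UID of the file owner
        "group_id": st.st_gid,                        # user group ID of the file
        "status_change_time": st.st_ctime,            # inode status change
        "last_access_time": st.st_atime,
        "last_modify_time": st.st_mtime,
        "access_rights": stat.filemode(st.st_mode),   # e.g. "-rw-r--r--"
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample")
        with open(path, "wb") as f:
            f.write(b"12345")
        for key, value in inode_attributes(path).items():
            print(f"{key}: {value}")
```

Note that the filename itself is not among the attributes: it lives in the directory entry, which is why one inode can have several names.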

ACCESS RIGHTS AND FILE MODE

The potential users of a file fall into three classes:

.The user who is the owner of the file

.The users who belong to the same group as the file, not including the owner

.All remaining users (others)

There are three types of access rights (read, write, and execute) for each of these three classes.

Thus, the set of access rights associated with a file consists of nine different binary flags.

Three additional flags, called suid (Set User ID), sgid (Set Group ID), and sticky, define the file mode.

These flags have the following meanings when applied to executable files:

suid

A process executing a file normally keeps the user id of the process owner.

However, if the executable file has the suid flag set, the process gets the UID of the file owner.

sgid

A process executing a file keeps the user group ID of the process group.

However, if the executable file has the sgid flag set, the process gets the user group id of the file.

sticky

An executable file with the sticky flag set corresponds to a request to the kernel to keep the program in memory after its execution terminates.

When a file is created by a process, its owner ID is the UID of the process. Its owner user group ID can be either the group ID of the creator process or the user group ID of the parent directory, depending on the value of the sgid flag of the parent directory.
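The nine access-right flags and the three mode flags are packed into one integer, the file mode. The sketch below decodes such an integer with the constants from Python's standard `stat` module; `describe_mode` and the helper `_rwx` are hypothetical names, and the mode values in the demo are made-up inputs rather than values read from real files.

```python
import stat


def _rwx(mode: int, r: int, w: int, x: int) -> str:
    """Render one class's read/write/execute flags as a three-letter string."""
    return (
        ("r" if mode & r else "-")
        + ("w" if mode & w else "-")
        + ("x" if mode & x else "-")
    )


def describe_mode(mode: int) -> dict:
    """Split a file mode into the nine access rights and the three mode flags."""
    return {
        "symbolic": stat.filemode(mode),
        "owner": _rwx(mode, stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        "group": _rwx(mode, stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        "others": _rwx(mode, stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
        "suid": bool(mode & stat.S_ISUID),
        "sgid": bool(mode & stat.S_ISGID),
        "sticky": bool(mode & stat.S_ISVTX),
    }


if __name__ == "__main__":
    # A regular file with mode 4755: rwx for the owner, r-x for group and
    # others, plus the suid flag.
    print(describe_mode(stat.S_IFREG | 0o4755))
    # A regular file with mode 640 and no special flags.
    print(describe_mode(stat.S_IFREG | 0o640))
```

In the symbolic form, a set suid or sgid flag replaces the corresponding `x` with `s`, which is how `ls -l` displays it as well.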

----------------------------------------------------------------------------------------------------------------------------------------

chmod u+s 1.sh

Sets the setuid flag on the file 1.sh; setuid takes effect only on files.

chmod g+s 1.sh

Sets the setgid flag on the file 1.sh.

chmod g+s sh

Sets the setgid flag on the directory sh; setgid takes effect on both directories and files.

chmod o+t 1.sh

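Sets the sticky flag on the file 1.sh. Note that modern Linux ignores the sticky flag on regular files; it takes effect on directories, where it restricts deletion and renaming of entries to the file owner, the directory owner, or root.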

------------------------------------------------------------------------------------------------------------------------------------------

FILE-HANDLING SYSTEM CALLS

When a user accesses the contents of either a regular file or a directory, he actually accesses some data stored in a hardware block device.

In this sense, a filesystem is a user-level view of the physical organization of a hard disk partition.

Because a process in User Mode cannot directly interact with the low-level hardware components, each actual file operation must be performed in Kernel Mode.

Therefore, the Unix operating system defines several system calls related to file handling.

All Unix kernels devote great attention to the efficient handling of hardware block devices to achieve good overall system performance.

In the chapters that follow, we will describe topics related to file handling in Linux and specifically how the kernel reacts to file-related system calls.

To understand those descriptions, you will need to know how the main file-handling system calls are used; these are described in the next section.

Processes can access only "opened" files.

To open a file, the process invokes the system call:

fd = open(path, flag, mode)

The three parameters have the following meanings:

path

Denotes the pathname (relative or absolute) of the file to be opened.

flag

Specifies how the file must be opened (e.g., read, write, read/write, append).

It also can specify whether a nonexisting file should be created.

mode

Specifies the access rights of a newly created file.

This system call creates an "open file" object and returns an identifier called a file descriptor.

An open file object contains:

.Some file-handling data structures, such as a set of flags specifying how the file has been opened, an offset field that denotes the current position in the file from which the next operation will take place (the so-called file pointer), and so on.

.Some pointers to kernel functions that the process can invoke.

The set of permitted functions depends on the value of the flag parameter.

We discuss open file objects in detail in Chapter 12. Let's limit ourselves here to describing some general properties specified by the POSIX semantics:

.A file descriptor represents an interaction between a process and an opened file, while an open file object contains data related to that interaction.

The same open file object may be identified by several file descriptors in the same process.

.Several processes may concurrently open the same file.

In this case, the filesystem assigns a separate file descriptor to each file, along with a separate open file object.

When this occurs, the Unix filesystem does not provide any kind of synchronization among the I/O operations issued by the processes on the same file.

However, several system calls such as flock() are available to allow processes to synchronize themselves on the entire file or on portions of it (see Chapter 12).

To create a new file, the process also may invoke the creat() system call, which is handled by the kernel exactly like open().
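The relationship between file descriptors and open file objects can be made visible with Python's thin wrappers around the system calls (`os.open`, `os.dup`, `os.lseek`). This is a sketch under simplifying assumptions: it uses a single process that opens the same file twice instead of several processes, and `demo_open_file_objects` is a hypothetical name.

```python
import os
import tempfile


def demo_open_file_objects(path: str) -> dict:
    """Show which descriptors share a file pointer and which do not."""
    # O_CREAT asks open() to create a nonexisting file; 0o644 is the mode
    # (access rights) given to the newly created file.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    os.write(fd, b"abcdef")

    dup_fd = os.dup(fd)                    # second descriptor, same open file object
    other_fd = os.open(path, os.O_RDONLY)  # second open(): separate open file object

    result = {
        # The duplicate shares the file pointer, now just past the 6 bytes written.
        "dup_offset": os.lseek(dup_fd, 0, os.SEEK_CUR),
        # The independently opened descriptor starts at offset 0.
        "other_offset": os.lseek(other_fd, 0, os.SEEK_CUR),
        "first_bytes": os.read(other_fd, 3),
    }
    # Reading through other_fd did not move the pointer seen through fd.
    result["fd_offset_after_other_read"] = os.lseek(fd, 0, os.SEEK_CUR)

    for descriptor in (fd, dup_fd, other_fd):
        os.close(descriptor)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_open_file_objects(os.path.join(d, "demo.bin")))
```

Calling `lseek()` with an offset of 0 relative to the current position is a common way to read the file pointer without moving it.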

ACCESSING AN OPENED FILE

Regular Unix files can be addressed either sequentially or randomly, while device files and named pipes are usually accessed sequentially.

In both kinds of access, the kernel stores the file pointer in the open file object, that is, the current position at which the next read or write operation will take place.

Sequential access is implicitly assumed: the read() and write() system calls always refer to the position of the current file pointer.

To modify the value, a program must explicitly invoke the lseek() system call. When a file is opened, the kernel sets the file pointer to the position of the first byte in the file (offset 0).

The lseek() system call requires the following parameters:

newoffset = lseek(fd, offset, whence);
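A short Python sketch of sequential and random access through `os.lseek()` follows. The notes above stop before describing the parameters, so the meaning of `whence` used here comes from POSIX rather than from the text: `SEEK_SET` measures `offset` from the start of the file, `SEEK_CUR` from the current file pointer, and `SEEK_END` from the end of the file. The function name `demo_lseek` and the ten-byte sample content are hypothetical.

```python
import os
import tempfile


def demo_lseek(path: str) -> dict:
    """Move the file pointer with each whence value and record the results."""
    with open(path, "wb") as f:
        f.write(b"0123456789")

    fd = os.open(path, os.O_RDONLY)
    try:
        result = {}
        # A freshly opened file has its pointer at offset 0.
        result["initial"] = os.lseek(fd, 0, os.SEEK_CUR)
        # Sequential access: read() advances the pointer by the bytes read.
        result["read_3"] = os.read(fd, 3)
        result["after_read"] = os.lseek(fd, 0, os.SEEK_CUR)
        # Random access: jump to an absolute offset.
        result["seek_set"] = os.lseek(fd, 7, os.SEEK_SET)
        result["byte_at_7"] = os.read(fd, 1)
        # Move backward relative to the current position (8 - 4 = 4).
        result["seek_cur"] = os.lseek(fd, -4, os.SEEK_CUR)
        # Position relative to the end of the file (10 - 2 = 8).
        result["seek_end"] = os.lseek(fd, -2, os.SEEK_END)
        result["tail"] = os.read(fd, 10)
        return result
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_lseek(os.path.join(d, "digits.txt")))
```

In every case the return value is the new offset measured from the beginning of the file, which corresponds to `newoffset` in the call shown above.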

-----------------------------------------------------------------------------------------------------------------------------------------------------------------

VIRTUAL MEMORY

All recent Unix systems provide a useful abstraction called virtual memory.

Virtual memory acts as a logical layer between the application memory requests and the hardware Memory Management Unit (MMU).

Virtual memory has many purposes and advantages:

.Several processes can be executed concurrently.

.It is possible to run applications whose memory needs are larger than the available physical memory.

.Processes can execute a program whose code is only partially loaded in memory.

.Each process is allowed to access a subset of the available physical memory.

.Programmers can write machine-independent code, because they do not need to be concerned about physical memory organization

----------------------------------------------------------------------------------------

Segmentation in Hardware

Starting with the 80286 model, Intel microprocessors perform address translation in two different ways called real mode and protected mode.

We'll focus in the next section on address translation when protected mode is enabled.

Real mode exists mostly to maintain processor compatibility with older models and to allow the operating system to bootstrap.

A logical address consists of two parts: a segment identifier and an offset that specifies the relative address within the segment.

The segment identifier is a 16-bit field called the segment selector, while the offset is a 32-bit field. We'll describe the fields of the segment selector in the section "Fast Access to Segment Descriptors" later in this chapter.

To make it easy to retrieve segment selectors quickly, the processor provides segmentation registers whose only purpose is to hold segment selectors;

these registers are called cs, ss, ds, es, fs, and gs.

Although there are only six of them, a program can reuse the same segmentation register for different purposes by saving its content in memory and then restoring it later.
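The shape of a logical address can be modeled in a few lines of Python. This is an illustrative sketch, not kernel code: the class `LogicalAddress` and its method are hypothetical, and the selector layout used in `selector_fields()` (a 13-bit index, a table indicator bit, and a 2-bit requested privilege level) is taken from the x86 architecture rather than from the notes above, which defer that description to a later section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalAddress:
    """A protected-mode logical address: 16-bit selector plus 32-bit offset."""

    selector: int
    offset: int

    def __post_init__(self) -> None:
        # Enforce the field widths stated in the text.
        if not 0 <= self.selector <= 0xFFFF:
            raise ValueError("segment selector must fit in 16 bits")
        if not 0 <= self.offset <= 0xFFFFFFFF:
            raise ValueError("offset must fit in 32 bits")

    def selector_fields(self) -> dict:
        """Split the selector into its x86 fields (assumed layout, see lead-in)."""
        return {
            "index": self.selector >> 3,     # bits 15..3: descriptor table index
            "ti": (self.selector >> 2) & 1,  # bit 2: table indicator (0 = GDT, 1 = LDT)
            "rpl": self.selector & 3,        # bits 1..0: requested privilege level
        }


if __name__ == "__main__":
    addr = LogicalAddress(selector=0x2B, offset=0x1000)
    print(addr, addr.selector_fields())
```

The selector value `0x2B` in the demo is an arbitrary example; it decodes to index 5 in the GDT with requested privilege level 3.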