An In-Depth Look at Processes and Programs - CSDN Blog

6.1 Processes and Programs

A program is a file containing a range of information that describes how to construct a process at run time.

This information includes the following:
• Binary format identification: Each program file includes metainformation describing the format of the executable file. This enables the kernel to interpret the remaining information in the file. Historically, two widely used formats for UNIX executable files were the original a.out (“assembler output”) format and the later, more sophisticated COFF (Common Object File Format). Nowadays, most UNIX implementations (including Linux) employ the Executable and Linking Format (ELF), which provides a number of advantages over the older formats.
• Machine-language instructions: These encode the algorithm of the program.
• Program entry-point address: This identifies the location of the instruction at which execution of the program should commence.
• Data: The program file contains values used to initialize variables and also literal constants used by the program (e.g., strings).
• Symbol and relocation tables: These describe the locations and names of functions and variables within the program. These tables are used for a variety of purposes, including debugging and run-time symbol resolution (dynamic linking).
• Shared-library and dynamic-linking information: The program file includes fields listing the shared libraries that the program needs to use at run time and the pathname of the dynamic linker that should be used to load these libraries.
• Other information: The program file contains various other information that describes how to construct a process.

----------------------------------------------------------------------------------------------------------------------------------

6.3 Memory Layout of a Process
The memory allocated to each process is composed of a number of parts, usually referred to as segments. These segments are as follows:
• The text segment contains the machine-language instructions of the program run by the process. The text segment is made read-only so that a process doesn’t accidentally modify its own instructions via a bad pointer value. Since many processes may be running the same program, the text segment is made sharable so that a single copy of the program code can be mapped into the virtual address space of all of the processes.
• The initialized data segment contains global and static variables that are explicitly initialized. The values of these variables are read from the executable file when the program is loaded into memory.
• The uninitialized data segment contains global and static variables that are not explicitly initialized. Before starting the program, the system initializes all memory in this segment to 0. For historical reasons, this is often called the bss segment, a name derived from an old assembler mnemonic for “block started by symbol.” The main reason for placing global and static variables that are initialized into a separate segment from those that are uninitialized is that, when a program is stored on disk, it is not necessary to allocate space for the uninitialized data. Instead, the executable merely needs to record the location and size required for the uninitialized data segment, and this space is allocated by the program loader at run time.
• The stack is a dynamically growing and shrinking segment containing stack frames. One stack frame is allocated for each currently called function. A frame stores the function’s local variables (so-called automatic variables), arguments, and return value. Stack frames are discussed in more detail in Section 6.5.
• The heap is an area from which memory (for variables) can be dynamically allocated at run time. The top end of the heap is called the program break.

-----------------------------------------------------------------------------------------------------------------

Virtual memory management separates the virtual address space of a process from the physical address space of RAM.

This provides many advantages:
• Processes are isolated from one another and from the kernel, so that one process can’t read or modify the memory of another process or the kernel. This is accomplished by having the page-table entries for each process point to distinct sets of physical pages in RAM (or in the swap area).

• Where appropriate, two or more processes can share memory. The kernel makes this possible by having page-table entries in different processes refer to the same pages of RAM. Memory sharing occurs in two common circumstances:
– Multiple processes executing the same program can share a single (read-only) copy of the program code. This type of sharing is performed implicitly when multiple processes execute the same program file (or load the same shared library).

– Processes can use the shmget() and mmap() system calls to explicitly request sharing of memory regions with other processes. This is done for the purpose of interprocess communication.

• The implementation of memory protection schemes is facilitated; that is, page-table entries can be marked to indicate that the contents of the corresponding page are readable, writable, executable, or some combination of these protections.
Where multiple processes share pages of RAM, it is possible to specify that each process has different protections on the memory; for example, one process might have read-only access to a page, while another has read-write access.

• Programmers, and tools such as the compiler and linker, don’t need to be concerned with the physical layout of the program in RAM.

• Because only a part of a program needs to reside in memory, the program loads and runs faster. Furthermore, the memory footprint (i.e., virtual size) of a process can exceed the capacity of RAM.

One final advantage of virtual memory management is that since each process uses less RAM, more processes can simultaneously be held in RAM. This typically leads to better CPU utilization, since it increases the likelihood that, at any moment in time, there is at least one process that the CPU can execute.

-------------------------------------------------------------------------------------------------------

Command-Line Arguments (argc, argv)

The first argument to main(), int argc, indicates how many command-line arguments there are.

The second argument, char *argv[], is an array of pointers to the command-line arguments, each of which is a null-terminated character string. The first of these strings, in argv[0], is (conventionally) the name of the program itself. The list of pointers in argv is terminated by a NULL pointer (i.e., argv[argc] is NULL).

------------------------------------------------------------------------------------------------------------------------------------

Summary
Each process has a unique process ID and maintains a record of its parent’s process ID.
The virtual memory of a process is logically divided into a number of segments: text, (initialized and uninitialized) data, stack, and heap.
The stack consists of a series of frames, with a new frame being added as a function is invoked and removed when the function returns. Each frame contains the local variables, function arguments, and call linkage information for a single function invocation.

The command-line arguments supplied when a program is invoked are made available via the argc and argv arguments to main(). By convention, argv[0] contains the name used to invoke the program.

Each process receives a copy of its parent’s environment list, a set of name-value pairs. The global variable environ and various library functions allow a process to access and modify the variables in its environment list.

The setjmp() and longjmp() functions provide a way to perform a nonlocal goto from one function to another (unwinding the stack). In order to avoid problems with compiler optimization, we may need to declare variables with the volatile modifier when making use of these functions. Nonlocal gotos can render a program difficult to read and maintain, and should be avoided whenever possible.
