Please indicate the source: http://blog.youkuaiyun.com/gaoxiangnumber1
Welcome to my github: https://github.com/gaoxiangnumber1
2.1 PROCESSES
- In any multiprogramming system, the CPU switches from process to process quickly, running each for tens or hundreds of milliseconds. At any one instant the CPU is running only one process, but in the course of one second it may work on several of them, giving the illusion of parallelism. People sometimes speak of pseudoparallelism in this context, to contrast it with the true hardware parallelism of multiprocessor systems, which have two or more CPUs sharing the same physical memory.
2.1.1 The Process Model
- In the process model, all the runnable software on the computer, sometimes including the operating system, is organized into a number of processes. A process is an instance of an executing program, including the current values of the program counter, registers, and variables.
- Each process has its own virtual CPU. In reality, the real CPU switches back and forth from process to process. This rapid switching back and forth is called multiprogramming.
- In Fig. 2-1(b) we see four processes, each with its own flow of control (i.e., its own logical program counter), and each one running independently of the other ones. There is only one physical program counter, so when each process runs, its logical program counter is loaded into the real program counter. When the process finishes (for the time being), the physical program counter is saved back into the process's stored logical program counter in memory.
Process and program:
- Consider a computer scientist who is baking a birthday cake. He has a birthday cake recipe and a kitchen well stocked with all the input: eggs and so on. The recipe is the program (an algorithm expressed in some suitable notation), the computer scientist is the processor (CPU), and the cake ingredients are the input data. The process is the activity consisting of our baker reading the recipe, fetching the ingredients, and baking the cake.
- Now imagine that the computer scientist's son comes running in screaming his head off, saying that he has been stung by a bee. The computer scientist records where he was in the recipe (the state of the current process is saved), gets out a first aid book, and begins following the directions in it. Here the processor is switched from one process (baking) to a higher-priority process (administering medical care), each having a different program (recipe versus first aid book). When the bee sting has been taken care of, the computer scientist goes back to his cake, continuing at the point where he left off.
- A process is an activity that has a program, input, output, and a state. A single processor may be shared among several processes, with some scheduling algorithm being used to determine when to stop work on one process and service a different one.
- A program is something stored on disk, not doing anything.
- If a program is run twice, it counts as two processes. The two processes running the same program are distinct processes, although the operating system may be able to share the code between them so that only one copy is in memory.
2.1.2 Process Creation
Four principal events cause processes to be created:
- System initialization.
When an operating system is booted, numerous processes are created. Some of these are foreground processes that interact with users and perform work for them. Others run in the background and are not associated with particular users but have some specific function. Processes that stay in the background to handle some activity such as e-mail are called daemons. Large systems commonly have dozens of them.
- Execution of a process-creation system call by a running process.
A running process can issue system calls to create one or more new processes to help it do its job. Creating new processes is useful when the work to be done can easily be formulated in terms of several related, but independent, interacting processes. For example, if a large amount of data is being fetched over a network for subsequent processing, it may be convenient to create one process to fetch the data and put them in a shared buffer while a second process removes the data items and processes them. On a multiprocessor, allowing each process to run on a different CPU also makes the job go faster.
- A user request to create a new process.
In interactive systems, users can start a program by typing a command or (double) clicking on an icon. Taking either of these actions starts a new process and runs the selected program in it. Users may have multiple windows open at once, each running some process. Using the mouse, the user can select a window and interact with the process.
- Initiation of a batch job.
Processes are also created on the batch systems found on large mainframes; consider, for example, inventory management at the end of the day at a chain of stores. Here users can submit batch jobs to the system. When the operating system decides that it has the resources to run another job, it creates a new process and runs the next job from the input queue in it.
More:
- In all these cases, a new process is created by having an existing process execute a process creation system call. This system call tells the operating system to create a new process and indicates which program to run in it.
- In UNIX, there is only one system call to create a new process: fork. This call creates an exact clone of the calling process. Usually, the child process then executes execve or a similar system call to change its memory image and run a new program. For example, when a user types a command “sort” to the shell, the shell forks off a child process and the child executes sort.
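- As a concrete illustration of this fork-then-exec pattern (roughly what a shell does for the sort command), here is a minimal sketch; the file name input.txt and the use of execvp rather than a raw execve are assumptions for illustration.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();                    /* create an exact clone of the calling process */
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {                        /* child: replace its memory image and run sort */
        char *argv[] = {"sort", "input.txt", NULL};
        execvp("sort", argv);              /* execve variant that searches PATH */
        perror("execvp");                  /* reached only if the exec fails */
        _exit(127);
    }
    waitpid(pid, NULL, 0);                 /* parent (the shell) waits for the child to finish */
    return 0;
}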
- In both UNIX and Windows systems, after a process is created, the parent and child have their own distinct address spaces. If either process changes a word in its address space, the change is not visible to the other process. In UNIX, the child’s initial address space is a copy of the parent’s, but there are two distinct address spaces involved; no writable memory is shared. Some UNIX implementations share the program text between the two since that cannot be modified.
- Alternatively, the child may share all of the parent's memory, but in that case the memory is shared copy-on-write, meaning that whenever either of the two wants to modify part of the memory, that chunk of memory is explicitly copied first to make sure the modification occurs in a private memory area. Again, no writable memory is shared. It is, however, possible for a newly created process to share some of its creator's other resources, such as open files.
2.1.3 Process Termination
- After a process has been created, it starts running and does whatever its job is. Sooner or later the new process will terminate, usually due to one of the following conditions:
- Normal exit (voluntary).
- Error exit (voluntary).
- Process discovers a fatal error. For example, if a user types the command
gcc foo.c
to compile the program foo.c and no such file exists, the compiler simply announces this fact and exits.
- Fatal error (involuntary).
- An error caused by the process itself, often due to a program bug. Examples include executing an illegal instruction, referencing nonexistent memory, or dividing by zero. In some systems (e.g., UNIX), a process can tell the operating system that it wishes to handle certain errors itself, in which case the process is signaled (interrupted) instead of terminated when one of the errors occurs.
- Killed by another process (involuntary).
- Another process executes a system call telling the operating system to kill some other process. The killer must have the necessary authorization to do in the killee.
2.1.4 Process Hierarchies
- In some systems, when a process creates another process, the parent process and child process continue to be associated in certain ways. The child process can itself create more processes, forming a process hierarchy. A process has only one parent but zero, one, two, or more children.
- In UNIX, a process and all of its children and further descendants together form a process group. When a user sends a signal from the keyboard, the signal is delivered to all members of the process group currently associated with the keyboard. Each process can catch/ignore the signal, or take the default action.
- How does UNIX initialize itself after the computer is booted?
A special process, called init, is present in the boot image. When it starts running, it reads a file telling how many terminals there are. Then it forks off one new process per terminal. These processes wait for someone to log in. If a login is successful, the login process executes a shell to accept commands. Thus all the processes in the whole system belong to a single tree, with init at the root.
- Windows has no concept of a process hierarchy. All processes are equal. The only hint of a process hierarchy is that when a process is created, the parent is given a special token (called a handle) that it can use to control the child. It is free to pass this token to some other process, thus invalidating the hierarchy. Processes in UNIX cannot disinherit their children.
2.1.5 Process States
- One process may generate some output that another process uses as input. In the shell command
cat chapter1 chapter2 chapter3 | grep tree
the first process, running cat, concatenates and outputs three files. The second process, running grep, selects all lines containing the word "tree". Depending on the relative speeds of the two processes, it may happen that grep is ready to run, but there is no input waiting for it. It must then block until some input is available.
- A process blocks when:
(1) it is waiting for input that is not available;
(2) the operating system has decided to allocate the CPU to another process for a while.
- In the first case, the suspension is inherent in the problem. In the second case, it is a technicality of the system (there are not enough CPUs to give each process its own private processor).
- Three states a process may be in:
- Running (actually using the CPU at that instant).
- Ready (runnable; temporarily stopped to let another process run).
- Blocked (unable to run until some external event happens).
- Transition 1 occurs when the operating system discovers that a process cannot continue right now. In some systems the process can execute a system call (pause) to get into blocked state. In other systems (UNIX…) when a process reads from a pipe or special file (e.g., a terminal) and there is no input available, the process is automatically blocked.
- Transitions 2 and 3 are caused by the process scheduler, which is a part of the operating system; processes do not even know about these transitions. Transition 2 occurs when the scheduler decides that the running process has run long enough and it is time to let another process have some CPU time. Transition 3 occurs when all the other processes have had their fair share and it is time for the first process to get the CPU to run again.
- Transition 4 occurs when the external event for which a process was waiting happens, such as the arrival of some input. If no other process is running at that instant, transition 3 will be triggered and the process will start running. Otherwise it may have to wait in ready state for a little while until the CPU is available and its turn comes.
- The lowest level of the operating system is the scheduler, with a variety of processes on top of it. All the interrupt handling and details of actually starting and stopping processes are hidden away in what is here called the scheduler.
2.1.6 Implementation of Processes
- To implement the process model, the operating system maintains a table (an array of structures), called the process table, with one entry per process. This entry contains important information about the process's state: everything that must be saved when the process is switched from running to ready or blocked state so that it can be restarted later as if it had never been stopped.
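- As a rough sketch (field names and types here are illustrative, not taken from any particular system), a process-table entry might look like this:
enum proc_state { RUNNING, READY, BLOCKED };

struct proc_entry {
    int             pid;                /* process identifier                      */
    enum proc_state state;              /* running, ready, or blocked              */
    unsigned long   program_counter;    /* saved logical program counter           */
    unsigned long   stack_pointer;      /* saved stack pointer                     */
    unsigned long   registers[16];      /* other saved CPU registers               */
    unsigned long   page_table_base;    /* memory-management information           */
    int             open_files[32];     /* file-management information             */
    long            cpu_time_used;      /* accounting and scheduling information   */
};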
- In (a) we see a four-step process for I/O.
Step 1: The driver tells the controller what to do by writing into its device registers. The controller then starts the device.
Step 2: When the controller has finished reading or writing the number of bytes it has been told to transfer, it signals the interrupt controller chip using certain bus lines.
Step 3: If the interrupt controller is ready to accept the interrupt (which it may not be if it is busy handling a higher-priority one), it asserts a pin on the CPU chip telling it.
Step 4: The interrupt controller puts the number of the device on the bus so the CPU can read it and know which device has just finished (many devices may be running at the same time).
- Once the CPU has decided to take the interrupt, the program counter and PSW (program status word) are typically pushed onto the current stack and the CPU is switched into kernel mode. The device number may be used as an index into part of memory to find the address of the interrupt handler for this device. This part of memory is called the interrupt vector. Once the interrupt handler (part of the driver for the interrupting device) has started, it removes the stacked program counter and PSW and saves them, then queries the device to learn its status. When the handler is finished, it returns to the previously running user program, at the first instruction that was not yet executed.
- Associated with each I/O class is a location (typically at a fixed location near the bottom of memory) called the interrupt vector. It contains the address of the interrupt service procedure.
- Suppose that user process 3 is running when a disk interrupt happens.
- User process 3’s program counter, program status word, and sometimes one or more registers are pushed onto the stack by the interrupt hardware.
- The computer then jumps to the address specified in the interrupt vector. From here on, it is up to the interrupt service procedure.
- All interrupts start by saving the registers, often in the process table entry for the current process.
- Then the information pushed onto the stack by the interrupt is removed and the stack pointer is set to point to a temporary stack used by the process handler.
- When this routine is finished, it calls a C procedure to do the rest of the work for this specific interrupt type.
- When it has done its job, possibly making some process now ready, the scheduler is called to see who to run next.
- After that, control is passed back to the assembly-language code to load up the registers and memory map for the now-current process and start it running.
- The key idea is that after each interrupt the interrupted process returns to precisely the same state it was in before the interrupt occurred.
2.1.7 Modeling Multiprogramming
- Suppose that a process spends a fraction p of its time waiting for I/O to complete. With n processes in memory at once, the probability that all n processes are waiting for I/O (in which case the CPU will be idle) is p^n. The CPU utilization is then given by the formula
CPU utilization = 1 − p^n
- Figure 2-6 shows the CPU utilization as a function of n, which is called the degree of multiprogramming.
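- For example, with 80% I/O wait (p = 0.8) the formula predicts roughly 20%, 49%, and 83% utilization for 1, 3, and 8 processes in memory. A small sketch to reproduce such numbers (the chosen p and range of n are illustrative):
#include <math.h>
#include <stdio.h>

int main(void)
{
    double p = 0.8;                               /* fraction of time spent waiting for I/O */
    for (int n = 1; n <= 8; n++)                  /* degree of multiprogramming */
        printf("n = %d  utilization = %.0f%%\n", n, 100.0 * (1.0 - pow(p, n)));
    return 0;
}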
2.2 THREADS
2.2.1 Thread Usage Reasons
- Only threads give parallel entities the ability to share an address space and all of its data among themselves. This ability is essential for certain applications, which is why having multiple processes (with their separate address spaces) will not work.
- Threads are lighter weight than processes: they are easier and faster to create and destroy than processes.
- Threads yield no performance gain when all of them are CPU bound, but when there is substantial computing and also substantial I/O, having threads allows these activities to overlap, thus speeding up the application.
- Threads are useful on systems with multiple CPUs, where real parallelism is possible.
Word Processor Example:
- Consider what happens when the user suddenly deletes some sentences from page 1 of an 800-page book and then wants to make another change on page 600, typing a command telling the word processor to go to that page. The word processor is now forced to reformat the entire book up to page 600 on the spot, because it does not know what the first line of page 600 will be until it has processed all the previous pages. There may be a substantial delay before page 600 can be displayed.
- Threads can help here. Suppose that the word processor is written as a two-threaded program. One thread interacts with the user and the other handles reformatting in the background. As soon as the sentence is deleted from page 1, the interactive thread tells the reformatting thread to reformat the whole book. Meanwhile, the interactive thread continues to listen to the keyboard and mouse and responds to simple commands while the other thread is computing in the background. The reformatting will be completed before the user asks to see page 600, so it can be displayed instantly.
- Many word processors save the entire file to disk every few minutes to protect the user against losing a day's work in the event of a power failure. A third thread can handle the disk backups without interfering with the other two. The situation with three threads is shown in Fig. 2-7.
More
- By having three threads instead of three processes, they share a common memory and thus all have access to the document being edited. With three processes this would be impossible.
- At most Web sites, some pages are more commonly accessed than others. Web servers improve performance by maintaining a cache of heavily used pages in main memory to eliminate the need to go to disk to get them.
- The dispatcher thread reads incoming requests for work from the network. After examining the request, it chooses an idle worker thread and hands it the request. The dispatcher then wakes up the sleeping worker, moving it from blocked state to ready state. When the worker wakes up, it checks to see if the request can be satisfied from the Web page cache. If not, it starts a read operation to get the page from the disk and blocks until the disk operation completes.
- When the thread blocks on the disk operation, another thread is chosen to run, possibly the dispatcher, in order to acquire more work, or possibly another worker that is now ready to run.
- This model allows the server to be written as a collection of sequential threads. The dispatcher’s program consists of an infinite loop for getting a work request and handing it off to a worker. Each worker’s code consists of an infinite loop consisting of accepting a request from the dispatcher and checking the Web cache to see if the page is present. If so, it is returned to the client, and the worker blocks waiting for a new request. If not, it gets the page from the disk, returns it to the client, and blocks waiting for a new request.
- buf and page are structures appropriate for holding a work request and a web page, respectively.
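- A minimal runnable sketch of this dispatcher/worker pattern using POSIX threads; the in-memory request queue, the request and worker counts, and the -1 shutdown sentinel stand in for real network and disk I/O and are assumptions for illustration (the queue is protected with a mutex and condition variable, mechanisms discussed later in this chapter):
#include <pthread.h>
#include <stdio.h>

#define NWORKERS  3
#define NREQUESTS 6
#define QCAP      64                       /* more than will ever be enqueued here */

static int queue[QCAP];
static int head = 0, tail = 0;
static pthread_mutex_t m        = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

static void enqueue(int req)               /* dispatcher side: hand work to a worker */
{
    pthread_mutex_lock(&m);
    queue[tail++] = req;
    pthread_cond_signal(&nonempty);        /* wake up one sleeping worker */
    pthread_mutex_unlock(&m);
}

static void *worker(void *arg)             /* worker side */
{
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&m);
        while (head == tail)               /* no work: block, like the blocked state above */
            pthread_cond_wait(&nonempty, &m);
        int req = queue[head++];
        pthread_mutex_unlock(&m);
        if (req < 0)                       /* shutdown sentinel */
            return NULL;
        printf("worker %ld serving request %d\n", id, req);  /* a real worker would check the cache, maybe read disk */
    }
}

int main(void)
{
    pthread_t t[NWORKERS];
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int r = 0; r < NREQUESTS; r++)    /* the dispatcher loop: get a request, hand it off */
        enqueue(r);
    for (int i = 0; i < NWORKERS; i++)     /* tell each worker to stop */
        enqueue(-1);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(t[i], NULL);
    return 0;
}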
2.2.2 The Classical Thread Model
- Process is a way to group related resources together. A process has an address space containing program text and data, as well as other resources. By putting them together in the form of a process, they can be managed more easily.
- The thread has a program counter that keeps track of which instruction to execute next. It has registers, which hold its current working variables. It has a stack, which contains the execution history, with one frame for each procedure called but not yet returned from.
- What threads add to the process model is to allow multiple executions to take place in the same process environment. The threads share an address space and other resources. Because threads have some of the properties of processes, they are sometimes called lightweight processes. The term multithreading is used to describe the situation of allowing multiple threads in the same process.
- When a multithreaded process is run on a single-CPU system, the threads take turns running. The CPU switches rapidly back and forth among the threads, providing the illusion that the threads are running in parallel. With three compute-bound threads in a process, the threads would appear to be running in parallel, each one on a CPU with one-third the speed of the real CPU.
- A thread can be in any one of several states: running, blocked, ready, or terminated.
- A running thread currently has the CPU and is active.
- A blocked thread is waiting for some event to unblock it, either an external event or an action by some other thread. For example, when a thread performs a system call to read from the keyboard, it is blocked until input is typed.
- A ready thread is scheduled to run and will as soon as its turn comes up.
- Each thread has its own stack. Each thread’s stack contains one frame for each procedure called but not yet returned from. This frame contains the procedure’s local variables and the return address to use when the procedure call has finished. For example, if procedure X calls procedure Y and Y calls procedure Z, then while Z is executing, the frames for X, Y, and Z will all be on the stack. Each thread will generally call different procedures and thus have a different execution history. This is why each thread needs its own stack.
- When multithreading is present, processes usually start with a single thread present. This thread has the ability to create new threads by calling a library procedure such as thread_create. A parameter to thread_create specifies the name of a procedure for the new thread to run. Sometimes threads are hierarchical, with a parent-child relationship, but often no such relationship exists, with all threads being equal. The creating thread is usually returned a thread identifier that names the new thread.
- When a thread has finished its work, it can exit by calling a library procedure. It then disappears and is no longer schedulable. In some thread systems, one thread can wait for another thread to exit by calling a procedure, say, thread_join. This procedure blocks the calling thread until the specified thread has exited.
- Another common thread call is thread_yield, which allows a thread to voluntarily give up the CPU to let another thread run. It is important because there is no clock interrupt to enforce multiprogramming as there is with processes.
2.2.3 POSIX Threads
- To make it possible to write portable threaded programs, IEEE has defined a standard for threads. The threads package it defines is called Pthreads. Most UNIX systems support it.
- Each Pthreads thread has an identifier, a set of registers (program counter…), and a set of attributes, which are stored in a structure. The attributes include the stack size, scheduling parameters, and other items needed to use the thread.
- A new thread is created using the pthread_create call. The thread identifier of the newly created thread is returned as the function value.
- When a thread has finished the work it has been assigned, it can terminate by calling pthread_exit. This call stops the thread and releases its stack. Often a thread needs to wait for another thread to finish its work and exit before continuing. The thread that is waiting calls pthread_join to wait for a specific other thread to terminate. The thread identifier of the thread to wait for is given as a parameter.
- Sometimes it happens that a thread is not logically blocked, but feels that it has run long enough and wants to give another thread a chance to run. It can accomplish this by calling pthread_yield.
- Pthread_attr_init creates the attribute structure associated with a thread and initializes it to the default values. These values can be changed by manipulating fields in the attribute structure.
- pthread_attr_destroy removes a thread’s attribute structure, freeing up its memory. It does not affect threads using it; they continue to exist.
- Here the main program loops NUMBER_OF_THREADS times, creating a new thread on each iteration, after announcing its intention. If the thread creation fails, it prints an error message and then exits. After creating all the threads, the main program exits.
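- A sketch along the lines of the program described above (the greeting text is illustrative; unlike the version in the book's figure, main here calls pthread_exit so the created threads get a chance to finish):
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUMBER_OF_THREADS 10

static void *print_hello_world(void *tid)
{
    printf("Hello World. Greetings from thread %ld\n", (long)tid);  /* the thread's job */
    pthread_exit(NULL);                       /* terminate this thread */
}

int main(void)
{
    pthread_t threads[NUMBER_OF_THREADS];
    for (long t = 0; t < NUMBER_OF_THREADS; t++) {
        printf("Main here. Creating thread %ld\n", t);
        int status = pthread_create(&threads[t], NULL, print_hello_world, (void *)t);
        if (status != 0) {
            printf("Oops. pthread_create returned error code %d\n", status);
            exit(-1);
        }
    }
    pthread_exit(NULL);                       /* main exits, but the process lives on */
}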
2.2.4 Implementing Threads in User Space
- The first method is to put the threads package entirely in user space; the kernel knows nothing about the threads.
- The first advantage is that a user-level threads package can be implemented on an operating system that does not support threads. With this approach, threads are implemented by a library. All of these implementations have the same general structure, illustrated in Fig. 2-16(a). The threads run on top of a run-time system, which is a collection of procedures that manage threads.
- When threads are managed in user space, each process needs its own private thread table to keep track of the threads in that process. This table is analogous to the kernel’s process table, except that it keeps track only of the per-thread properties. The thread table is managed by the run-time system. When a thread is moved to ready state or blocked state, the information needed to restart it is stored in the thread table, exactly the same way as the kernel stores information about processes in the process table.
- When a thread does something that may cause it to become blocked locally, for example, waiting for another thread in its process to complete some work, it calls a run-time system procedure. This procedure checks to see if the thread must be put into blocked state. If so, it stores the thread’s registers in the thread table, looks in the table for a ready thread to run, and reloads the machine registers with the new thread’s saved values. As soon as the stack pointer and program counter have been switched, the new thread comes to life again automatically. Doing thread switching like this is faster than trapping to the kernel.
- When a thread is finished running for the moment, for example, when it calls thread_yield(release the CPU to let another thread run), the code of thread_yield can save the thread’s information in the thread table itself. It can then call the thread scheduler to pick another thread to run. The procedure that saves the thread’s state and the scheduler are just local procedures, so invoking them is more efficient than making a kernel call. Among other issues, no trap is needed, no context switch is needed, the memory cache need not be flushed, and so on. This makes thread scheduling very fast.
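- On systems that provide the (now-obsolescent) ucontext API, getcontext/makecontext/swapcontext are one way a user-space run-time system can save and reload registers and switch stacks without ever entering the kernel. A minimal sketch with one user-level thread that yields once (the stack size and messages are illustrative):
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, thr_ctx;
static char thr_stack[64 * 1024];          /* each user-level thread needs its own stack */

static void thread_body(void)
{
    printf("user-level thread: running\n");
    swapcontext(&thr_ctx, &main_ctx);      /* "thread_yield": save our registers, reload main's */
    printf("user-level thread: resumed\n");
}

int main(void)
{
    getcontext(&thr_ctx);                  /* initialize the context structure */
    thr_ctx.uc_stack.ss_sp = thr_stack;
    thr_ctx.uc_stack.ss_size = sizeof(thr_stack);
    thr_ctx.uc_link = &main_ctx;           /* where to continue when the thread returns */
    makecontext(&thr_ctx, thread_body, 0);

    swapcontext(&main_ctx, &thr_ctx);      /* "schedule" the thread, entirely in user space */
    printf("run-time system: thread yielded\n");
    swapcontext(&main_ctx, &thr_ctx);      /* run it again until it finishes */
    printf("run-time system: thread finished\n");
    return 0;
}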
- User-level threads have other advantages. They allow each process to have its own customized scheduling algorithm. For some applications, such as those with a garbage-collector thread, not having to worry about a thread being stopped at an inconvenient moment is a plus. They also scale better, since kernel threads invariably require some table space and stack space in the kernel, which can be a problem if there are a very large number of threads.
- User-level thread packages also have problems. The first is how blocking system calls are implemented. Suppose that a thread reads from the keyboard before any keys have been hit. Letting the thread actually make the system call is unacceptable, since this will stop all the threads. One of the main goals of having threads was to allow each one to use blocking calls, but to prevent one blocked thread from affecting the others. With blocking system calls, it is hard to see how this goal can be achieved readily.
- The system calls could all be changed to be nonblocking (e.g., a read on the keyboard would just return 0 bytes if no characters were already buffered), but requiring changes to the operating system is unattractive. In addition, changing the semantics of read will require changes to many user programs.
- Another alternative is available in the event that it is possible to tell in advance if a call will block. In UNIX, a system call, select, which allows the caller to tell whether a future read will block. When this call is present, the library procedure read can be replaced with a new one that first does a select call and then does the read call only if it is safe (i.e., will not block). If the read call will block, the call is not made. Instead, another thread is run. The next time the run-time system gets control, it can check again to see if the read is now safe. This approach requires rewriting parts of the system call library, and is inefficient. The code placed around the system call to do the checking is called a jacket or wrapper.
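- A sketch of such a jacket/wrapper: poll with select first and issue the read only when it cannot block; the function name read_if_ready and the special "would block" return value of -2 are assumptions for illustration.
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

ssize_t read_if_ready(int fd, void *buf, size_t count)
{
    fd_set rfds;
    struct timeval tv = {0, 0};            /* zero timeout: just poll, never wait */

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) > 0)
        return read(fd, buf, count);       /* safe: data is already available */
    return -2;                             /* would block: the run-time system should run another thread */
}

int main(void)
{
    char buf[128];
    ssize_t n = read_if_ready(0, buf, sizeof(buf));   /* 0 = standard input */
    if (n == -2)
        printf("read would block\n");
    else
        printf("read %zd bytes\n", n);
    return 0;
}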
- Page faults are another problem. Computers can be set up in such a way that not all of the program is in main memory at once. If the program calls or jumps to an instruction that is not in memory, a page fault occurs and the operating system must go and get the missing instructions from disk. The process is blocked while the necessary instructions are being located and read in. If a thread causes a page fault, the kernel, unaware of the existence of threads, naturally blocks the entire process until the disk I/O is complete, even though other threads might be runnable.
- Another problem with user-level thread packages is that if a thread starts running, no other thread in that process will ever run unless the first thread voluntarily gives up the CPU. Within a single process, there are no clock interrupts, making it impossible to schedule the threads in round-robin fashion (taking turns). Unless a thread enters the run-time system of its own free will, the scheduler will never get a chance.
- The strongest argument against user-level threads is that programmers generally want threads precisely in applications where the threads block often, for example, in a multithreaded Web server. These threads are constantly making system calls. Once a trap has occurred to the kernel to carry out the system call, it is hardly any more work for the kernel to switch threads if the old one has blocked.
2.2.5 Implementing Threads in the Kernel
- The kernel has a thread table that keeps track of all the threads in the system. When a thread wants to create a new thread or destroy an existing thread, it makes a kernel call, which then does the creation or destruction by updating the kernel thread table.
- The kernel’s thread table holds each thread’s information that is the same as with user-level threads, but now kept in the kernel instead of in user space (inside the run-time system). In addition, the kernel also maintains the traditional process table to keep track of processes.
- All calls that might block a thread are implemented as system calls, at greater cost than a call to a run-time system procedure. When a thread blocks, the kernel can run either another thread from the same process (if one is ready) or a thread from a different process. With user-level threads, the run-time system keeps running threads from its own process until the kernel takes the CPU away from it (or there are no ready threads left to run).
- Due to the relatively greater cost of creating and destroying threads in the kernel, some systems take an environmentally correct approach and recycle their threads. When a thread is destroyed, it is marked as not runnable, but its kernel data structures are not affected. Later, when a new thread must be created, an old thread is reactivated, saving some overhead.
- Kernel threads do not require any new, nonblocking system calls. In addition, if one thread in a process causes a page fault, the kernel can easily check to see if the process has any other runnable threads, and if so, run one of them while waiting for the required page to be brought in from the disk.
- Their main disadvantage is that the cost of a system call is substantial, so if thread operations are common, much more overhead will be incurred.
2.2.6 Hybrid Implementations
- A better way is to use kernel-level threads and then multiplex user-level threads onto some or all of them, as shown in Fig. 2-17.
- When this approach is used, the programmer can determine how many kernel threads to use and how many user-level threads to multiplex on each one.
- With this approach, the kernel is aware of only the kernel-level threads and schedules those. Some of those threads may have multiple user-level threads multiplexed on top of them. These user-level threads are created, destroyed, and scheduled just like user-level threads in a process that runs on an operating system without multithreading capability. In this model, each kernel-level thread has some set of user-level threads that take turns using it.
2.2.7 Scheduler Activations
- The goals of the scheduler activation work are to mimic the function of kernel threads, but with the better performance and greater flexibility usually associated with threads packages implemented in user space.
- Efficiency is achieved by avoiding unnecessary transitions between user and kernel space. If a thread blocks waiting for another thread to do something, there is no reason to involve the kernel, thus saving the overhead of the kernel-user transition. The user-space run-time system can block the synchronizing thread and schedule a new one by itself.
- When scheduler activations are used, the kernel assigns a certain number of virtual processors to each process and lets the user-space run-time system allocate threads to processors. The number of virtual processors allocated to a process is initially one, but the process can ask for more and can also return processors it no longer needs. The kernel can also take back virtual processors already allocated in order to assign them to more needy processes.
- The basic idea that makes this scheme work is that when the kernel knows that a thread has blocked (e.g., by its having executed a blocking system call or caused a page fault), the kernel notifies the process’ run-time system, passing as parameters on the stack the number of the thread in question and a description of the event that occurred. The notification happens by having the kernel activate the run-time system at a known starting address. This mechanism is called an upcall.
- Once activated, the run-time system can reschedule its threads, typically by marking the current thread as blocked and taking another thread from the ready list, setting up its registers, and restarting it. Later, when the kernel learns that the original thread can run again (e.g., the pipe it was trying to read from now contains data, or the page it faulted over has been brought in from disk), it makes another upcall to the run-time system to inform it. The run-time system can either restart the blocked thread immediately or put it on the ready list to be run later.
- When a hardware interrupt occurs while a user thread is running, the interrupted CPU switches into kernel mode.
—If the interrupt is caused by an event not of interest to the interrupted process, such as completion of another process’ I/O, when the interrupt handler has finished, it puts the interrupted thread back in the state it was in before the interrupt.
—If the process is interested in the interrupt, such as the arrival of a page needed by one of the process' threads, the interrupted thread is not restarted. Instead, it is suspended, and the run-time system is started on that virtual CPU, with the state of the interrupted thread on the stack. It is then up to the run-time system to decide which thread to schedule on that CPU: the interrupted one, the newly ready one, or some third choice.
- An objection to scheduler activations is the fundamental reliance on upcalls, a concept that violates the structure inherent in any layered system. Normally, layer n offers certain services that layer n + 1 can call on, but layer n may not call procedures in layer n + 1. Upcalls do not follow this fundamental principle.
2.2.8 Pop-Up Threads
- Threads are useful in distributed systems. An example is how incoming messages are handled. The traditional approach is to have a process or thread that is blocked on a receive system call waiting for an incoming message. When a message arrives, it accepts the message, unpacks it, examines the contents, and processes it.
- A different approach is that the arrival of a message causes the system to create a new thread to handle the message. Such a thread is called a pop-up thread and is illustrated in Fig. 2-18.
- A key advantage of pop-up threads is that since they are brand new, they do not have any history that must be restored. Each one starts out fresh and each one is identical to all the others. This makes it possible to create such a thread quickly. The new thread is given the incoming message to process. The result of using pop-up threads is that the latency between message arrival and the start of processing can be made very short.
2.2.9 Making Single-Threaded Code Multithreaded
- The code of a thread normally consists of multiple procedures. These may have local variables, global variables, and parameters. Local variables and parameters do not cause trouble, but variables that are global to a thread but not global to the entire program are a problem. These are variables that are global in the sense that many procedures within the thread use them, but other threads should logically leave them alone.
- Consider the errno variable maintained by UNIX. When a process (or a thread) makes a system call that fails, the error code is put into errno. Thread 1 executes the system call access to find out if it has permission to access a certain file. The operating system returns the answer in the global variable errno. After control has returned to thread 1, but before it has a chance to read errno, the scheduler decides that thread 1 has had enough CPU time for the moment and decides to switch to thread 2. Thread 2 executes an open call that fails, which causes errno to be overwritten and thread 1's error code (from access) to be lost forever. When thread 1 starts up later, it will read the wrong value and behave incorrectly.
- Solution is to assign each thread its own private global variables. In this way, each thread has its own private copy of errno and other global variables, so conflicts are avoided. In effect, this decision creates a new scoping level, variables visible to all the procedures of a thread but not to other threads.
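- Modern C plus pthreads makes such per-thread "globals" straightforward with thread-local storage. A sketch, in which my_errno is a stand-in for a per-thread errno-like variable (not the real errno):
#include <pthread.h>
#include <stdio.h>

static _Thread_local int my_errno = 0;     /* each thread gets its own copy */

static void *worker(void *arg)
{
    my_errno = (int)(long)arg;             /* this write is invisible to other threads */
    printf("thread %ld sees my_errno = %d\n", (long)arg, my_errno);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("main still sees my_errno = %d\n", my_errno);   /* unaffected: prints 0 */
    return 0;
}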
- The next problem in turning a single-threaded program into a multithreaded one is that many library procedures are not reentrant. They were not designed to have a second call made to any given procedure while a previous call has not yet finished.
2.3 INTERPROCESS COMMUNICATION
- Processes need to communicate with other processes. For example, in a shell pipeline, the output of the first process must be passed to the second process, and so on down the line. Thus there is a need for communication between processes. We will look at InterProcess Communication, or IPC.
2.3.1 Race Conditions
- Consider a print spooler. When a process wants to print a file, it enters the file name in a special spooler directory. Another process, the printer daemon, periodically checks to see if there are any files to be printed, and if there are, it prints them and then removes their names from the directory.
- Imagine that our spooler directory has a very large number of slots, numbered 0, 1, 2, …, each one capable of holding a file name. Also imagine that there are two shared variables, out, which points to the next file to be printed, and in, which points to the next free slot in the directory.
- At a certain instant, slots 0 to 3 are empty (the files have already been printed) and slots 4 to 6 are full (with the names of files queued for printing). Simultaneously, processes A and B decide they want to queue a file for printing. This situation is shown in Fig. 2-21.
- Process A reads in and stores the value, 7, in a local variable called next_free_slot. Just then a clock interrupt occurs and the CPU switches to process B. Process B also reads in and also gets a 7. It also stores it in its local variable next_free_slot. At this instant both processes think that the next available slot is 7.
- Process B now continues to run. It stores the name of its file in slot 7 and updates in to be an 8. Then it goes off and does other things.
- Process A runs again, starting from the place it left off. It looks at next_free_slot, finds a 7 there, and writes its file name in slot 7, erasing the name that process B just put there. Then it computes next_free_slot + 1, which is 8, and sets in to 8.
- The spooler directory is now internally consistent, so the printer daemon will not notice anything wrong, but process B will never receive any output.
- Situations like this, where two or more processes are reading or writing some shared data and the final result depends on who runs precisely when, are called race conditions.
2.3.2 Critical Regions
- The key to preventing trouble is to find some way to prohibit more than one process from reading and writing the shared data at the same time. What we need is mutual exclusion, some way of making sure that if one process is using a shared variable or file, the other processes will be excluded from doing the same thing.
- Sometimes a process has to access shared memory or files, or do other critical things that can lead to races. That part of the program where the shared memory is accessed is called the critical region or critical section.
- If we could arrange matters such that no two processes were ever in their critical regions at the same time, we could avoid races. But that requirement alone is not sufficient for having parallel processes cooperate correctly and efficiently using shared data.
- We need four conditions to hold to have a good solution:
- No two processes may be simultaneously inside their critical regions.
- No assumptions may be made about speeds or the number of CPUs.
- No process running outside its critical region may block any process.
- No process should have to wait forever to enter its critical region.
- Here process A enters its critical region at time T1. A little later, at time T2 process B attempts to enter its critical region but fails because another process is already in its critical region and we allow only one at a time. Consequently, B is temporarily suspended until time T3 when A leaves its critical region, allowing B to enter immediately. Eventually B leaves (at T4 ) and we are back to the original situation with no processes in their critical regions.
2.3.3 Mutual Exclusion with Busy Waiting
Disabling Interrupts
- On a single-processor system, the simplest solution is to have each process disable all interrupts just after entering its critical region and re-enable them just before leaving it. With interrupts disabled, no clock interrupts can occur. The CPU is switched from process to process only as a result of clock or other interrupts, and with interrupts turned off the CPU will not be switched to another process. So once a process has disabled interrupts, it can examine and update the shared memory without fear that any other process will intervene.
- But this approach is unattractive today. On a multicore system, disabling the interrupts of one CPU does not prevent the other CPUs from interfering with operations the first CPU is performing.
Lock Variables
- Consider having a single, shared (lock) variable, initially 0. When a process wants to enter its critical region, it first tests the lock.
—If the lock is 0, the process sets it to 1 and enters the critical region.
—If the lock is 1, the process waits until it becomes 0.
Thus, a 0 means that no process is in its critical region, and a 1 means that some process is in its critical region.
- Suppose that one process reads the lock and sees that it is 0. Before it can set the lock to 1, another process is scheduled, runs, and sets the lock to 1. When the first process runs again, it will also set the lock to 1, and two processes will be in their critical regions at the same time.
- You might think that we could solve this problem by first reading out the lock value, then checking it again just before storing into it, but that really does not help. The race now occurs if the second process modifies the lock just after the first process has finished its second check.
Strict Alternation
- In Fig. 2-23, the integer variable turn, initially 0, keeps track of whose turn it is to enter the critical region and examine or update the shared memory. Initially, process 0 inspects turn, finds it to be 0, and enters its critical region. Process 1 also finds it to be 0 and therefore sits in a loop continually testing turn to see when it becomes 1.
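- As described, the loop for process 0 in Fig. 2-23 looks roughly like this (process 1 is symmetric, waiting for turn == 1 and then setting turn back to 0; critical_region and noncritical_region are placeholders, so this is a fragment rather than a complete program):
while (TRUE) {
    while (turn != 0)            /* busy waiting on the shared turn variable */
        ;                        /* do nothing until it is our turn */
    critical_region();
    turn = 1;                    /* give the other process its turn */
    noncritical_region();
}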
- Continuously testing a variable until some value appears is called busy waiting. Only when there is a reasonable expectation that the wait will be short is busy waiting used. A lock that uses busy waiting is called a spin lock.
- When process 0 leaves the critical region, it sets turn to 1, to allow process 1 to enter its critical region. Suppose that process 1 finishes its critical region quickly, so that both processes are in their noncritical regions, with turn set to 0. Now process 0 executes its whole loop quickly, exiting its critical region and setting turn to 1. At this point turn is 1 and both processes are executing in their noncritical regions.
- Suddenly, process 0 finishes its noncritical region and goes back to the top of its loop. Unfortunately, it is not permitted to enter its critical region now, because turn is 1 and process 1 is busy with its noncritical region. It hangs in its while loop until process 1 sets turn to 0. Taking turns is not a good idea when one of the processes is much slower than the other.
- This situation violates condition 3 (No process running outside its critical region may block any process) set out above: process 0 is being blocked by a process not in its critical region. Going back to the spooler directory discussed above, if we now associate the critical region with reading and writing the spooler directory, process 0 would not be allowed to print another file because process 1 was doing something else.
- This solution requires that the two processes strictly alternate in entering their critical regions. Neither one would be permitted to spool two files in a row.
Peterson’s Solution
- Before using the shared variables (i.e., before entering its critical region), each process calls enter_region with its own process number, 0 or 1, as parameter. This call will cause it to wait until it is safe to enter (i.e., until enter_region returns). After it has finished with the shared variables, the process calls leave_region to indicate that it is done and to allow the other process to enter.
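- A reconstruction of enter_region and leave_region along the lines of Fig. 2-24 (note that on modern CPUs that reorder memory operations, this textbook version also needs memory barriers or atomic operations to be reliable):
#define FALSE 0
#define TRUE  1
#define N     2                            /* number of processes */

int turn;                                  /* whose turn is it? */
int interested[N];                         /* all values initially FALSE */

void enter_region(int process)             /* process is 0 or 1 */
{
    int other = 1 - process;               /* number of the other process */
    interested[process] = TRUE;            /* show that this process is interested */
    turn = process;                        /* set flag */
    while (turn == process && interested[other] == TRUE)
        ;                                  /* busy wait until it is safe to enter */
}

void leave_region(int process)
{
    interested[process] = FALSE;           /* indicate departure from critical region */
}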
- Initially neither process is in its critical region. Now process 0 calls enter_region. It indicates its interest by setting its array element and sets turn to 0. Since process 1 is not interested, enter_region returns immediately. If process 1 now makes a call to enter_region, it will hang there until interested[0] goes to FALSE, an event that happens only when process 0 calls leave_region to exit the critical region.
- Now consider the case that both processes call enter_region almost simultaneously. Both will store their process number in turn. Whichever store is done last is the one that counts; the first one is overwritten and lost. Suppose that process 1 stores last, so turn is 1. When both processes come to the while statement, process 0 executes it zero times and enters its critical region. Process 1 loops and does not enter its critical region until process 0 exits its critical region.
The TSL Instruction
- Some computers have an instruction like
TSL RX, LOCK
(Test and Set Lock) that works as follows:
It reads the contents of the memory word lock into register RX and then stores a nonzero value at the memory address lock. The operations of reading the word and storing into it are guaranteed to be indivisible: no other processor can access the memory word until the instruction is finished. The CPU executing the TSL instruction locks the memory bus to prohibit other CPUs from accessing memory until it is done.
- Locking the memory bus is different from disabling interrupts. Disabling interrupts and then performing a read on a memory word followed by a write does not prevent a second processor on the bus from accessing the word between the read and the write. In fact, disabling interrupts on processor 1 has no effect on processor 2. The only way to keep processor 2 out of the memory until processor 1 is finished is to lock the bus, which requires a special hardware facility.
- When lock is 0, any process may set it to 1 using the TSL instruction and then read or write the shared memory. When it is done, the process sets lock back to 0 using an ordinary move instruction.
- The first instruction copies the old value of lock to the register and then sets lock to 1. Then the old value is compared with 0. If it is nonzero, the lock was already set, so the program goes back to the beginning and tests it again. Sooner or later the value will become 0 (when the process currently in its critical region is done with it), and the subroutine returns, with the lock set. To clear the lock, the program calls leave_region, which stores a 0 in lock.
- Solution to the critical-region problem:
Before entering its critical region, a process calls enter_region, which does busy waiting until the lock is free; then it acquires the lock and returns. After leaving the critical region the process calls leave_region, which stores a 0 in lock.
- An alternative instruction to TSL is XCHG, which exchanges the contents of two locations atomically, for example, a register and a memory word. All Intel x86 CPUs use the XCHG instruction for low-level synchronization.
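- The enter_region/leave_region pair can also be written in portable C using C11 atomics, with atomic_exchange playing the role of the indivisible TSL/XCHG instruction. A minimal sketch:
#include <stdatomic.h>

static atomic_int lock = 0;                /* 0 = free, 1 = taken */

static void enter_region(void)
{
    while (atomic_exchange(&lock, 1) != 0) /* atomically set lock to 1 and get the old value */
        ;                                  /* it was already set: busy wait (spin) */
}

static void leave_region(void)
{
    atomic_store(&lock, 0);                /* store a 0: release the lock */
}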
2.3.4 Sleep and Wakeup
- Both Peterson’s solution and the TSL solution have the defect of requiring busy waiting. In essence, what these solutions do is this: when a process wants to enter its critical region, it checks to see if the entry is allowed. If it is not, the process just sits in a loop waiting until it is. Not only does this approach waste CPU time, but it can also have unexpected effects.
- Consider a computer with two processes, H, with high priority, and L, with low priority. The scheduling rules are such that H is run whenever it is in ready state. At a certain moment, with L in its critical region, H becomes ready to run. H now begins busy waiting, but since L is never scheduled while H is running, L never gets the chance to leave its critical region, so H loops forever. This situation is referred to as the priority inversion problem.
- Sleep is a system call that causes the caller to block, that is, be suspended until another process wakes it up; wakeup is the corresponding call that awakens a sleeping process. Both sleep and wakeup take one parameter, a memory address used to match up sleeps with wakeups.
The Producer-Consumer Problem
- Two processes share a common, fixed-size buffer. One of them, the producer, puts information into the buffer, and the other one, the consumer, takes it out.
- If the producer wants to put a new item in the buffer but the buffer is already full, the producer goes to sleep, to be awakened when the consumer has removed one or more items. If the consumer wants to remove an item from the buffer and sees that the buffer is empty, it goes to sleep until the producer puts something in the buffer and wakes it up.
- To keep track of the number of items in the buffer, we will need a variable, count. If the maximum number of items the buffer can hold is N, the producer's code will first test to see if count is N. If it is, the producer will go to sleep; if it is not, the producer will add an item and increment count. The consumer's code is similar: first test count to see if it is 0. If it is, go to sleep; if it is nonzero, remove an item and decrement the counter.
- Each of the processes also tests to see if the other should be awakened, and if so, wakes it up.
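- The producer and consumer just described look roughly like this (sleep, wakeup, and the item helpers are placeholders rather than real library calls, and the code deliberately contains the lost-wakeup race analyzed next):
#define N 100                              /* number of slots in the buffer */
int count = 0;                             /* number of items in the buffer */

void producer(void)
{
    int item;
    while (TRUE) {
        item = produce_item();             /* generate the next item */
        if (count == N) sleep();           /* buffer full: go to sleep */
        insert_item(item);                 /* put item in buffer */
        count = count + 1;
        if (count == 1) wakeup(consumer);  /* was the buffer empty? */
    }
}

void consumer(void)
{
    int item;
    while (TRUE) {
        if (count == 0) sleep();           /* buffer empty: go to sleep */
        item = remove_item();              /* take an item out of the buffer */
        count = count - 1;
        if (count == N - 1) wakeup(producer); /* was the buffer full? */
        consume_item(item);
    }
}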
- Race condition can occur because access to count is unconstrained. Suppose the buffer is empty and the consumer has just read count to see if it is 0. At that instant, the scheduler decides to stop running the consumer temporarily and start running the producer. The producer inserts an item in the buffer, increments count, and notices that it is now 1. Reasoning that count was just 0, and thus the consumer must be sleeping, the producer calls wakeup to wake the consumer up.
- Unfortunately, the consumer is not actually asleep, so the wakeup signal is lost. When the consumer next runs, it will test the value of count it previously read, find it to be 0, and go to sleep. Sooner or later the producer will fill up the buffer and also go to sleep. Both will sleep forever.
- The essence of the problem is that a wakeup sent to a process that is not (yet) asleep is lost. If it were not lost, everything would work.
- A quick fix is to add a wakeup waiting bit to the scheme. When a wakeup is sent to a process that is still awake, this bit is set. Later, when the process tries to go to sleep, if the wakeup waiting bit is on, the process will stay awake and the wakeup waiting bit will be turned off.
2.3.5 Semaphores
- A semaphore is an integer variable that counts the number of stored wakeups; 0 indicates that no wakeups were saved. Two operations are defined on semaphores: down and up (generalizations of sleep and wakeup, respectively).
1. down (generalized sleep):
if (semaphore > 0)
{
    semaphore = semaphore - 1;    // use up one stored wakeup and continue
}
else    // semaphore == 0
{
    put the caller to sleep;      // without completing the down for the moment
}
2. up (generalized wakeup):
semaphore = semaphore + 1;
if (one or more processes are sleeping on this semaphore)
{
    the system chooses one of them, say P;
    P is allowed to complete its down;   // so the semaphore is decremented back to 0
}
- The down operation on a semaphore checks to see if the value is greater than 0.
—If so, it decrements the value (i.e., uses up one stored wakeup) and just continues.
—If the value is 0, the process is put to sleep without completing the down for the moment.
- Checking the value, changing it, and possibly going to sleep are all done as a single, indivisible atomic action. It is guaranteed that once a semaphore operation has started, no other process can access the semaphore until the operation has completed or blocked. This atomicity is essential to solving synchronization problems and avoiding race conditions. An atomic action is one in which a group of related operations are either all performed without interruption or not performed at all.
- The up operation increments the value of the semaphore addressed, then if one or more processes were sleeping on that semaphore, unable to complete an earlier down operation, one of them is chosen by the system and is allowed to complete its down. Thus, after an up on a semaphore with processes sleeping on it, the semaphore will still be 0, but there will be one fewer process sleeping on it. The operation of incrementing the semaphore and waking up one process is also indivisible.
Solving the Producer-Consumer Problem Using Semaphores
empty: counts the empty buffer slots (the producer does a down on it before inserting)
full: counts the full buffer slots (the consumer does a down on it before removing)
- To make them work correctly, it is essential that they are implemented in an indivisible way. The normal way is to implement up and down as system calls, with the operating system disabling all interrupts while it is testing the semaphore, updating it, and putting the process to sleep, if necessary. As all of these actions take only a few instructions, no harm is done in disabling interrupts. If multiple CPUs are being used, each semaphore should be protected by a lock variable, with the TSL or XCHG instructions used to make sure that only one CPU at a time examines the semaphore.
- This solution uses three semaphores: full (number of slots that are full), empty (number of slots that are empty), mutex (make sure the producer and consumer do not access the buffer at the same time). Full is initially 0, empty is initially equal to the number of slots in the buffer, and mutex is initially 1.
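- A runnable sketch of this solution using POSIX unnamed semaphores, with two threads standing in for the two processes (the buffer size, item count, and printing are illustrative):
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N     8                            /* slots in the buffer */
#define ITEMS 32                           /* how many items to pass through */

static int buffer[N];
static int in_pos = 0, out_pos = 0;

static sem_t empty;                        /* counts empty slots, starts at N */
static sem_t full;                         /* counts full slots, starts at 0 */
static sem_t mutex;                        /* binary semaphore guarding the buffer */

static void *producer(void *arg)
{
    for (int item = 0; item < ITEMS; item++) {
        sem_wait(&empty);                  /* down(empty): wait for a free slot */
        sem_wait(&mutex);                  /* down(mutex): enter critical region */
        buffer[in_pos] = item;
        in_pos = (in_pos + 1) % N;
        sem_post(&mutex);                  /* up(mutex): leave critical region */
        sem_post(&full);                   /* up(full): one more item available */
    }
    return NULL;
}

static void *consumer(void *arg)
{
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&full);                   /* down(full): wait for an item */
        sem_wait(&mutex);
        int item = buffer[out_pos];
        out_pos = (out_pos + 1) % N;
        sem_post(&mutex);
        sem_post(&empty);                  /* up(empty): one more free slot */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    sem_init(&empty, 0, N);
    sem_init(&full, 0, 0);
    sem_init(&mutex, 0, 1);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}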
- Semaphores that are initialized to 1 and used by two or more processes to ensure that only one of them can enter its critical region at the same time are called binary semaphores. If each process does a down just before entering its critical region and an up just after leaving it, mutual exclusion is guaranteed.
- In a system using semaphores, the natural way to hide interrupts is to have a semaphore, initially set to 0, associated with each I/O device. Just after starting an I/O device, the managing process does a down on the associated semaphore, thus blocking immediately. When the interrupt comes in, the interrupt handler then does an up on the associated semaphore, which makes the relevant process ready to run again.
- In this model, step 5 in Fig. 2-5 consists of doing an up on the device’s semaphore, so that in step 6 the scheduler will be able to run the device manager. If several processes are now ready, the scheduler may choose to run an even more important process next.
- In the example of Fig. 2-28, we have actually used semaphores in two different ways.
- The mutex semaphore is used for mutual exclusion. It is designed to guarantee that only one process at a time will be reading or writing the buffer and the associated variables. This mutual exclusion is required to prevent chaos.
- The other use of semaphores is for synchronization. The full and empty semaphores are needed to guarantee that certain event sequences do or do not occur. In this case, they ensure that the producer stops running when the buffer is full, and that the consumer stops running when it is empty.
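- The structure of this solution can be sketched in C with POSIX semaphores. This is only a hedged illustration of the scheme described above, not the book's Fig. 2-28 code; the circular-buffer layout and the insert_item/remove_item helpers are assumptions made here for completeness.
#include <pthread.h>
#include <semaphore.h>

#define N 100 /* number of slots in the buffer */

static int buffer[N]; /* the shared buffer */
static int in = 0, out = 0; /* next free slot / next full slot */

static sem_t mutex; /* controls access to the critical region; initially 1 */
static sem_t empty; /* counts empty buffer slots; initially N */
static sem_t full; /* counts full buffer slots; initially 0 */

static void insert_item(int item) { buffer[in] = item; in = (in + 1) % N; }
static int remove_item(void) { int item = buffer[out]; out = (out + 1) % N; return item; }

static void *producer(void *arg)
{
    for (int item = 0; ; item++) { /* item = produce_item() */
        sem_wait(&empty); /* down(empty): wait for an empty slot */
        sem_wait(&mutex); /* down(mutex): enter critical region */
        insert_item(item); /* put new item in buffer */
        sem_post(&mutex); /* up(mutex): leave critical region */
        sem_post(&full); /* up(full): one more full slot */
    }
    return NULL;
}

static void *consumer(void *arg)
{
    for (;;) {
        sem_wait(&full); /* down(full): wait for an item */
        sem_wait(&mutex); /* down(mutex): enter critical region */
        int item = remove_item(); /* take item from buffer */
        sem_post(&mutex); /* up(mutex): leave critical region */
        sem_post(&empty); /* up(empty): one more empty slot */
        (void)item; /* consume_item(item) would go here */
    }
    return NULL;
}

int main(void)
{
    sem_init(&mutex, 0, 1);
    sem_init(&empty, 0, N);
    sem_init(&full, 0, 0);
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}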
2.3.6 Mutexes
- A mutex is a shared variable that can be in one of two states: unlocked or locked. In practice an integer often is used, with 0 meaning unlocked and all other values meaning locked.
- Mutexes are good only for managing mutual exclusion to some shared resource or piece of code. They are easy and efficient to implement, which makes them useful in thread packages that are implemented entirely in user space.
- Two procedures are used with mutexes. When a thread (or process) needs access to a critical region, it calls mutex_lock.
—If the mutex is unlocked, the call succeeds and the calling thread is free to enter the critical region.
—If the mutex is locked, the calling thread is blocked until the thread in the critical region is finished and calls mutex_unlock.
- If multiple threads are blocked on the mutex, one of them is chosen at random and allowed to acquire the lock.
- Mutexes can be implemented in user space provided that a TSL or XCHG instruction is available. The code for mutex_lock and mutex_unlock for use with a user-level threads package is shown in Fig. 2-29.
- The code of mutex_lock is similar to the code of enter_region, but with a crucial difference. When enter_region fails to enter the critical region, it keeps testing the lock repeatedly (busy waiting). Eventually the clock runs out and some other process is scheduled to run. Sooner or later the process holding the lock gets to run and releases it.
- With threads, the situation is different because there is no clock that stops threads that have run too long. Consequently, a thread that tries to acquire a lock by busy waiting will loop forever and never acquire the lock because it never allows any other thread to run and release the lock.
- That is where the difference between enter_region and mutex_lock comes in. When the latter fails to acquire a lock, it calls thread_yield to give up the CPU to another thread. Consequently there is no busy waiting. When the thread runs the next time, it tests the lock again, as in the sketch below.
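- A rough C rendering of this idea (not the assembly code of Fig. 2-29) follows. The GCC __atomic_exchange_n builtin stands in for the TSL/XCHG instruction, and thread_yield is assumed to be supplied by the user-level threads package.
/* 0 = unlocked, 1 = locked */
typedef volatile int mutex_t;

void thread_yield(void); /* assumed: provided by the threads package */

void mutex_lock(mutex_t *m)
{
    /* Atomically set the mutex to 1 and fetch its old value (TSL/XCHG stand-in). */
    while (__atomic_exchange_n(m, 1, __ATOMIC_ACQUIRE) != 0) {
        /* The mutex was already locked: do not busy wait; give the CPU to
           another thread so the lock holder can run and release the lock. */
        thread_yield();
    }
    /* Old value was 0: this thread now holds the lock. */
}

void mutex_unlock(mutex_t *m)
{
    __atomic_store_n(m, 0, __ATOMIC_RELEASE); /* mark the mutex unlocked */
}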
- With a user-space threads package there is no problem with multiple threads having access to the same mutex, since all the threads operate in a common address space. However, with the earlier solutions, such as Peterson’s algorithm and semaphores, there is an assumption that multiple processes have access to at least some shared memory. If processes have disjoint address spaces, how can they share the turn variable in Peterson’s algorithm, or semaphores or a common buffer?
- There are two answers.
- Some of the shared data structures, such as the semaphores, can be stored in the kernel and accessed only by means of system calls.
- Most modern operating systems offer a way for processes to share some portion of their address space with other processes. So buffers and other data structures can be shared.
In the worst case, if nothing else is possible, a shared file can be used.
- If two or more processes share most or all of their address spaces, the distinction between processes and threads becomes blurred but is nevertheless present. Two processes that share a common address space still have different open files, alarm timers, and other per-process properties, whereas the threads within a single process share them. And it is always true that multiple processes sharing a common address space never have the efficiency of user-level threads since the kernel is deeply involved in their management.
Futexes: fast user space mutex
- With increasing parallelism, efficient synchronization and locking is very important for performance. Spin locks are fast if the wait is short, but waste CPU cycles if not. If there is much contention, it is more efficient to block the process and let the kernel unblock it only when the lock is free. But this has the inverse problem: it works well under heavy contention, but continuously switching to the kernel is expensive if there is very little contention to begin with. Worse still, it may not be easy to predict the amount of lock contention.
- A futex is a feature of Linux that implements basic locking (like a mutex) but avoids dropping into the kernel unless it really has to.
- A futex consists of two parts: a kernel service and a user library.
The kernel service provides a ‘‘wait queue’’ that allows multiple processes to wait on a lock. They will not run, unless the kernel explicitly unblocks them. Putting a process on the wait queue requires an (expensive) system call and should be avoided. In the absence of contention, the futex works completely in user space. The processes share a common lock variable, a fancy name for an aligned 32-bit integer that serves as the lock.
- Suppose the lock is initially 1, which we assume to mean that the lock is free. A thread grabs the lock by performing an atomic ‘‘decrement and test’’. Next, the thread inspects the result to see whether or not the lock was free.
—If it was not in the locked state, our thread has successfully grabbed the lock.
—If the lock is held by another thread, our thread has to wait. In this case, the futex library does not spin, but uses a system call to put the thread on the wait queue in the kernel. The cost of the switch to the kernel is justified, because the thread was blocked anyway.
- When a thread is done with the lock, it releases the lock with an atomic ‘‘increment and test’’ and checks the result to see if any processes are still blocked on the kernel wait queue.
—If so, it will let the kernel know that it may unblock one or more of these processes.
—If there is no contention, the kernel is not involved at all.
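- The idea can be sketched in C on Linux roughly as follows. This sketch uses the common convention that 0 means unlocked (0 = free, 1 = locked with no waiters, 2 = locked with possible waiters) and a compare-and-swap instead of the decrement-and-test described above; it illustrates the fast-path/slow-path split, not the glibc implementation.
#include <stdint.h>
#include <unistd.h>
#include <linux/futex.h>
#include <sys/syscall.h>

static int futex(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

void futex_lock(uint32_t *lock)
{
    uint32_t expected = 0;
    /* Fast path: the uncontended case stays entirely in user space. */
    if (__atomic_compare_exchange_n(lock, &expected, 1, 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return;
    /* Slow path: mark the lock contended and sleep on the kernel wait queue. */
    while (__atomic_exchange_n(lock, 2, __ATOMIC_ACQUIRE) != 0)
        futex(lock, FUTEX_WAIT, 2); /* sleep only while *lock is still 2 */
}

void futex_unlock(uint32_t *lock)
{
    /* If the old value was 2, somebody may be sleeping: ask the kernel to wake one. */
    if (__atomic_exchange_n(lock, 0, __ATOMIC_RELEASE) == 2)
        futex(lock, FUTEX_WAKE, 1);
}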
Mutexes in Pthreads
- Pthreads provides a number of functions that can be used to synchronize threads. The basic mechanism uses a mutex variable, which can be locked or unlocked, to guard each critical region.
- A thread wishing to enter a critical region first tries to lock the associated mutex.
—If the mutex is unlocked, the thread can enter immediately and the lock is atomically set, preventing other threads from entering.
—If the mutex is locked, the calling thread is blocked until it is unlocked. If multiple threads are waiting on the same mutex, when it is unlocked, only one of them is allowed to continue and relock it.
- These locks are not mandatory. It is up to the programmer to make sure threads use them correctly.
- Mutexes can be created and destroyed: pthread_mutex_init and pthread_mutex_destroy, respectively.
- They can be locked by pthread_mutex_lock, which tries to acquire the lock and blocks if it is already locked.
- There is also an option for trying to lock a mutex and failing with an error code instead of blocking if it is already locked: pthread_mutex_trylock. This call allows a thread to effectively do busy waiting if that is ever needed.
- pthread_mutex_unlock unlocks a mutex and releases exactly one thread if one or more are waiting on it.
- Pthreads offers a second synchronization mechanism: condition variables. Mutexes are good for allowing or blocking access to a critical region. Condition variables allow threads to block due to some condition not being met.
- Consider the producer-consumer scenario again: one thread puts things in a buffer and another one takes them out. If the producer discovers that there are no more free slots available in the buffer, it has to block until one becomes available. Mutexes make it possible to do the check atomically without interference from other threads, but having discovered that the buffer is full, the producer needs a way to block and be awakened later. This is what condition variables allow.
- The primary operations on condition variables are pthread_cond_wait and pthread_cond_signal. The former blocks the calling thread until some other thread signals it (using the latter call). The blocking thread often is waiting for the signaling thread to do some work, release some resource, or perform some other activity. Only then can the blocking thread continue. The condition variables allow this waiting and blocking to be done atomically.
- Condition variables and mutexes are always used together. The pattern is for one thread to lock a mutex, then wait on a condition variable when it cannot get what it needs. Eventually another thread will signal it and it can continue. The pthread_cond_wait call atomically unlocks the mutex it is holding and blocks the caller; when the thread is later awakened, it reacquires the mutex before returning. For this reason, the mutex is one of the parameters.
- It is also worth noting that condition variables (unlike semaphores) have no memory. If a signal is sent to a condition variable on which no thread is waiting, the signal is lost. Programmers have to be careful not to lose signals.
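- A minimal sketch of these calls working together, using a one-slot buffer so the producer and consumer must take turns; MAX and the printed output are assumptions made here:
#include <pthread.h>
#include <stdio.h>

#define MAX 20 /* how many items to transfer (assumed) */

static pthread_mutex_t the_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t condc = PTHREAD_COND_INITIALIZER; /* consumer waits on this */
static pthread_cond_t condp = PTHREAD_COND_INITIALIZER; /* producer waits on this */
static int buffer = 0; /* 0 means the single slot is empty */

static void *producer(void *ptr)
{
    for (int i = 1; i <= MAX; i++) {
        pthread_mutex_lock(&the_mutex);
        while (buffer != 0) /* slot still full? */
            pthread_cond_wait(&condp, &the_mutex); /* releases the mutex while waiting */
        buffer = i; /* put item in the slot */
        pthread_cond_signal(&condc); /* wake the consumer, if it is waiting */
        pthread_mutex_unlock(&the_mutex);
    }
    return NULL;
}

static void *consumer(void *ptr)
{
    for (int i = 1; i <= MAX; i++) {
        pthread_mutex_lock(&the_mutex);
        while (buffer == 0) /* nothing to consume yet? */
            pthread_cond_wait(&condc, &the_mutex);
        printf("consumed %d\n", buffer);
        buffer = 0; /* mark the slot empty again */
        pthread_cond_signal(&condp); /* wake the producer, if it is waiting */
        pthread_mutex_unlock(&the_mutex);
    }
    return NULL;
}

int main(void)
{
    pthread_t pro, con;
    pthread_create(&con, NULL, consumer, NULL);
    pthread_create(&pro, NULL, producer, NULL);
    pthread_join(pro, NULL);
    pthread_join(con, NULL);
    return 0;
}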
2.3.7 Monitors
- Fig. 2-28: Suppose that the two downs in the producer’s code were reversed in order, so mutex was decremented before empty instead of after it. If the buffer were completely full, the producer would block, with mutex set to 0. Consequently, the next time the consumer tried to access the buffer, it would do a down on mutex, now 0, and block too. Both processes would stay blocked forever and no more work would ever be done. This unfortunate situation is called a deadlock.
- To make it easier to write correct programs, researchers proposed a higher-level synchronization primitive called a monitor. A monitor is a collection of procedures, variables, and data structures that are all grouped together in a special kind of module or package. Processes may call the procedures in a monitor whenever they want to, but they cannot directly access the monitor’s internal data structures from procedures declared outside the monitor.
- Monitors have an important property that makes them useful for achieving mutual exclusion: only one process can be active in a monitor at any instant.
- Monitors are a programming-language construct, so the compiler knows they are special and can handle calls to monitor procedures differently from other procedure calls. Typically, when a process calls a monitor procedure, the first few instructions of the procedure will check to see if any other process is currently active within the monitor. If so, the calling process will be suspended until the other process has left the monitor. If no other process is using the monitor, the calling process may enter.
- It is up to the compiler to implement mutual exclusion on monitor entries, and a common way is to use a mutex or a binary semaphore. In any event, the person writing the monitor does not have to be aware of how the compiler arranges for mutual exclusion. It is sufficient to know that by turning all the critical regions into monitor procedures, no two processes will ever execute their critical regions at the same time.
- Although monitors provide an easy way to achieve mutual exclusion, we need a way for processes to block when they cannot proceed. In the producer-consumer problem, it is easy to put all the tests for buffer-full and buffer-empty in monitor procedures, but how should the producer block when it finds the buffer full?
- The solution is condition variables, along with two operations (wait and signal) on them.
- When a monitor procedure discovers that it cannot continue (e.g., the producer finds the buffer full), it does a wait on some condition variable, say, full. This action causes the calling process to block and allows another process that had been previously prohibited from entering the monitor to enter now. This other process, for example, the consumer, can wake up its sleeping partner by doing a signal on the condition variable that its partner is waiting on.
- To avoid having two active processes in the monitor at the same time, we need a rule telling what happens after a signal.
- The first choice is letting the newly awakened process run, suspending the other one.
- The second is requiring that the process doing a signal must exit the monitor immediately, that is, a signal statement may appear only as the final statement in a monitor procedure.
- The third is to let the signaler continue to run and allow the waiting process to start running only after the signaler has exited the monitor.
- We will use the second choice because it is conceptually simpler and is also easier to implement.
- If a signal is done on a condition variable on which several processes are waiting, only one of them, determined by the system scheduler, is revived.
- Condition variables do not accumulate signals for later use the way semaphores do. Thus, if a condition variable is signaled with no one waiting on it, the signal is lost forever. In other words, the wait must come before the signal.
- In practice, it is not a problem because it is easy to keep track of the state of each process with variables, if need be. A process that might otherwise do a signal can see that this operation is not necessary by looking at the variables.
- Difference between wait/signal and sleep/wakeup: sleep/wakeup failed because while one process was trying to go to sleep, the other one was trying to wake it up. This cannot happen with monitors. The automatic mutual exclusion on monitor procedures guarantees that if, say, the producer inside a monitor procedure discovers that the buffer is full, it will be able to complete the wait operation without having to worry about the possibility that the scheduler may switch to the consumer just before the wait completes. The consumer will not even be let into the monitor at all until the wait is finished and the producer has been marked as no longer runnable.
- A solution to the producer-consumer problem using monitors in Java is given in Fig. 2-35. Our solution has four classes. The outer class, ProducerConsumer, creates and starts two threads, p and c. The second and third classes, producer and consumer, respectively, contain the code for the producer and consumer. Finally, the class our_monitor is the monitor. It contains two synchronized methods that are used for actually inserting items into the shared buffer and taking them out.
- The producer and consumer threads are functionally identical to their counterparts in all our previous examples. The producer has an infinite loop generating data and putting it into the common buffer. The consumer has an equally infinite loop taking data out of the common buffer and doing some fun thing with it. The interesting part of this program is the class our_monitor, which holds the buffer, the administration variables, and two synchronized methods. When the producer is active inside insert, it knows for sure that the consumer cannot be active inside remove, making it safe to update the variables and the buffer without fear of race conditions. The variable count keeps track of how many items are in the buffer; it can take on any value from 0 up to and including N. The variable lo is the index of the buffer slot where the next item is to be fetched. Similarly, hi is the index of the buffer slot where the next item is to be placed. It is permitted that lo = hi, which means that either 0 items or N items are in the buffer. The value of count tells which case holds.
- Synchronized methods in Java differ from classical monitors in an essential way: Java does not have condition variables built in. Instead, it offers two procedures, wait and notify, which are the equivalent of sleep and wakeup except that when they are used inside synchronized methods, they are not subject to race conditions. In theory, the method wait can be interrupted, which is what the code surrounding it is all about. Java requires that the exception handling be made explicit.
- By making the mutual exclusion of critical regions automatic, monitors make parallel programming much less error prone than using semaphores. Nevertheless, they too have some drawbacks.
- Since monitors are a programming-language concept, the compiler must recognize them and arrange for the mutual exclusion. Most languages (C/Cpp) do not have monitors, so it is unreasonable to expect their compilers to enforce any mutual exclusion rules. How could the compiler even know which procedures were in monitors and which were not?
- These same languages do not have semaphores either, but adding semaphores is easy: add two short assembly-code routines to the library to issue the up and down system calls. The compilers do not have to know that they exist. If you have a semaphore-based operating system, you can write the user programs for it in C/Cpp. With monitors, you need a language that has them built in.
- Another problem with monitors, and also with semaphores, is that they were designed for solving the mutual exclusion problem on one or more CPUs that all have access to a common memory. By putting the semaphores in the shared memory and protecting them with TSL or XCHG instructions, we can avoid races. When we move to a distributed system consisting of multiple CPUs, each with its own private memory and connected by a local area network, these primitives become inapplicable. The conclusion is that semaphores are too low level and monitors are not usable except in a few programming languages. Also, none of the primitives allow information exchange between machines. Something else is needed.
2.3.8 Message Passing
- This method of interprocess communication uses two primitives, send and receive, which like semaphores, are system calls.
send(destination, &message);
receive(source, &message);
- The former call sends a message to a given destination and the latter one receives a message from a given source. If no message is available, the receiver can block until one arrives. Alternatively, it can return immediately with an error code.
- Let us see how the producer-consumer problem can be solved with message passing and no shared memory.
- We assume that all messages are the same size and that messages sent but not yet received are buffered automatically by the operating system. In this solution, a total of N messages is used, analogous to the N slots in a shared-memory buffer.
- The consumer starts out by sending N empty messages to the producer.
Whenever the producer has an item to give to the consumer, it takes an empty message and sends back a full one.
So the total number of messages in the system remains constant in time, and they can be stored in a given amount of memory known in advance.
- If the producer works faster than the consumer, all the messages will end up full, waiting for the consumer; the producer will be blocked, waiting for an empty message.
If the consumer works faster, all the messages will be empties waiting for the producer to fill them up; the consumer will be blocked, waiting for a full message.
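- A hedged sketch of this scheme, using two POSIX message queues in place of the abstract send and receive primitives (one queue carries empty messages back to the producer, the other carries full ones to the consumer); the queue names, N, and ITEMS are assumptions, and on Linux the program may need to be linked with -lrt:
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 10 /* number of messages circulating, like the N buffer slots */
#define ITEMS 20 /* how many items to transfer before stopping (assumed) */

int main(void)
{
    struct mq_attr attr = { .mq_maxmsg = N, .mq_msgsize = sizeof(int) };
    /* "/pc_empties" carries empty messages back to the producer,
       "/pc_fulls" carries full messages (the items) to the consumer. */
    mqd_t empties = mq_open("/pc_empties", O_CREAT | O_RDWR, 0600, &attr);
    mqd_t fulls = mq_open("/pc_fulls", O_CREAT | O_RDWR, 0600, &attr);

    if (fork() == 0) { /* child process: the consumer */
        int zero = 0, item;
        for (int i = 0; i < N; i++) /* start by sending N empty messages */
            mq_send(empties, (char *)&zero, sizeof zero, 0);
        for (int i = 0; i < ITEMS; i++) {
            mq_receive(fulls, (char *)&item, sizeof item, NULL); /* blocks if none is full */
            printf("consumed %d\n", item);
            mq_send(empties, (char *)&zero, sizeof zero, 0); /* return the empty message */
        }
        _exit(0);
    }

    /* parent process: the producer */
    int scratch;
    for (int item = 0; item < ITEMS; item++) {
        mq_receive(empties, (char *)&scratch, sizeof scratch, NULL); /* wait for an empty */
        mq_send(fulls, (char *)&item, sizeof item, 0); /* send it back full */
    }
    wait(NULL);
    mq_unlink("/pc_empties");
    mq_unlink("/pc_fulls");
    return 0;
}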
How are messages addressed?
- One way is to assign each process a unique address and have messages be addressed to processes. A different way is to invent a new data structure, called a mailbox. A mailbox is a place to buffer a certain number of messages, typically specified when the mailbox is created. When mailboxes are used, the address parameters in the send and receive calls are mailboxes, not processes. When a process tries to send to a mailbox that is full, it is suspended until a message is removed from that mailbox, making room for a new one.
- For the producer-consumer problem, both the producer and consumer would create mailboxes large enough to hold N messages. The producer would send messages containing actual data to the consumer’s mailbox, and the consumer would send empty messages to the producer’s mailbox. When mailboxes are used, the buffering mechanism is clear: the destination mailbox holds messages that have been sent to the destination process but have not yet been accepted.
2.3.9 Barriers
- Some applications are divided into phases and have the rule that no process may proceed into the next phase until all processes are ready to proceed to the next phase. This behavior may be achieved by placing a barrier at the end of each phase.
- When a process reaches the barrier, it is blocked until all processes have reached the barrier. This allows groups of processes to synchronize. Barrier operation is illustrated in Fig. 2-37.
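- With Pthreads, this behavior is available directly as a POSIX barrier. A small sketch (NTHREADS and the phase bodies are assumptions):
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4 /* number of cooperating threads (assumed) */

static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    long id = (long)arg;
    printf("thread %ld: phase 1 done\n", id); /* work of phase 1 would go here */
    pthread_barrier_wait(&barrier); /* block until all NTHREADS threads arrive */
    printf("thread %ld: starting phase 2\n", id); /* work of phase 2 would go here */
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}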
2.3.10 Avoiding Locks: Read-Copy-Update
- Sometimes we can allow a writer to update a data structure even though other processes are still using it. The point is to ensure that each reader either reads the old version of the data, or the new one, but not some weird combination of old and new.
- Readers traverse the tree from the root to its leaves.
- First Example: a new node X is added. We initialize all values in node X, including its child pointers before making it visible in the tree. Then, with one atomic write, we make X a child of A. No reader will ever read an inconsistent version.
- Second Example: We subsequently remove B and D. First, we make A’s left child pointer point to C. All readers that were in A will continue with node C and never see B or D. In other words, they will see only the new version. All readers currently in B or D will continue following the original data structure pointers and see the old version. The main reason that the removal of B and D works without locking the data structure, is that RCU (Read-Copy-Update), decouples the removal and reclamation phases of the update.
- As long as we are not sure that there are no more readers of B or D, we cannot really free them. But how long should we wait? We have to wait until the last reader has left these nodes. RCU carefully determines the maximum time a reader may hold a reference to the data structure. After that period, it can safely reclaim the memory.
- Specifically, readers access the data structure in what is known as a read-side critical section, which may contain any code as long as it does not block or sleep. In that case, we know the maximum time we need to wait. Specifically, we define a period as any time interval during which we know that each thread has been outside its read-side critical section at least once. All will be well if we wait for a duration that is at least equal to this period before reclaiming. As the code in a read-side critical section is not allowed to block or sleep, a simple criterion is to wait until all the threads have executed a context switch.
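- The first example above (publishing node X with one atomic write) can be sketched in C with GCC atomic builtins. The node layout and field names are assumptions, and the grace-period and reclamation machinery is omitted:
#include <stdlib.h>

struct node {
    int value;
    struct node *left;
    struct node *right;
};

/* Writer side: attach a new, fully initialized node X under parent A. */
void add_right_child(struct node *A, int value)
{
    struct node *X = malloc(sizeof *X);
    X->value = value; /* initialize everything first... */
    X->left = NULL;
    X->right = NULL;
    /* ...then publish X with one atomic write; the release ordering makes the
       initialization above visible before the pointer itself becomes visible. */
    __atomic_store_n(&A->right, X, __ATOMIC_RELEASE);
}

/* Reader side: load the pointer atomically, then traverse as usual. */
struct node *read_right_child(struct node *A)
{
    return __atomic_load_n(&A->right, __ATOMIC_ACQUIRE);
}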
2.4 SCHEDULING
2.4.1 Introduction to Scheduling
- Nearly all processes alternate bursts of computing with I/O requests, as shown in Fig. 2-39.
- The CPU runs for a while without stopping, then a system call is made to read from a file or write to a file. When the system call completes, the CPU computes again until it needs more data or has to write more data, and so on.
- Some I/O activities count as computing. For example, when the CPU copies bits to a video RAM to update the screen, it is computing, not doing I/O, because the CPU is in use. I/O is when a process enters the blocked state waiting for an external device to complete its work.
- Some processes, such as (a), spend most of their time computing, while other processes, such as (b), spend most of their time waiting for I/O. The former are called compute-bound or CPU-bound; the latter are called I/O-bound.
- Compute-bound processes typically have long CPU bursts and infrequent I/O waits, whereas I/O-bound processes have short CPU bursts and frequent I/O waits. Note that the key factor is the length of the CPU burst, not the length of the I/O burst. I/O-bound processes are I/O bound because they do not compute much between I/O requests, not because they have especially long I/O requests. It takes the same time to issue the hardware request to read a disk block no matter how much or how little time it takes to process the data after they arrive.
- Processes tend to get more I/O-bound because CPUs are improving much faster than disks. The basic idea here is that if an I/O-bound process wants to run, it should get a chance quickly so that it can issue its disk request and keep the disk busy.
There are a variety of situations in which scheduling is needed.
- When a new process is created, a decision needs to be made whether to run the parent process or the child process. Since both processes are in ready state, it is a normal scheduling decision and can go either way, that is, the scheduler can choose to run either the parent or the child next.
- A scheduling decision must be made when a process exits. That process can no longer run (since it no longer exists), so some other process must be chosen from the set of ready processes. If no process is ready, a system-supplied idle process is normally run.
- When a process blocks on I/O, on a semaphore, or for some other reason, another process has to be selected to run. Sometimes the reason for blocking may play a role in the choice. For example, if A is an important process and it is waiting for B to exit its critical region, letting B run next will allow it to exit its critical region and thus let A continue. The trouble is that the scheduler does not have the necessary information to take this dependency into account.
- When an I/O interrupt occurs, a scheduling decision may be made. If the interrupt came from an I/O device that has now completed its work, some process that was blocked waiting for the I/O may now be ready to run. It is up to the scheduler to decide whether to run the newly ready process, the process that was running at the time of the interrupt, or some third process. If a hardware clock provides periodic interrupts at 50 or 60 Hz or some other frequency, a scheduling decision can be made at each clock interrupt or at every kth clock interrupt.
Scheduling algorithms can be divided into two categories with respect to how they deal with clock interrupts.
- A nonpreemptive scheduling algorithm picks a process to run and then just lets it run until it blocks (either on I/O or waiting for another process) or voluntarily releases the CPU. Even if it runs for many hours, it will not be forcibly suspended. In effect, no scheduling decisions are made during clock interrupts. After clock-interrupt processing has been finished, the process that was running before the interrupt is resumed, unless a higher-priority process was waiting for a now-satisfied timeout.
- A preemptive scheduling algorithm picks a process and lets it run for a maximum of some fixed time. If it is still running at the end of the time interval, it is suspended and the scheduler picks another process to run (if one is available). Doing preemptive scheduling requires having a clock interrupt occur at the end of the time interval to give control of the CPU back to the scheduler. If no clock is available, nonpreemptive scheduling is the only option.
Different environments
- In different environments different scheduling algorithms are needed. Three environments worth distinguishing are
- Batch.
- Interactive.
- Real time.
- Batch systems are still in widespread use in the business world. In batch systems, there are no users waiting at their terminals for a quick response to a short request. Consequently, nonpreemptive algorithms, or preemptive algorithms with long time periods for each process, are often acceptable. This approach reduces process switches and thus improves performance.
- In an environment with interactive users, preemption is essential to keep one process from occupying the CPU and denying service to the others. Even if no process ran forever, one process might shut out all the others indefinitely due to a program bug. Preemption is needed to prevent this behavior. Servers also fall into this category, since they normally serve multiple (remote) users.
- In systems with real-time constraints, preemption is sometimes not needed because the processes know that they may not run for long periods of time and usually do their work and block quickly. The difference with interactive systems is that real-time systems run only programs that are intended to further the application at hand. Interactive systems are general purpose and may run arbitrary programs that are not cooperative and even possibly malicious.
Some goals in designing a scheduling algorithm:
- The managers of large computer centers that run many batch jobs typically look at three metrics to see how well their systems are performing: throughput, turnaround time, and CPU utilization.
- Throughput is the number of jobs per hour that the system completes.
- Turnaround time is the statistically average time from the moment that a batch job is submitted until the moment it is completed. It measures how long the average user has to wait for the output. Here the rule is: Small is Beautiful.
- A scheduling algorithm that tries to maximize throughput may not necessarily minimize turnaround time. For example, given a mix of short jobs and long jobs, a scheduler that always ran short jobs and never ran long jobs might achieve an excellent throughput (many short jobs per hour) but at the expense of a terrible turnaround time for the long jobs.
- CPU utilization is not a good metric. What really matters is how many jobs per hour come out of the system (throughput) and how long it takes to get a job back (turnaround time). But knowing when the CPU utilization is almost 100% is useful for knowing when it is time to get more computing power.
- For interactive systems, the most important one is to minimize response time, that is, the time between issuing a command and getting the result. On a personal computer where a background process is running, a user request to start a program or open a file should take precedence over the background work. Having all interactive requests go first will be perceived as good service.
- Real-time systems: They are characterized by having deadlines that must or at least should be met. For example, if a computer is controlling a device that produces data at a regular rate, failure to run the data-collection process on time may result in lost data. Thus the foremost need in a real-time system is meeting all (or most) deadlines.
2.4.2 Scheduling in Batch Systems
First-Come, First-Served
- With First-Come, First-Served algorithm, processes are assigned the CPU in the order they request it.
- Basically, there is a single queue of ready processes. When the first job enters the system from the outside in the morning, it is started immediately and allowed to run as long as it wants to. It is not interrupted because it has run too long.
- As other jobs come in, they are put onto the end of the queue.
- When the running process blocks, the first process on the queue is run next.
- When a blocked process becomes ready, like a newly arrived job, it is put on the end of the queue, behind all waiting processes.
- Disadvantage: Suppose there is one compute-bound process that runs for 1 sec at a time and many I/O-bound processes that use little CPU time but each have to perform 1000 disk reads to complete. The compute-bound process runs for 1 sec, then it reads a disk block. All the I/O processes now run and start disk reads. When the compute-bound process gets its disk block, it runs for another 1 sec, followed by all the I/O-bound processes in quick succession. The result is that each I/O-bound process gets to read 1 block per second and will take 1000 sec to finish. With a scheduling algorithm that preempted the compute-bound process every 10 msec, the I/O-bound processes would finish in 10 sec instead of 1000 sec, and without slowing down the compute-bound process very much.
Shortest Job First
- Look at Fig. 2-41. Here we find four jobs A, B, C, and D with run times of 8, 4, 4, and 4 minutes, respectively. By running them in that order, the turnaround time for A is 8 minutes, for B is 12 minutes, for C is 16 minutes, and for D is 20 minutes for an average of 14 minutes.
- Now let us consider running these four jobs using shortest job first, as shown in Fig. 2-41(b). The turnaround times are now 4, 8, 12, and 20 minutes for an average of 11 minutes. Shortest job first is provably optimal.
- Consider the case of four jobs, with execution times of a, b, c, and d, respectively. The first job finishes at time a, the second at time a + b, and so on. The mean turnaround time is (4a + 3b + 2c + d) / 4. It is clear that a contributes more to the average than the other times, so it should be the shortest job, with b next, then c, and finally d as the longest since it affects only its own turnaround time. The same argument applies equally well to any number of jobs.
- It is worth pointing out that shortest job first is optimal only when all the jobs are available simultaneously. Consider five jobs, A through E, with run times of 2, 4, 1, 1, and 1, respectively. Their arrival times are 0, 0, 3, 3, and 3. Initially, only A or B can be chosen, since the other three jobs have not arrived yet. Using shortest job first, we will run the jobs in the order A, B, C, D, E, for an average wait of 4.6. However, running them in the order B, C, D, E, A has an average wait of 4.4.
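- A small program to check the arithmetic of the first example (run times 8, 4, 4, 4 minutes, all jobs available at time 0):
#include <stdio.h>

static double mean_turnaround(const int runtimes[], int n)
{
    int finish = 0, total = 0;
    for (int i = 0; i < n; i++) {
        finish += runtimes[i]; /* job i finishes after all earlier jobs and itself */
        total += finish; /* its turnaround time, since every job arrives at t = 0 */
    }
    return (double)total / n;
}

int main(void)
{
    int original[] = {8, 4, 4, 4}; /* order A, B, C, D */
    int sjf[] = {4, 4, 4, 8}; /* shortest job first */
    printf("original order: %.1f minutes\n", mean_turnaround(original, 4)); /* 14.0 */
    printf("shortest job first: %.1f minutes\n", mean_turnaround(sjf, 4)); /* 11.0 */
    return 0;
}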
Shortest Remaining Time Next
- A preemptive version of shortest job first is shortest remaining time next. With this algorithm, the scheduler always chooses the process whose remaining run time is the shortest. Again here, the run time has to be known in advance. When a new job arrives, its total time is compared to the current process’ remaining time. If the new job needs less time to finish than the current process, the current process is suspended and the new job started. This scheme allows new short jobs to get good service.
2.4.3 Scheduling in Interactive Systems
Round-Robin Scheduling
- Each process is assigned a time interval, called its quantum, during which it is allowed to run. If the process is still running at the end of the quantum, the CPU is preempted and given to another process. If the process has blocked or finished before the quantum has elapsed, the CPU switching is done when the process blocks.
- All the scheduler needs to do is maintain a list of runnable processes, as shown in (a). When the process uses up its quantum, it is put on the end of the list, as shown in (b).
- Switching from one process to another requires a certain amount of time for doing all the administration (saving and loading registers, etc.). Suppose that this process switch or context switch takes 1 msec. Also suppose that the quantum is set at 4 msec. With these parameters, after doing 4 msec of useful work, the CPU will have to spend (i.e., waste) 1 msec on process switching. Thus 20% of the CPU time will be thrown away on administrative overhead. Clearly, this is too much.
- To improve the CPU efficiency, we could set the quantum to, say, 100 msec. Now the wasted time is only 1%. But consider what happens on a server system if 50 requests come in within a very short time interval and with widely varying CPU requirements. Fifty processes will be put on the list of runnable processes. If the CPU is idle, the first one will start immediately, the second one may not start until 100 msec later, and so on. The unlucky last one may have to wait 5 sec before getting a chance, assuming all the others use their full quanta. This situation is especially bad if some of the requests near the end of the queue required only a few milliseconds of CPU time. With a short quantum they would have gotten better service.
- Another factor is that if the quantum is set longer than the mean CPU burst, preemption will not happen very often. Instead, most processes will perform a blocking operation before the quantum runs out, causing a process switch. Eliminating preemption improves performance because process switches then happen only when they are logically necessary, that is, when a process blocks and cannot continue.
- The conclusion can be formulated as follows: setting the quantum too short causes too many process switches and lowers the CPU efficiency, but setting it too long may cause poor response to short interactive requests. A quantum around 20 – 50 msec is often a reasonable compromise.
Priority Scheduling
- The basic idea is that each process is assigned a priority, and the runnable process with the highest priority is allowed to run.
- To prevent high-priority processes from running indefinitely, the scheduler may decrease the priority of the currently running process at each clock interrupt. If this action causes its priority to drop below that of the next highest process, a process switch occurs. Alternatively, each process may be assigned a maximum time quantum that it is allowed to run. When this quantum is used up, the next-highest-priority process is given a chance to run.
- It is often convenient to group processes into priority classes and use priority scheduling among the classes but round-robin scheduling within each class.
- The scheduling algorithm is as follows: as long as there are runnable processes in priority class 4, just run each one for one quantum, round-robin fashion, and never bother with lower-priority classes. If priority class 4 is empty, then run the class 3 processes round robin. If classes 4 and 3 are both empty, then run class 2 round robin, and so on.
Multiple Queues
- Giving all processes a large quantum would mean poor response time. The solution was to set up priority classes. Processes in the highest class were run for one quantum. Processes in the next-highest class were run for two quanta. Processes in the next one were run for four quanta, etc. Whenever a process used up all the quanta allocated to it, it was moved down one class.
- Consider a process that needed to compute continuously for 100 quanta. It would initially be given one quantum, then swapped out. Next time it would get two quanta before being swapped out. On succeeding runs it would get 4, 8, 16, 32, and 64 quanta, although it would have used only 37 of the final 64 quanta to complete its work. Only 7 swaps would be needed (including the initial load) instead of 100 with a pure round-robin algorithm. Furthermore, as the process sank deeper and deeper into the priority queues, it would be run less and less frequently, saving the CPU for short, interactive processes.
Shortest Process Next
- Interactive processes generally follow the pattern of wait for command, execute command, wait for command, execute command, etc. If we regard the execution of each command as a separate ‘‘job,’’ then we can minimize overall response time by running the shortest one first. The problem is figuring out which of the currently runnable processes is the shortest one.
- One approach is to make estimates based on past behavior and run the process with the shortest estimated running time. Suppose that the estimated time per command for some process is T0. Now suppose its next run is measured to be T1. We could update our estimate by taking a weighted sum of these two numbers, that is, a*T0 + (1 − a)*T1. Through the choice of a we can decide to have the estimation process forget old runs quickly, or remember them for a long time. With a = 1/2, we get successive estimates of
T0, T0/2 + T1/2, T0/4 + T1/4 + T2/2, T0/8 + T1/8 + T2/4 + T3/2
After three new runs, the weight of T0 in the new estimate has dropped to 1/8.
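- A tiny sketch of this estimation scheme with a = 1/2; the initial estimate and the measured bursts are made-up numbers:
#include <stdio.h>

int main(void)
{
    double a = 0.5; /* weight of the old estimate */
    double estimate = 10.0; /* initial guess T0 (made up) */
    double measured[] = {4.0, 6.0, 2.0, 8.0}; /* measured bursts T1..T4 (made up) */

    for (int i = 0; i < 4; i++) {
        estimate = a * estimate + (1 - a) * measured[i]; /* a*old + (1-a)*new */
        printf("after run %d: estimate = %.3f\n", i + 1, estimate);
    }
    /* After n new runs, the weight of the original T0 has dropped to 1/2^n. */
    return 0;
}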
Guaranteed Scheduling
- A different approach to scheduling is to make real promises to the users about performance and then live up to those promises. One promise that is realistic to make and easy to live up to is this: If n users are logged in while you are working, you will receive about 1/n of the CPU power. Similarly, on a single-user system with n processes running, all things being equal, each one should get 1/n of the CPU cycles.
- The system must keep track of how much CPU each process has had since its creation. It then computes the amount of CPU each one is entitled to, namely the time since creation divided by n. It is straightforward to compute the ratio of actual CPU time consumed to CPU time entitled. A ratio of 0.5 means that a process has only had half of what it should have had, and a ratio of 2.0 means that a process has had twice as much as it was entitled to.
- The algorithm is then to run the process with the lowest ratio until its ratio has moved above that of its closest competitor. Then that one is chosen to run next.
Lottery Scheduling
- The basic idea is to give processes lottery tickets for various system resources, such as CPU time. Whenever a scheduling decision has to be made, a lottery ticket is chosen at random, and the process holding that ticket gets the resource. When applied to CPU scheduling, the system might hold a lottery 50 times a second, with each winner getting 20 msec of CPU time as a prize.
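- A scheduling decision then amounts to drawing one ticket at random and finding its holder, as in this sketch (the process names and ticket counts are made up):
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct proc { const char *name; int tickets; };

int main(void)
{
    struct proc procs[] = { {"A", 10}, {"B", 20}, {"C", 70} }; /* 100 tickets in all */
    int n = 3, total = 100;

    srand((unsigned)time(NULL));
    int winner_ticket = rand() % total; /* draw one ticket at random */

    int cumulative = 0;
    for (int i = 0; i < n; i++) {
        cumulative += procs[i].tickets;
        if (winner_ticket < cumulative) { /* the drawn ticket belongs to this process */
            printf("%s wins the next quantum\n", procs[i].name);
            break;
        }
    }
    return 0;
}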
Fair-Share Scheduling
- Some systems take into account which user owns a process before scheduling it. Each user is allocated some fraction of the CPU and the scheduler picks processes in such a way as to enforce it. Thus if two users have each been promised 50% of the CPU, they will each get that, no matter how many processes they have in existence.
- As an example, consider a system with two users, each of which has been promised 50% of the CPU. User 1 has four processes, A, B, C, and D, and user 2 has only one process, E. If round-robin scheduling is used, a possible scheduling sequence that meets all the constraints is this one:
A E B E C E D E A E B E C E D E …
If user 1 is entitled to twice as much CPU time as user 2, we might get
A B E C D E A B E C D E …
2.4.4 Scheduling in Real-Time Systems
- A real-time system is one in which time plays an essential role. One or more physical devices external to the computer generate messages, and the computer must react appropriately to them within a fixed amount of time.
- Real-time systems are generally categorized as hard real time, meaning there are absolute deadlines that must be met; soft real time, meaning that missing an occasional deadline is undesirable, but tolerable.
- In both cases, real-time behavior is achieved by dividing the program into a number of processes, each of whose behavior is predictable and known in advance. These processes are generally short lived and can run to completion in well under a second. When an external event is detected, it is the job of the scheduler to schedule the processes in such a way that all deadlines are met.
- The events that a real-time system may have to respond to can be categorized as periodic (meaning they occur at regular intervals) or aperiodic (meaning they occur unpredictably).
- A system may have to respond to multiple periodic-event streams. Depending on how much time each event requires for processing, handling all of them may not even be possible. For example, if there are m periodic events and event i occurs with period Pi and requires Ci sec of CPU time to handle each event, then the load can be handled only if
C1/P1 + C2/P2 + … + Cm/Pm <= 1
- A real-time system that meets this criterion is said to be schedulable. This means it can actually be implemented. A set of processes that fails this test cannot be scheduled, because the total amount of CPU time the processes want collectively is more than the CPU can deliver.
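- For example, three periodic event streams with periods of 100, 200, and 500 msec that need 50, 30, and 100 msec of CPU time per event give a load of 0.50 + 0.15 + 0.20 = 0.85, so they are schedulable. A quick check of this arithmetic in C:
#include <stdio.h>

int main(void)
{
    double C[] = {50, 30, 100}; /* CPU time needed per event, in msec */
    double P[] = {100, 200, 500}; /* period of each event stream, in msec */
    double load = 0;
    for (int i = 0; i < 3; i++)
        load += C[i] / P[i]; /* fraction of the CPU this stream consumes */
    printf("total load = %.2f -> %s\n", load,
           load <= 1.0 ? "schedulable" : "not schedulable"); /* 0.85 -> schedulable */
    return 0;
}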
- Real-time scheduling algorithms can be static or dynamic. The former make their scheduling decisions before the system starts running. The latter make their scheduling decisions at run time, after execution has started. Static scheduling works only when there is perfect information available in advance about the work to be done and the deadlines that have to be met. Dynamic scheduling algorithms do not have these restrictions.
2.4.5 Policy Versus Mechanism
- None of the schedulers discussed above accept any input from user processes about scheduling decisions. So the scheduler rarely makes the best choice. The solution to this problem is to separate the scheduling mechanism from the scheduling policy.
- What this means is that the scheduling algorithm is parameterized in some way, but the parameters can be filled in by user processes.
- Suppose that the kernel uses a priority-scheduling algorithm but provides a system call by which a process can set (and change) the priorities of its children. In this way, the parent can control how its children are scheduled, even though it itself does not do the scheduling. Here the mechanism is in the kernel but policy is set by a user process. Policy-mechanism separation is a key idea.
2.4.6 Thread Scheduling
- When several processes each have multiple threads, we have two levels of parallelism present: processes and threads. Scheduling in such systems differs substantially depending on whether user-level threads or kernel-level threads (or both) are supported.
User-level thread
- Since the kernel is not aware of the existence of threads, it operates as it always does, picking a process A and giving A control for its quantum. The thread scheduler inside A decides which thread to run, say A1. Since there are no clock interrupts to multiprogram threads, this thread may continue running as long as it wants to. If it uses up the process’ entire quantum, the kernel will select another process to run.
- When the process A finally runs again, thread A1 will resume running. It will continue to consume all of A’s time until it is finished. However, its behavior will not affect other processes. They will get whatever the scheduler considers their appropriate share, no matter what is going on inside process A.
- Now consider the case that A’s threads have relatively little work to do per CPU burst, for example, 5 msec of work within a 50-msec quantum. Consequently, each one runs for a little while, then yields the CPU back to the thread scheduler. This might lead to the sequence A1, A2, A3, A1, A2, A3, before the kernel switches to process B. This situation is illustrated in Fig. 2-44(a).
- The scheduling algorithm used by the run-time system can be any of the ones described above. In practice, round-robin scheduling and priority scheduling are most common. The only constraint is the absence of a clock to interrupt a thread that has run too long. Since threads cooperate, this is usually not an issue.
Kernel-level thread
- Here the kernel picks a particular thread to run. It does not have to take into account which process the thread belongs to, but it can if it wants to. The thread is given a quantum and is forcibly suspended if it exceeds the quantum. With a 50-msec quantum but threads that block after 5 msec, the thread order for some period of 30 msec might be A1, B1, A2, B2, A3, B3.
- A major difference between user-level threads and kernel-level threads is the performance. Doing a thread switch with user-level threads takes a handful of machine instructions. With kernel-level threads it requires a full context switch, changing the memory map and invalidating the cache, which is several orders of magnitude slower. On the other hand, with kernel-level threads, having a thread block on I/O does not suspend the entire process as it does with user-level threads.
- Since the kernel knows that switching from a thread in process A to a thread in process B is more expensive than running a second thread in process A (due to having to change the memory map and having the memory cache spoiled), it can take this information into account when making a decision. For example, given two threads that are otherwise equally important, with one of them belonging to the same process as a thread that just blocked and one belonging to a different process, preference could be given to the former.
- Another important factor is that user-level threads can employ an application-specific thread scheduler.
- Consider the Web server of Fig. 2-8. Suppose that a worker thread has just blocked and the dispatcher thread and two worker threads are ready. Who should run next? The run-time system, knowing what all the threads do, can easily pick the dispatcher to run next, so that it can start another worker running. This strategy maximizes the amount of parallelism in an environment where workers frequently block on disk I/O. With kernel-level threads, the kernel would never know what each thread did (although they could be assigned different priorities). In general, however, application-specific thread schedulers can tune an application better than the kernel can.
2.5 CLASSICAL IPC PROBLEMS
2.5.1 The Dining Philosophers Problem
- Five philosophers are seated around a circular table. Each philosopher has a plate of spaghetti and a philosopher needs two forks to eat it. Between each pair of plates is one fork. The life of a philosopher consists of alternating periods of eating and thinking. When a philosopher gets hungry, she tries to acquire her left and right forks, one at a time, in either order. If successful in acquiring two forks, she eats for a while, then puts down the forks, and continues to think. Can you write a program for each philosopher that does what it is supposed to do and never gets stuck?
- Figure 2-46 shows the obvious solution which is wrong. Suppose that all five philosophers take their left forks simultaneously. None will be able to take their right forks, and there will be a deadlock.
- We could modify the program so that after taking the left fork, the program checks to see if the right fork is available. If it is not, the philosopher puts down the left one, waits for some time, and then repeats the whole process.
- This proposal also fails: All the philosophers could start the algorithm simultaneously, picking up their left forks, seeing that their right forks were not available, putting down their left forks, waiting, picking up their left forks again simultaneously, and so on, forever.
- A situation like this, in which all the programs continue to run indefinitely but fail to make any progress, is called starvation.
- If the philosophers would wait a random time instead of the same time after failing to acquire the right-hand fork, the chance that everything would continue in lockstep for even an hour is very small. This observation is true, and in nearly all applications trying again later is not a problem. For example, in the Ethernet local area network, if two computers send a packet at the same time, each one waits a random time and tries again; in practice this solution works fine. However, in a few applications one would prefer a solution that always works and cannot fail due to an unlikely series of random numbers.
- One solution that has no deadlock and no starvation is to protect the five statements following the call to think by a binary semaphore. Before starting to acquire forks, a philosopher would do a down on mutex. After replacing the forks, she would do an up on mutex. But it has a performance bug: only one philosopher can be eating at any instant. With five forks available, we should be able to allow two philosophers to eat at the same time.
- The solution presented in Fig. 2-47 is deadlock-free and allows the maximum parallelism for an arbitrary number of philosophers. It uses an array, state, to keep track of whether a philosopher is eating, thinking, or hungry (trying to acquire forks). A philosopher may move into eating state only if neither neighbor is eating. Philosopher i’s neighbors are defined by the macros LEFT and RIGHT.
- The program uses an array of semaphores, one per philosopher, so hungry philosophers can block if the needed forks are busy. Note that each process runs the procedure philosopher as its main code, but the other procedures, take_forks, put_forks, and test, are ordinary procedures and not separate processes.
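- A hedged C rendering of this solution with POSIX semaphores and threads follows; the think and eat bodies are placeholders, and the philosophers run forever as in the figure:
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

#define N 5 /* number of philosophers */
#define LEFT ((i + N - 1) % N) /* index of i's left neighbor */
#define RIGHT ((i + 1) % N) /* index of i's right neighbor */
#define THINKING 0
#define HUNGRY 1
#define EATING 2

static int state[N]; /* what each philosopher is doing */
static sem_t mutex; /* mutual exclusion for the state array */
static sem_t s[N]; /* one semaphore per philosopher */

static void think(void) { usleep(1000); } /* placeholder */
static void eat(void) { usleep(1000); } /* placeholder */

static void test(int i) /* may philosopher i start eating? */
{
    if (state[i] == HUNGRY && state[LEFT] != EATING && state[RIGHT] != EATING) {
        state[i] = EATING;
        sem_post(&s[i]); /* unblock philosopher i if she was waiting */
    }
}

static void take_forks(int i)
{
    sem_wait(&mutex); /* enter critical region */
    state[i] = HUNGRY;
    test(i); /* try to acquire both forks */
    sem_post(&mutex); /* leave critical region */
    sem_wait(&s[i]); /* block if the forks were not acquired */
}

static void put_forks(int i)
{
    sem_wait(&mutex);
    state[i] = THINKING;
    test(LEFT); /* see if the left neighbor can now eat */
    test(RIGHT); /* see if the right neighbor can now eat */
    sem_post(&mutex);
}

static void *philosopher(void *arg)
{
    int i = *(int *)arg;
    for (;;) { /* runs forever, as in the figure */
        think();
        take_forks(i);
        eat();
        put_forks(i);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[N];
    int id[N];
    sem_init(&mutex, 0, 1);
    for (int i = 0; i < N; i++)
        sem_init(&s[i], 0, 0);
    for (int i = 0; i < N; i++) {
        id[i] = i;
        pthread_create(&t[i], NULL, philosopher, &id[i]);
    }
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    return 0;
}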
2.5.2 The Readers and Writers Problem
- The dining philosophers problem is useful for modeling processes that are competing for exclusive access to a limited number of resources, such as I/O devices. Another problem is the readers and writers problem, which models access to a database.
- Imagine an airline reservation system, with many competing processes wishing to read and write it. It is acceptable to have multiple processes reading the database at the same time, but if one process is updating (writing) the database, no other processes may have access to the database, not even readers. How do you program the readers and the writers?
- In this solution, the first reader to get access to the database does a down on the semaphore db. Subsequent readers merely increment a counter, rc. As readers leave, they decrement the counter, and the last to leave does an up on the semaphore, allowing a blocked writer, if there is one, to get in.
- The solution presented here implicitly contains a subtle decision worth noting.
- Suppose that while a reader is using the database, another reader comes along. Since having two readers at the same time is not a problem, the second reader is admitted. Additional readers can also be admitted if they come along. Now suppose a writer shows up. The writer may not be admitted to the database, since writers must have exclusive access, so the writer is suspended. Later, additional readers show up. As long as at least one reader is still active, subsequent readers are admitted. As a consequence of this strategy, as long as there is a steady supply of readers, they will all get in as soon as they arrive. The writer will be kept suspended until no reader is present. If a new reader arrives, say, every 2 sec, and each reader takes 5 sec to do its work, the writer will never get in.
- To avoid this situation: when a reader arrives and a writer is waiting, the reader is suspended behind the writer instead of being admitted immediately. In this way, a writer has to wait for readers that were active when it arrived to finish but does not have to wait for readers that came along after it. The disadvantage of this solution is that it achieves less concurrency and thus lower performance.
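- A hedged C sketch of the readers-priority solution described above (the counter rc protected by mutex, and db guarding the database); read_database and write_database are placeholders:
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t mutex; /* protects the reader count rc; initially 1 */
static sem_t db; /* controls access to the database; initially 1 */
static int rc = 0; /* number of readers currently using the database */

static void read_database(void) { usleep(1000); } /* placeholder */
static void write_database(void) { usleep(1000); } /* placeholder */

static void *reader(void *arg)
{
    for (;;) {
        sem_wait(&mutex);
        rc = rc + 1;
        if (rc == 1) sem_wait(&db); /* the first reader locks out writers */
        sem_post(&mutex);

        read_database(); /* many readers may be here at the same time */

        sem_wait(&mutex);
        rc = rc - 1;
        if (rc == 0) sem_post(&db); /* the last reader to leave lets a writer in */
        sem_post(&mutex);
    }
    return NULL;
}

static void *writer(void *arg)
{
    for (;;) {
        sem_wait(&db); /* exclusive access to the database */
        write_database();
        sem_post(&db);
    }
    return NULL;
}

int main(void)
{
    pthread_t r, w;
    sem_init(&mutex, 0, 1);
    sem_init(&db, 0, 1);
    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(r, NULL);
    pthread_join(w, NULL);
    return 0;
}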
Please indicate the source: http://blog.youkuaiyun.com/gaoxiangnumber1
Welcome to my github: https://github.com/gaoxiangnumber1