The Fibers of Threads

This article takes an in-depth look at how threads are implemented under Linux, dissecting the details of the one-to-one model, including the role of the clone system call and the workings of the thread manager.

For the last several months in this column, we've been looking at programming with Linux's threads library, pthreads. However, we have taken for granted the work that is actually done under the covers by the pthreads libraries. So this month's Compile Time will dissect Linux pthreads themselves to discover exactly what it is that makes them tick.

Before we dive in and start looking at this topic though, you'll probably find it a lot easier to follow if you are already familiar with a couple of other concepts. First, we assume that you have an understanding of the difference between code that runs in "privileged mode" (i.e., kernel space) and code that runs in "user mode" (i.e., user space). It's also important that you understand how system calls work and how they switch between user mode and privileged mode. If you need to brush up on these topics, a good introduction can be found in the May 1999 Compile Time, located on the Web at: http://www.linux-mag.com/1999-05/compile_01.html.

There are three basic ways that threads can be implemented on a system: many-to-one, one-to-one, and many-to-many. Each method requires a different amount of support from the operating system and has its own strengths and weaknesses. The names of the methods refer to the number of threads the application 'sees' in user space compared to the number of threads (i.e., processes) the operating system sees in kernel space.

Many-to-One

In a many-to-one implementation of threads, the threads are completely written in user space. From the kernel's point of view, there is only one process executing. Although that process is multi-threaded, to the kernel it looks the same as any other non-threaded process. The advantage of this method is that it requires no additional support from the kernel and therefore can be implemented on any system. All the thread management issues (including scheduling) are handled in user space by the application.

Interestingly, the many-to-one implementation's greatest strength just happens to be its greatest weakness as well. Because the operating system views the multi-threaded application as a single process, that application can never take advantage of additional processors if they exist. The multi-threaded application is guaranteed to run only on a single processor even if additional resources are available.

In fact, in many cases, the multi-threaded application will not even act like a multi-threaded program at all. For example, if one of the threads is waiting for a system call to return, the entire application must also wait! When a system call executes, the calling process is required to block until it completes. Therefore, in a many-to-one implementation, all threads will stop executing when any one of them needs to switch into kernel space. This is intolerable behavior in a threaded application (e.g., imagine if all the windows in your GUI froze up when the application was waiting for user input!) and, therefore, this implementation is not really a solution to the problem of implementing threads.

One-to-One

The one-to-one implementation of threads solves the two problems of the many-to-one implementation listed above. In this implementation, every thread in a multi-threaded application is its own process in the kernel. This implementation is often simpler than the many-to-one implementation because it allows the kernel to perform the scheduling of the threads just as it does for other processes. Also, if the machine on which the multi-threaded application is running has multiple processors, the operating system can schedule each thread to run concurrently on different processors.

Of course, since each thread is its own process, the operating system does not block the other threads when one of them switches into kernel space. Unlike the many-to-one method, however, this implementation requires some support from the operating system. Recall that threads in a multi-threaded application share the same memory while executing. Therefore, the kernel must provide a means for creating new processes that share memory.

Finally, because each thread is its own process, many of the synchronization techniques (e.g., semaphores and condition variables) carry a large overhead cost due to context switching between user space and kernel space. For example, when waking up a set of threads that are waiting on the condition, a switch between kernel space and user space and back again must occur to wake up each thread. The associated overhead of this operation is two times the number of threads waiting times the cost of a context switch. Not cheap.

Many-To-Many

The many-to-many implementation of threads is nice because it takes the best ideas from both the one-to-one and many-to-one methods while avoiding many of their disadvantages. In the many-to-many model, many kernel processes execute, and each process represents a scheduler that picks a thread to run. Basically, each process acts as its own many-to-one implementation of threads by scheduling different threads to run within that kernel process.

Having multiple processes coordinate to do this solves the problem of all threads being blocked when one thread makes a system call and switches into kernel space. In the many-to-many model, the other kernel processes can still schedule threads to run. Also, the problems with context-switch overhead are mitigated because each process performs the switches between execution of threads. As a result, this switching happens entirely in user space.

Of course, all this functionality comes with a price -- this method is the most complicated to implement of the three and requires kernel support for the relationship between kernel threads and user threads. It is the most common implementation on commercial Unix systems (e.g., Solaris, Digital Unix, and IRIX).

The Linux Way

The Linux implementation of pthreads follows the one-to-one method. The kernel does not support the functionality required for a many-to-many implementation, and since context-switches in Linux are relatively fast, that disadvantage of the one-to-one method is much less of a problem. Linux also provides the mechanism necessary to create a new process that shares a memory space. You are probably familiar with the fork() system call that creates a new process based upon the calling process. Linux provides another system call, __clone(), that is more general than fork().

Where fork() only gives the child process a copy of the state of the parent process, clone() can be used to create a new process that shares or copies resources with an existing process. You can share or copy the memory map, file systems, file descriptors, and signal handlers of the existing process. The fork() system call is essentially a special case of clone() where none of the resources are shared. Let's take a closer look at the clone() system call. Its prototype, as found in <sched.h>, is listed in Figure Two.

Figure Two: The Prototype of __clone()

int __clone (int (*fn) (void *arg), void *child_stack, int flags, void *arg)

Notice that, unlike fork(), clone() takes in a function, fn, to be executed. This is the first argument to the call. When clone() successfully completes, fn will be executing simultaneously with the calling function.

The child_stack argument specifies where the child process should start its stack. Since it is possible for the child and the parent to share a memory map, the parent is responsible for allocating space for the child process's stack. The parent and child can execute different functions after the call to clone(), so they cannot share a single stack.

The flags argument indicates exactly which parts of the parent should be shared with the child. Table One shows a list of the flags that may be bitwise-or'ed together and passed in the call to clone(). Besides the sharing flags, the lowest byte of the flags argument represents the number of the signal to be sent to the parent when the child process dies.

Table One: Flags for clone()

Flag            Description

CLONE_VM        Share the memory between the parent and child process. If this flag is not set, the memory will be copied, as is the case with fork().
CLONE_FS        Share the file system information. A call to a function like chdir() will affect both the child and parent process, regardless of the caller.
CLONE_FILES     Share file descriptors. Any file descriptors created after the clone() will be valid in both the child and parent processes.
CLONE_SIGHAND   Share signal handlers. All signal handlers set up by either parent or child after the clone() are common to both.
CLONE_PID       Share the process ID. The child and parent process will have the same pid. Should not be used with Linux kernel versions after 2.1.97.

The last argument to clone(), arg, represents the argument that is to be passed to the function, fn, that clone() will execute. The __clone() function returns the process ID of the child process when it succeeds. It will return -1 when it fails. See the man page for more information about the failure cases.

If you've been following this column's series of articles on threads over the past few months, you may be thinking that the functionality of clone() seems awfully familiar... Well, that's because it is pretty much how the pthread_create() function behaves, and the similarity is no coincidence. As stated in the man page for clone(), "the main use of __clone() is to implement threads." The Linux kernel developers felt that the one-to-one method of implementing threads was the way to go, and therefore designed the kernel to enable threads support in that way. The implementors of the pthreads library simply followed their lead.

The Structure of Threads

Now that we understand the clone() system call, let's return to our discussion about the implementation of threads under Linux. In the following, we will be referring to the implementation of pthreads found in glibc version 2.1.3. All of the files mentioned are in the linuxthreads directory unless otherwise specified.

The basic structure of the implementation of threads is as follows: when the first thread is created (i.e., in the first call to pthread_create()), a manager thread is created to manage the subsequent creation and termination of threads. The manager will perform the actual creation of new threads. Once the manager is created and initialized, the thread that wishes to create a new thread sends a message to the manager via a pipe and then suspends itself until the manager satisfies the request and wakes it up.

The thread manager spaces the threads' stacks 2 MB apart. At the top of each thread's stack resides a struct containing the relevant information about that thread (e.g., its process ID) for use by the manager and other threads. The definition of that struct is found in internals.h, and the first few fields can be seen in Listing One (there are more than 40 fields in this struct).

Listing One: The Thread Descriptor struct (from internals.h)

struct _pthread_descr_struct {
  pthread_descr p_nextlive, p_prevlive;
                                /* Double chaining of active threads */
  pthread_descr p_nextwaiting;  /* Next element in the queue holding the threads */
  pthread_descr p_nextlock;     /* can be on a queue and waiting on a lock */
  pthread_t p_tid;              /* Thread identifier */
  int p_pid;                    /* PID of Unix process */
  int p_priority;               /* Thread priority (== 0 if not realtime) */
  struct _pthread_fastlock * p_lock;  /* Spinlock for synchronized accesses */
  ...
  /* New elements must be added at the end. */
} __attribute__ ((aligned(32)));  /* We need to align the structure so that
                                     doubles are aligned properly. This is 8
                                     bytes on MIPS and 16 bytes on MIPS64.
                                     32 bytes might give better cache
                                     utilization. */
Figure One: A stack after a few threads have been created.

To get a better idea of how threads are laid out in memory, let's look at a picture of the stack after a few threads are created (see Figure One). Since stacks grow down on most architectures, we draw our picture assuming that this is the case in our example. Notice that the initial thread lies at the top of the stack. This is the thread that represents the main() function in an application. The thread descriptor for this thread is kept in the space for global variables and is defined in pthread.c. Below the initial thread on the stack is the thread manager's stack (its descriptor is also in global space and defined in pthread.c), followed by the struct for each thread and its stack. The manager keeps track of each thread by keeping track of the tops of the threads' stacks and, therefore, a reference to their respective structs.

Creating New Threads

Now that we have a clear picture of the memory layout, we can look at the code that does the work of creating new threads (see Listing Two). When a program calls pthread_create(), the __pthread_create_2_1() function is called in the glibc library -- in the file pthread.c. First, the function gets a reference to itself (thread_self() returns a pointer to the calling thread's struct _pthread_descr_struct).

Listing Two: The pthread_create() Function in pthread.c

/* Thread creation */

int __pthread_create_2_1(pthread_t *thread, const pthread_attr_t *attr,
                         void * (*start_routine)(void *), void *arg)
{
  pthread_descr self = thread_self();
  struct pthread_request request;
  if (__pthread_manager_request < 0) {
    if (__pthread_initialize_manager() < 0) return EAGAIN;
  }
  request.req_thread = self;
  request.req_kind = REQ_CREATE;
  request.req_args.create.attr = attr;
  request.req_args.create.fn = start_routine;
  request.req_args.create.arg = arg;
  sigprocmask(SIG_SETMASK, (const sigset_t *) NULL,
              &request.req_args.create.mask);
  __libc_write(__pthread_manager_request, (char *) &request, sizeof(request));
  suspend(self);
  if (THREAD_GETMEM(self, p_retcode) == 0)
    *thread = (pthread_t) THREAD_GETMEM(self, p_retval);
  return THREAD_GETMEM(self, p_retcode);
}

Next, the function creates a request to be sent to the thread manager. Notice that the request is of type REQ_CREATE. The request also contains the function to be run (start_routine), its argument (arg), and the attributes (attr) the new thread should have. After recording the current signal mask (the call to sigprocmask() with a NULL new set only retrieves the mask, so the new thread can inherit it), it writes the create request to the write end of the pipe to communicate that information to the manager. Then the thread suspends itself, waiting for the manager to wake it up once the request has been completed. If the manager successfully created the new thread, the thread argument passed into pthread_create() is set, and the return code is returned.

The code for the thread manager is found in the file manager.c. The main function of this file is __pthread_manager(), which sits in a while() loop, polling the read end of the communication pipe. Whenever it sees that there is information to be read, it reads the request, calls the appropriate handler, and continues to poll. The snippet of code that handles the REQ_CREATE request is shown in Listing Three. Note that the function that does the creation is pthread_handle_create(), which is also in manager.c and is shown in Listing Four.

Listing Three: Code That Handles the REQ_CREATE Request

/* The server thread managing requests for thread creation and termination */

int __pthread_manager(void *arg)
{
  ... /* Initialization and signal handling omitted for brevity. */
  /* Enter server loop */
  while (1) {
    n = __poll(&ufd, 1, 2000);
    ... /* Other checks omitted for brevity. */

    /* Read and execute request */
    if (n == 1 && (ufd.revents & POLLIN)) {
      n = __libc_read(reqfd, (char *) &request, sizeof(request));
      ASSERT(n == sizeof(request));
      switch (request.req_kind) {
      case REQ_CREATE:
        request.req_thread->p_retcode =
          pthread_handle_create((pthread_t *) &request.req_thread->p_retval,
                                request.req_args.create.attr,
                                request.req_args.create.fn,
                                request.req_args.create.arg,
                                &request.req_args.create.mask,
                                request.req_thread->p_pid,
                                request.req_thread->p_report_events,
                                &request.req_thread->p_eventbuf.eventmask);
        restart(request.req_thread);
        break;
      /* Other cases */
      ...
      }
    }
  }
}

Listing Four: The pthread_handle_create() Function

static int pthread_handle_create(pthread_t *thread, const pthread_attr_t *attr,
                                 void * (*start_routine)(void *), void *arg,
                                 sigset_t *mask, int father_pid,
                                 int report_events,
                                 td_thr_events_t *event_maskp)
{
  /* Initialization of local variables. */
  ...

  /* Find a free segment for the thread, and allocate a stack if needed */
  for (sseg = 2; ; sseg++)
    {
      if (sseg >= PTHREAD_THREADS_MAX)
        return EAGAIN;
      if (__pthread_handles[sseg].h_descr != NULL)
        continue;
      if (pthread_allocate_stack(attr, thread_segment(sseg), pagesize,
                                 &new_thread, &new_thread_bottom,
                                 &guardaddr, &guardsize) == 0)
        break;
    }
  __pthread_handles_num++;
  /* Allocate new thread identifier */
  ...
  /* Initialize the thread descriptor. Elements which have to be
     initialized to zero already have this value. */
  ...
  /* Initialize the thread handle */
  ...
  /* Determine scheduling parameters for the thread */
  ...
  /* Finish setting up arguments to pthread_start_thread */
  new_thread->p_start_args.start_routine = start_routine;
  new_thread->p_start_args.arg = arg;
  new_thread->p_start_args.mask = *mask;
  /* Raise priority of thread manager if needed */
  ...
  /* Do the cloning. We have to use two different functions depending
     on whether we are debugging or not. */
  pid = 0;   /* Note that the thread never can have PID zero. */
  ...
  if (pid == 0)
    pid = __clone(pthread_start_thread, (void **) new_thread,
                  CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
                  __pthread_sig_cancel, new_thread);
  /* Check if cloning succeeded */
  if (pid == -1) {
    ...
  }
  /* Insert new thread in doubly linked list of active threads */
  new_thread->p_prevlive = __pthread_main_thread;
  new_thread->p_nextlive = __pthread_main_thread->p_nextlive;
  __pthread_main_thread->p_nextlive->p_prevlive = new_thread;
  __pthread_main_thread->p_nextlive = new_thread;
  /* Set pid field of the new thread, in case we get there before the
     child starts. */
  new_thread->p_pid = pid;
  /* We're all set */
  *thread = new_thread_id;
  return 0;
}

Much of pthread_handle_create() has been omitted for space purposes. For the parts that are left out, the comments have been kept so you can see what occurs there. The code that remains is the guts of the thread creation mechanism. First, this function allocates a new stack for the new thread through the pthread_allocate_stack() function. It then sets the necessary fields in the thread descriptor (including the start function and its argument). Next, it calls the __clone() function with the flags discussed above to share all attributes with the child. The pthread_start_thread() function that is run from clone() performs some more initialization of the thread and finishes by calling the start_routine() function. And thus, the new thread is born.

There's Always More...

Hopefully, you have enjoyed this whirlwind tour of the implementation of threads under Linux. If you would like to learn more about the clone() system call and how it relates to fork(), take a look at the book Linux Kernel Internals, by Beck, Hohme, Dziadzka, Kunitz, Magnus, and Verworner. As you might imagine, the implementations of the two functions are very similar to each other.

If you are interested in learning more about the implementation of threads (including the other pthreads functions), the code in glibc/linuxthreads is very well documented and is quite easy to read. Also, that directory contains a README and a FAQ.html that provide even more information about the use of threads. In the meantime, happy hacking!


