The Difference Between Processes and Threads

Source blog:

http://www.cnblogs.com/mindsbook/archive/2009/11/03/process_and_thread.html

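Definitions of process and thread

First, we need to pin down the two definitions: what exactly is a process, and what is a thread?

According to the Wikipedia article on process, a process is an instance of a computer program and consists of one or more threads.

Likewise, the Wikipedia article on thread defines it this way: a thread of execution results from a fork of a computer program into two or more concurrently running tasks. ("Fork" here means splitting a program's execution in the general sense, not specifically the fork() system call.)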

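On a single-core machine, threads do not truly run in parallel; they run concurrently by time-sharing the CPU.

In other words, a thread does not have to wait for another thread to finish; it only has to wait for its CPU time slice.

On a multi-core machine, multiple threads can run truly in parallel, that is, execute at the same moment, each receiving CPU time simultaneously.

Similarly, modern computers generally run time-sharing operating systems, in which different processes take control of the CPU through time slices in order to execute their own code. Just as with threads, processes on a single-core system can only be concurrent, while on a multi-core system they can be parallel.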
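The concurrency described above, where no thread has to wait for another to finish, can be observed with a small Python sketch. The function name `run_until_all_meet` and its parameters are hypothetical, chosen only for illustration. Each worker blocks at a barrier that opens only when all workers are waiting at once, so the call can return only if the threads are in flight at the same time. It shows concurrency, not necessarily parallelism on separate cores.

```python
import threading


def run_until_all_meet(n_threads=3, timeout=5.0):
    """Start n_threads workers that can only finish once all have started."""
    barrier = threading.Barrier(n_threads)
    finished = []
    lock = threading.Lock()

    def worker(index):
        # Blocks until every worker is waiting here at the same time.
        # If each thread had to run to completion before the next one
        # started, this wait would time out and raise BrokenBarrierError.
        barrier.wait(timeout=timeout)
        with lock:
            finished.append(index)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(finished)


if __name__ == "__main__":
    print(run_until_all_meet())
```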

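How processes and threads relate

The relationship can be summed up in one sentence: in general, a process can contain multiple threads, while a thread can belong to only one process.

A process can create multiple threads, and these threads share its address space and the associated resources, so switching between threads carries little overhead.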
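As a rough illustration of "one process, many threads", the following Python sketch starts several threads and has each record the process ID and its own thread identifier. The helper name `identify_threads` is hypothetical. Under the stated relationship, every record should carry the same process ID while the thread identifiers differ.

```python
import os
import threading


def identify_threads(n_threads=4):
    """Return (pid, thread_id) pairs recorded by n_threads live threads."""
    records = []
    lock = threading.Lock()
    recorded = threading.Semaphore(0)
    release = threading.Event()

    def worker():
        with lock:
            records.append((os.getpid(), threading.get_ident()))
        recorded.release()
        # Stay alive until every thread has recorded itself, so that a
        # thread identifier cannot be reused by a later thread.
        release.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for _ in range(n_threads):
        recorded.acquire()  # wait until each worker has reported
    release.set()
    for t in threads:
        t.join()
    return records


if __name__ == "__main__":
    for pid, tid in identify_threads():
        print("pid", pid, "thread", tid)
```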

http://farm3.static.flickr.com/2462/4070785749_4a001a75d2_o.png

Figure 1: Execution space of a process

http://farm3.static.flickr.com/2436/4070785795_ff07b6930b_o.png

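Figure 2: Execution space of threads

The two figures above show clearly how processes and threads differ in the address space and resources they share.

So, for the same application, we could implement it with processes or with threads. What is the difference between the two, and how should we choose?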

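Differences between processes and threads

In essence, the two differ only in whether they share an address space and how much of it they share. All the other differences follow from this one. Each is described briefly below.

Sharing the address space

Traditionally, processes do not share an address space with one another, whereas threads share the address space of their process.

On Linux, however, things are somewhat different; see the detailed discussion in the section on processes and threads in specific operating systems below.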
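The traditional picture can be checked with a minimal Python sketch, assuming a POSIX system where `os.fork()` is available. The names `counter`, `bump`, `bump_in_thread`, and `bump_in_child_process` are hypothetical. A thread's update to a variable is visible to the rest of the process, while a forked child updates only its own copy of the address space.

```python
import os
import threading

# Shared state living in this process's address space.
counter = {"value": 0}


def bump():
    counter["value"] += 1


def bump_in_thread():
    """Increment from another thread; the change is visible here."""
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter["value"]


def bump_in_child_process():
    """Increment from a forked child; the parent's copy is unchanged."""
    pid = os.fork()
    if pid == 0:
        # Child: modifies its own copy of the address space, then exits
        # immediately without running any of the parent's remaining code.
        bump()
        os._exit(0)
    os.waitpid(pid, 0)
    return counter["value"]


if __name__ == "__main__":
    print("after thread:", bump_in_thread())
    print("after child process:", bump_in_child_process())
```

Both calls report the same value here: the thread's increment lands in the shared dictionary, and the child's increment is lost to the parent when the child exits.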

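Security

Because processes do not share resources or an address space with one another, they raise fewer security concerns than threads do.

Because multiple threads share the same address space and resources, one thread may maliciously modify data, or read data it is not authorized to access.

This is why Chrome and IE8 have recently, one after the other, started using multiple processes (one per tab) in place of the earlier multi-threaded design.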

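Robustness

Because multiple threads share one process's address space and related resources, a crash in one thread can corrupt that address space and those resources, causing the other threads to crash as well. This is easy to understand from an example many people have probably experienced, IE7: when one tab suddenly crashes, all tabs crash, and IE usually has to be restarted (the process is restarted and its threads are created again).

Multiple processes do not have this problem. Because each has its own address space and resources, a crash in one process does not affect the others. Likewise, if you have used Chrome, when a tab crashes (with Chrome's amusing error message), you only need to close that tab, and the other tabs are unaffected.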
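The isolation described above can be sketched in Python, again assuming a POSIX system. The function name `run_and_crash_child` is hypothetical, and the "crash" is simulated by having the child kill itself with SIGKILL rather than by a real bug. The parent only observes the child's termination status and keeps running.

```python
import os
import signal


def run_and_crash_child():
    """Fork a child that dies abruptly; report how the parent sees it."""
    pid = os.fork()
    if pid == 0:
        # Child: simulate a crash by sending itself SIGKILL.
        os.kill(os.getpid(), signal.SIGKILL)
        os._exit(1)  # not reached
    # Parent: collect the child's status; the parent itself is unaffected.
    _, status = os.waitpid(pid, 0)
    killed_by_signal = os.WIFSIGNALED(status)
    signal_number = os.WTERMSIG(status) if killed_by_signal else None
    return killed_by_signal, signal_number


if __name__ == "__main__":
    crashed, signum = run_and_crash_child()
    print("child crashed:", crashed, "signal:", signum)
    print("parent still running, pid", os.getpid())
```

A thread that brought down its process in the same way would take every other thread in that process with it, which is the contrast the IE7 and Chrome examples describe.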

http://farm3.static.flickr.com/2570/4070876881_4c6fb0b501_o.png

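Figure 3: Screenshot of a crashed tab in Chrome

To summarize briefly, the reasons are:

This follows from the design philosophy of Windows (low demands on multi-user support and parallelism)
Creating a process involves a considerable number of system calls

For exactly which additional system calls are involved, see the post referenced above.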

Linux

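Let us return to the point raised in the abstract of this article: the view put forward by my roommate, which was completely new to me.

So, is there really no essential difference between processes and threads on Linux?

To start with, see this post: Threads vs Processes in Linux.

The following is quoted from Threads vs Processes in Linux.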

Linux uses a 1-1 threading model, with (to the kernel) no distinction between processes and threads
-- everything is simply a runnable task. *

On Linux, the system call clone clones a task, with a configurable level of sharing, among which are:

CLONE_FILES: share the same file descriptor table (instead of creating a copy)
CLONE_PARENT: don't set up a parent-child relationship between the new task and the old (otherwise, child's getppid() = parent's getpid())
CLONE_VM: share the same memory space (instead of creating a COW copy)

fork() calls clone(least sharing) and pthread_create() calls clone(most sharing). **

forking costs a tiny bit more than pthread_createing because of copying tables and creating COW mappings for memory,
but the Linux kernel developers have tried (and succeeded) at minimizing those costs.

Switching between tasks, if they share the same memory space and various tables, will be a tiny bit cheaper
than if they aren't shared, because the data may already be loaded in cache. However,
switching tasks is still very fast even if nothing is shared -- this is something else that Linux kernel developers
try to ensure (and succeed at ensuring).

In fact, if you are on a multi-processor system, not sharing may actually be a performance boon:
if each task is running on a different processor, synchronizing shared memory is expensive.
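The "everything is a task" view in the quotation can be glimpsed from user space with a Linux-only Python sketch. It assumes Python 3.8 or later for `threading.get_native_id()` and a mounted `/proc` filesystem. The helper names `task_ids_seen_by_thread` and `forked_child_pid` are hypothetical. Python does not call clone directly here; the sketch only inspects the identifiers the kernel assigns. A thread gets its own kernel task ID while keeping the process ID, and a forked child gets a new process ID.

```python
import os
import threading


def task_ids_seen_by_thread():
    """Run one thread and return its pid, kernel task id, and sibling tasks."""
    result = {}

    def worker():
        result["pid"] = os.getpid()
        result["tid"] = threading.get_native_id()
        # Linux-specific: one entry per task (thread) in this process.
        result["tasks"] = sorted(int(name) for name in os.listdir("/proc/self/task"))

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result


def forked_child_pid():
    """Fork a child that exits at once; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    os.waitpid(pid, 0)
    return pid


if __name__ == "__main__":
    info = task_ids_seen_by_thread()
    print("process id:", info["pid"])
    print("thread's task id:", info["tid"])
    print("tasks in this process:", info["tasks"])
    print("forked child's process id:", forked_child_pid())
```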

The quoted passage above already explains this quite clearly,