fork vs. posix_spawn

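This article examines using posix_spawn instead of the traditional fork, execve and dup2 approach to run external processes on Linux, in order to avoid problems with memory overcommit. Using concrete examples, it compares how the two approaches behave under different memory allocation policies and shows how to set up inter-process communication with posix_spawn.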


On Unix-like systems, the default way of executing another process seems to be fork(2) followed by execve(2), plus dup2(2) if you want to redirect the input and output. As far as I have read, fork(2) creates a virtual copy of the process image; that is, it maps the pages of the original process in copy-on-write mode.

Now, the man page of malloc(3) says:

By default, Linux follows an optimistic memory allocation strategy.  This means that when malloc() returns non-NULL there is no guarantee that the memory really is available.  This is a really bad bug.  In case it turns out that the system is out of memory,  one  or  more  processes  will be killed by the infamous OOM killer.  In case Linux is employed under circumstances where it would be less desirable to suddenly lose some randomly picked processes, and moreover the kernel version is sufficiently recent, one can switch off this overcommitting behavior using a command like:

# echo 2 > /proc/sys/vm/overcommit_memory


This behavior also seems to affect fork(2): with overcommitting switched off, forking a huge process is refused even though no pages actually need to be duplicated, because the kernel must not promise more memory than it has. However, I do not really like the idea of an OOM killer that randomly kills processes whenever there happens to be less memory than needed. So when writing software, one should normally fork only from a small process. On the other hand, sometimes one just wants to execute a small external program and does not want to keep a second helper process around by default. In that case fork(2), execve(2) and dup2(2) are a bad choice, since they require about twice as much available memory as the forking process uses.

I first heard of this problem when somebody tried to use Runtime.exec in Java and it failed because he had turned off overcommitting, and the JVM apparently used fork(2) internally; hence, the created process was as large as the whole JVM. Googling around a bit, I found that this problem is known and will probably be changed in future JVM versions.

I wondered about the alternatives, and there seems to be an alternative, called posix_spawn(p), which executes an external command without forking.

The problem is that posix_spawn(p) is not a Linux system call, and in theory it could be implemented via fork(2), since POSIX specifies interfaces rather than implementations. So I tried a little test. By

# echo 0 > /proc/sys/vm/overcommit_memory

I can ensure that I have the default behaviour of Linux. The following code works:

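#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main (void) {
  /* Allocate 1 GiB so that the process counts as "huge". */
  volatile void *bla = malloc (1024*1024*1024);
  pid_t f = fork ();
  printf("Pid: %d\nFork: %d\n---\n", (int) getpid(), (int) f);
  /* Crude pause: keeps the process alive until stdin delivers input or EOF. */
  scanf("scanf");
  return 0;
}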


and has an output like:

Pid: 28875
Fork: 28876
---
Pid: 28876
Fork: 0
---


Now when I turn off overcommitting by

# echo 2 > /proc/sys/vm/overcommit_memory

and run the same again, it fails:

Pid: 29014
Fork: -1
---

fork(2) returns -1, which means that it was not successful, as expected. Consequently, I cannot execute an external process this way, because one of the two processes would have to call execve(2). Now, I have written code that creates a new process using posix_spawn(p). It runs sleep(1) and waits for it. If I tried to do this using fork(2), execve(2) and dup2(2), I would run into exactly that problem. However, the following code works without overcommitting:

#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main (void) {
  volatile void *bla = malloc (1024*1024*1024);
  int status;
  pid_t pid;

  /* GNU sleep reads "2d" as two days; the child only has to stay alive. */
  char* spawnedArgs[] = { "/bin/sleep", "2d", NULL };

  /* posix_spawnp returns 0 on success and an error number on failure. */
  int f = posix_spawnp(&pid, spawnedArgs[0], NULL, NULL, spawnedArgs, NULL);

  printf("Pid: %d\nposix_spawn: %d\npid: %d\n---\n", (int) getpid(), f, (int) pid);
  scanf("scanf");

  wait(&status);
  return 0;
}

and produces an output like

Pid: 29075
posix_spawn: 0
pid: 29076
---


Nice. However, when it comes to controlling the input and output of the created process, calling dup2(2) yourself is not enough, because none of our own code runs in the child before the new program starts. One must use posix_spawn_file_actions_t for that. The following code was written by Matthias Benkard and me, and does exactly this.

UPDATE: Removed a bug found by the commenter "dothebart", thank you!

The data structure stores the actions to be performed on the file descriptors in the child. I don't want to give a deeper introduction to the functions used, since they have good man pages, but I think a working example is a good starting point for understanding them anyway.

#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv) {
  int out[2];  /* parent writes to out[1], child reads from out[0] */
  int in[2];   /* child writes to in[1], parent reads from in[0] */
  pid_t pid;
  posix_spawn_file_actions_t action;
  char* spawnedArgs[] = { "/bin/cat", NULL };
  int status;
  char r[1024];
  memset(r, 0, sizeof r);

  pipe(out);
  pipe(in);

  /* Actions performed in the child before the new program starts. */
  posix_spawn_file_actions_init(&action);
  posix_spawn_file_actions_adddup2(&action, out[0], 0);
  posix_spawn_file_actions_addclose(&action, out[1]);

  posix_spawn_file_actions_adddup2(&action, in[1], 1);
  posix_spawn_file_actions_addclose(&action, in[0]);

  posix_spawnp(&pid, spawnedArgs[0], &action, NULL, spawnedArgs, NULL);

  /* The parent does not need the child's ends of the pipes. */
  close(out[0]);
  close(in[1]);

  write(out[1], "Hallo!", strlen("Hallo!"));
  close(out[1]);  /* cat sees end-of-file and terminates */

  /* Read at most sizeof r - 1 bytes so that r stays NUL-terminated. */
  read(in[0], r, sizeof r - 1);
  printf("Read data: \"%s\"\n", r);

  wait(&status);
  posix_spawn_file_actions_destroy(&action);

  return EXIT_SUCCESS;
}


The program runs "/bin/cat", which copies its stdin to its stdout. We attach pipes to both streams, send "Hallo!" to the input, and read the output back. And indeed, the output is

Read data: "Hallo!"

Of course, if I had more data to send, this program could run into a deadlock: the write call would block once the pipe buffers are full, while cat would be blocked writing output that nobody reads yet. This can be solved in the usual ways, by asynchronous I/O or by multithreading. But for this example, the simpler method is the better choice.
The POSIX `posix_spawn` function creates a new process and is intended as a replacement for the traditional `fork` + `exec` combination. A step-by-step description and usage examples follow.

---

### 1. Function prototype

```c
#include <spawn.h>

int posix_spawn(
    pid_t *pid,                                      // receives the child's PID
    const char *path,                                // path of the executable
    const posix_spawn_file_actions_t *file_actions,  // file operations (e.g. redirection)
    const posix_spawnattr_t *attrs,                  // process attributes (e.g. signals, scheduling)
    char *const argv[],                              // argument vector
    char *const envp[]                               // environment
);
```

- **Return value**: `0` on success; on failure, a positive error number is returned directly rather than through `errno`

---

### 2. Parameters

1. **`pid`**
   Output parameter that receives the PID of the new process.
2. **`path`**
   Path of the executable to run, for example `/bin/ls`.
3. **`file_actions`**
   File operations to perform in the child (such as redirecting input or output). Pass `NULL` if none are needed.
4. **`attrs`**
   Process attributes (such as the signal mask or scheduling policy). Pass `NULL` if no special attributes are needed.
5. **`argv`**
   Argument vector in the same format as the `argv` of `main`; it **must end with `NULL`**.
   Example: `{"ls", "-l", NULL}`.
6. **`envp`**
   Environment array with entries of the form `"VAR=value"`. To inherit the current environment, pass `environ`. Passing `NULL` is not portable; on Linux with glibc the child typically ends up with an empty environment.

---

### 3. Basic usage

#### Example 1: running `ls -l`

```c
#include <stdio.h>
#include <string.h>
#include <spawn.h>
#include <sys/wait.h>

extern char **environ;

int main() {
    pid_t pid;
    char *argv[] = {"ls", "-l", NULL};

    // Call posix_spawn; the error number is the return value, not errno
    int ret = posix_spawn(&pid, "/bin/ls", NULL, NULL, argv, environ);
    if (ret != 0) {
        fprintf(stderr, "posix_spawn failed: %s\n", strerror(ret));
        return 1;
    }

    // Wait for the child to finish
    int status;
    waitpid(pid, &status, 0);
    return 0;
}
```

#### Compile command

```bash
gcc example.c -o example
```

---

### 4. Advanced usage

#### 1. File redirection

Use `file_actions` to redirect the child's standard output to a file:

```c
#include <fcntl.h>

posix_spawn_file_actions_t file_actions;
posix_spawn_file_actions_init(&file_actions);

// Redirect the child's stdout to output.txt
posix_spawn_file_actions_addopen(&file_actions, STDOUT_FILENO, "output.txt",
                                 O_WRONLY | O_CREAT, 0644);

// Pass file_actions to posix_spawn
posix_spawn(&pid, "/bin/ls", &file_actions, NULL, argv, NULL);

// Release resources
posix_spawn_file_actions_destroy(&file_actions);
```

#### 2. Setting process attributes

Use `attrs` to change the child's signal mask. The mask only takes effect if the `POSIX_SPAWN_SETSIGMASK` flag is set as well:

```c
#include <signal.h>

posix_spawnattr_t attrs;
posix_spawnattr_init(&attrs);

// Block (not ignore) all signals in the child
sigset_t sigmask;
sigfillset(&sigmask);
posix_spawnattr_setsigmask(&attrs, &sigmask);
posix_spawnattr_setflags(&attrs, POSIX_SPAWN_SETSIGMASK);

// Pass attrs to posix_spawn
posix_spawn(&pid, "/bin/ls", NULL, &attrs, argv, NULL);

// Release resources
posix_spawnattr_destroy(&attrs);
```

---

### 5. Error handling

Check the return value and print the error message:

```c
int ret = posix_spawn(...);
if (ret != 0) {
    fprintf(stderr, "Error code: %d, Message: %s\n", ret, strerror(ret));
}
```

---

### 6. Comparison with fork/exec

| Property | posix_spawn | fork + exec |
|---------------------|---------------------------|---------------------------|
| Performance | Usually higher (the implementation can avoid duplicating the parent's address space) | Lower for large parents (the address space is duplicated copy-on-write and accounted for) |
| Typical use | Resource-constrained environments | Cases that need complex setup between fork and exec |
| Control | Configured directly through parameters | Details handled manually |

---

### 7. Notes

1. **Path**: `posix_spawn` does not search `PATH`; `path` must be an absolute path or a valid path relative to the current working directory. Use `posix_spawnp` to search `PATH`.
2. **Argument arrays**: `argv` and `envp` must end with `NULL`.
3. **Portability**: some older systems may not support this function; check `man posix_spawn`.