Killing Subprocesses in Linux/Bash

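This article explores how to manage and terminate a Bash script together with all of the child processes it spawns on Linux. It covers traps, top-level traps, the kill command, exec, and Python's psutil library, and explains how to avoid creating orphaned processes.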


Reposted from: http://riccomini.name/posts/linux/2012-09-25-kill-subprocesses-linux-bash/


A common requirement when writing Bash scripts in Linux is to kill a process along with all of the child processes it spawned. This tutorial describes various methods to prevent orphaned subprocesses.

Lately, I’ve been working with YARN at LinkedIn. This framework allows you to execute Bash scripts on one or more machines. It’s used primarily for Hadoop. When using YARN, you often end up with nested Bash scripts whose top-level script has a parent process ID (PPID) of 1 when the NodeManager launches it. This can be pretty problematic when the NodeManager is shut down, since you must make sure to clean up all child subprocesses via your parent Bash script.

Understanding Linux Subprocesses

Let’s start with an example. We’ll have two shell scripts: a parent, and a child:

$ cat parent.sh 
#!/bin/bash
./child.sh
$ cat child.sh 
#!/bin/bash
sleep 1000

Normally, when you launch nested processes from a terminal, you’ll see a process tree that looks something like this:

UID        PID  PPID  C STIME  TTY         TIME CMD
ubuntu   10911 10701  0 05:07 pts/1    00:00:00 /bin/bash ./parent.sh
ubuntu   10912 10911  0 05:07 pts/1    00:00:00 /bin/bash ./child.sh
ubuntu   10913 10912  0 05:07 pts/1    00:00:00 sleep 1000

In this example, a terminal (PID 10701) calls parent.sh, which calls child.sh, which calls sleep 1000. With YARN, you end up with a process tree that looks more like this:

UID        PID  PPID  C STIME  TTY         TIME CMD
ubuntu   10966     1  0 05:14 pts/1    00:00:00 /bin/bash ./parent.sh
ubuntu   10967 10966  0 05:14 pts/1    00:00:00 /bin/bash ./child.sh
ubuntu   10968 10967  0 05:14 pts/1    00:00:00 sleep 1000

Notice that the PPID of parent.sh is now 1. This is essentially a top-level process: its parent is init (PID 1) rather than the process that launched it.

Unexpected Behavior

In both of these examples, it seems intuitive that killing the top-level parent would result in all of the children being cleaned up. There are a number of ways to kill a process, so let’s start with:

$ kill -9 10966
UID        PID  PPID  C STIME  TTY         TIME CMD
ubuntu   10966     1  0 05:14 pts/1    00:00:00 /bin/bash ./parent.sh
ubuntu   10967 10966  0 05:14 pts/1    00:00:00 /bin/bash ./child.sh
ubuntu   10968 10967  0 05:14 pts/1    00:00:00 sleep 1000

Perhaps surprisingly, killing the parent does not clean up any of the children:

UID        PID  PPID  C STIME  TTY         TIME CMD
ubuntu   10967     1  0 05:14 pts/1    00:00:00 /bin/bash ./child.sh
ubuntu   10968 10967  0 05:14 pts/1    00:00:00 sleep 1000

Let’s try sending a kill signal that’s not quite as strong as kill -9. For a list of possible signals, try running:

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

Now, let’s try this again with a softer SIGHUP signal, this time sent to child.sh (PID 10967), which is now the top of the tree. One might expect that sending such a soft kill signal should result in the child processes being cleaned up.

$ kill -SIGHUP 10967
UID        PID  PPID  C STIME  TTY         TIME CMD
ubuntu   10968     1  0 05:14 pts/1    00:00:00 sleep 1000

As you can see, even SIGHUP does not kill the child processes; it leaves the sleep call orphaned with a PPID of 1.
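The orphaning described above can be reproduced in a few lines. This is a minimal sketch, not the author's setup: the inline `bash -c` command is a hypothetical stand-in for parent.sh, and the temp file and variable names are assumptions made for the demo.

```shell
#!/bin/bash
# Sketch: reproduce an orphaned child. Names and the temp file are illustrative.

pid_file=$(mktemp)

# Hypothetical stand-in for parent.sh: start a child, record its PID, wait.
bash -c 'sleep 60 & echo "$!" > "$1"; wait' _ "$pid_file" &
parent_pid=$!

# Wait (up to about 5 seconds) for the child PID to be recorded.
tries=0
while [ ! -s "$pid_file" ] && [ "$tries" -lt 50 ]; do
  sleep 0.1
  tries=$((tries + 1))
done
child_pid=$(cat "$pid_file")

# SIGKILL cannot be caught, so the parent gets no chance to clean up.
kill -9 "$parent_pid"
wait "$parent_pid" 2>/dev/null || true

# kill -0 sends no signal; it only tests whether the PID still exists.
if kill -0 "$child_pid" 2>/dev/null; then
  orphan_state=running
else
  orphan_state=gone
fi
echo "child $child_pid is $orphan_state after its parent was killed"

# Clean up the orphan and the temp file.
kill "$child_pid" 2>/dev/null || true
rm -f "$pid_file"
```

The child should still be reported as running, which is the behavior the process listings above show.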

So, how can we do this properly?

Traps

One solution is to use traps in the Bash script. A trap is a way to say “do this before exiting” in a Bash script. For example, we might add the following line to parent.sh and child.sh:

trap 'kill $(jobs -p)' EXIT

Now, if we kill the parent, all children will be cleaned up! Obviously, this only works with signals that can be trapped, such as SIGHUP or SIGTERM; SIGKILL (kill -9) cannot be caught, so the trap never runs. For example, if we have this process tree:

UID        PID  PPID  C STIME  TTY         TIME CMD
ubuntu   11049 10758  0 05:31 pts/2    00:00:00 /bin/bash ./parent.sh
ubuntu   11050 11049  0 05:31 pts/2    00:00:00 /bin/bash ./child.sh
ubuntu   11051 11050  0 05:31 pts/2    00:00:00 sleep 1000

You can execute:

$ kill 11049
$ ps -ef | grep sleep

And you will see that sleep is no longer running!
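A trap is easiest to reason about when the child runs in the background and the parent waits on it, because `jobs -p` lists background jobs and `wait` can be interrupted by a trapped signal. The sketch below illustrates that variant; it is not the article's exact scripts. The generated parent.sh, the `cleanup` function, and the temp directory are assumptions made for the demo.

```shell
#!/bin/bash
# Sketch: a trap that cleans up a background child.
# The script body, file names, and function names are illustrative.

work_dir=$(mktemp -d)

# Hypothetical variant of parent.sh, written to a temp file for the demo.
cat > "$work_dir/parent.sh" <<'EOF'
#!/bin/bash
cleanup() {
  pids=$(jobs -p)               # PIDs of this shell's background jobs
  [ -n "$pids" ] && kill $pids 2>/dev/null
  wait                          # reap the children before exiting
}
trap cleanup EXIT
trap 'exit 143' HUP INT TERM    # turn catchable signals into a normal exit
sleep 60 &                      # stand-in for ./child.sh
echo "$!" > "$1"
wait
EOF

bash "$work_dir/parent.sh" "$work_dir/child.pid" &
parent_pid=$!

# Wait (up to about 5 seconds) for the child PID to be recorded.
tries=0
while [ ! -s "$work_dir/child.pid" ] && [ "$tries" -lt 50 ]; do
  sleep 0.1
  tries=$((tries + 1))
done
child_pid=$(cat "$work_dir/child.pid")

kill "$parent_pid"              # plain kill sends SIGTERM, which is trappable
wait "$parent_pid" 2>/dev/null || true

if kill -0 "$child_pid" 2>/dev/null; then
  child_state=running
else
  child_state=gone
fi
echo "child $child_pid is $child_state after the parent was terminated"
rm -rf "$work_dir"
```

Replacing `kill "$parent_pid"` with `kill -9 "$parent_pid"` skips the trap entirely and leaves the child running.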

Top-Level Trap

A variation of having a trap in each Bash file is to have a single top-level trap that uses ‘ps’ to find children:

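kill_child_processes() {
    # local keeps each level of the recursion from clobbering its caller
    local isTopmost=$1
    local curPid=$2
    local childPids childPid
    # List the direct children of curPid
    childPids=$(ps -o pid --no-headers --ppid "$curPid")
    for childPid in $childPids
    do
        # Recurse first so that descendants are killed before their parents
        kill_child_processes 0 "$childPid"
    done
    # Kill every process in the tree except the topmost one
    if [ "$isTopmost" -eq 0 ]; then
        kill -9 "$curPid" 2> /dev/null
    fi
}

# Ctrl-C trap. Catches INT signal
trap "kill_child_processes 1 $$; exit 0" INT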

This is a less than ideal solution, but it does work. For details, see this page.

Kill PPIDs

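Running traps everywhere can be kind of clunky, and error prone. A cleaner approach is to use the kill command, and provide a process group ID (PGID) instead of a process ID. To do this, the syntax gets funky. You use a negative of the process group ID, like so:

kill -- -<PGID>

Note that the value to pass here is the process group ID, which is the PID of the group's leader, not a parent process ID. The example below works because parent.sh leads its own process group (as it does when launched via setsid), so its PID doubles as the PGID.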

For example, with this process tree:

UID        PID  PPID  C STIME  TTY         TIME CMD
ubuntu   11096     1  0 05:36 ?        00:00:00 /bin/bash ./parent.sh
ubuntu   11097 11096  0 05:36 ?        00:00:00 /bin/bash ./child.sh
ubuntu   11098 11097  0 05:36 ?        00:00:00 sleep 1000

You would run:

kill -- -11096
ps -ef | grep sleep

As you can see, killing by process group automatically cleans up all subprocesses, including nested subprocesses!
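The negative number passed to kill is interpreted as a process group ID. The following sketch starts a small tree in its own group with setsid and then signals the whole group. The inline command, the `live_in_group` helper, and the temp file are assumptions made for the demo. It relies on `ps` from procps to count group members.

```shell
#!/bin/bash
# Sketch: signal a whole process group with a negated PGID.
# File and function names are illustrative.

pid_file=$(mktemp)

# setsid runs the command in a new session, which also makes it the leader
# of a new process group, so its PID doubles as the PGID.
setsid bash -c 'sleep 60 & sleep 60 & echo "$$" > "$1"; wait' _ "$pid_file" &

# Wait (up to about 5 seconds) for the group leader to record its PID.
tries=0
while [ ! -s "$pid_file" ] && [ "$tries" -lt 50 ]; do
  sleep 0.1
  tries=$((tries + 1))
done
pgid=$(cat "$pid_file")

# Count the live (non-zombie) processes in a process group.
live_in_group() {
  ps -e -o pgid= -o stat= 2>/dev/null |
    awk -v g="$1" '$1 == g && $2 !~ /^Z/ { n++ } END { print n + 0 }'
}

before=$(live_in_group "$pgid")

# "--" ends option parsing so the negative number is read as a target.
kill -- "-$pgid"
wait 2>/dev/null || true

# Signal delivery is asynchronous, so poll briefly for the group to empty.
tries=0
remaining=$(live_in_group "$pgid")
while [ "$remaining" -gt 0 ] && [ "$tries" -lt 25 ]; do
  sleep 0.2
  tries=$((tries + 1))
  remaining=$(live_in_group "$pgid")
done

echo "group $pgid: $before live processes before, $remaining after"
rm -f "$pid_file"
```

To find the group of an arbitrary process, `ps -o pgid= -p <PID>` prints its PGID.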

exec

Another handy trick is to use exec when nesting Bash calls. Exec replaces the “current” process with the “child” process. This doesn’t always work, but for our example (parent, child, sleep), it certainly does. Let’s make parent and child look like this, respectively:

$ cat parent.sh
#!/bin/bash
exec ./child.sh
$ cat child.sh
#!/bin/bash
exec sleep 1000

Notice the “exec” command preceding the child.sh and sleep calls. Let’s have a look at the process tree:

$ ps -ef | grep parent
$ ps -ef | grep child
$ ps -ef | grep sleep
ubuntu   11155 10758  0 05:41 pts/2    00:00:00 sleep 1000

As you can see, only a ‘sleep’ process exists. The parent.sh script “becomes” child.sh, and child.sh “becomes” sleep. This makes it very easy to clean up child processes, because there are none! To clean up, you simply kill the ‘sleep’ process. This is the method that I use with YARN, since I’m executing nested Bash calls that lead to a single Java process.
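One way to see the replacement happen is to print the PID before and after exec. This is a small sketch with hypothetical variable names. The second run omits exec for contrast.

```shell
#!/bin/bash
# Sketch: exec keeps the PID, while a normal call forks a new process.
# Variable and file names are illustrative.

out_file=$(mktemp)

# With exec: the outer bash prints its PID, then is replaced by the inner bash.
bash -c 'echo "$$"; exec bash -c "echo \$\$"' > "$out_file"
exec_before=$(sed -n 1p "$out_file")
exec_after=$(sed -n 2p "$out_file")

# Without exec: the inner bash is a forked child. The trailing ":" keeps the
# inner call from being the last command, which some bash versions would
# otherwise exec implicitly as an optimization.
bash -c 'echo "$$"; bash -c "echo \$\$"; :' > "$out_file"
fork_before=$(sed -n 1p "$out_file")
fork_after=$(sed -n 2p "$out_file")

echo "with exec:    $exec_before -> $exec_after"
echo "without exec: $fork_before -> $fork_after"
rm -f "$out_file"
```

The first pair of PIDs should match and the second should differ, which is why killing the one remaining process is enough in the exec case.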

Python

If you’re not strictly tied to Bash, you might be interested in Python’s psutil library. It can be used to kill all subprocesses for a given process ID.

setsid

One other minor note. You might be wondering how you end up with a PPID of 1. Obviously, kill -9’ing the parent will do it. You can also use a command called setsid. This is what YARN does when its NodeManager executes a child process. To execute parent.sh with a PPID of 1, run:

setsid ./parent.sh

For further reading, check the nohup wiki, which can be used as an alternative to setsid.

