Notes on how kill -3 pid prints traces

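In system stability analysis, `kill -3 pid` is commonly used to capture the Java call stacks of the System_Server process in order to locate a problem. The command relies on the "Signal Catcher" thread that exists in every process forked from Zygote to handle the SIGQUIT signal. If a process cannot print a trace, the cause may be that this thread is missing. This article examines how `kill -3 pid` works, covering how the Signal Catcher thread starts and handles signals, and notes that the SIGUSR1 signal (kill -10 pid) is used to force a GC.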


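    When analyzing system stability problems, we often use "kill -3 pid" on a hung system to print the Java call stack of every thread in the System_Server process, then use the thread states and call stacks to narrow down the problem. The same command can also capture Java call stacks when an application's UI stays unresponsive for a long time. Note that kill -3 cannot produce a trace for a native process; use debuggerd for those instead. Sometimes, however, no trace is printed, and to find out why we need to understand how "kill -3 pid" works.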
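
    For tooling that needs to request a trace from code rather than from a shell, the same signal can be sent with the POSIX kill() call. The following is a minimal sketch, not an Android API: RequestJavaTrace is a made-up helper name, and it only sends the signal. Whether a trace is actually produced still depends on the target process having a Signal Catcher thread.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>

// Hypothetical helper (not an Android API): asks process `pid` to dump its
// Java stacks by sending SIGQUIT, the same signal "kill -3 pid" sends.
// Returns false with errno set if the signal could not be sent.
bool RequestJavaTrace(pid_t pid) {
  // kill() treats pid <= 0 as "a process group" or "every process", which is
  // never what we want here, so reject it up front.
  if (pid <= 0) {
    errno = EINVAL;
    return false;
  }
  return kill(pid, SIGQUIT) == 0;
}
```

    Rejecting non-positive pids is a deliberate choice: a stray 0 or -1 would otherwise send SIGQUIT to a whole process group or to every process the caller may signal.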

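    The "Signal Catcher" thread. Every process forked from Zygote starts a "Signal Catcher" thread, which is dedicated to receiving and handling the SIGQUIT and SIGUSR1 signals sent to the process. Note that the Zygote process itself has no "Signal Catcher" thread, so it cannot print a trace. Running "ps -t pid" lists all threads of process pid, and a "Signal Catcher" thread appears among them.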
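
    The same check can be done programmatically. The sketch below assumes a Linux-style /proc filesystem and assumes the kernel-visible thread name matches the "Signal Catcher" name that ps prints; ListThreadNames and HasSignalCatcher are made-up helper names, not part of ART or Android.

```cpp
#include <dirent.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the names of all threads of process `pid` by
// reading /proc/<pid>/task/<tid>/comm. Returns an empty list if the process
// does not exist or its task directory cannot be read.
std::vector<std::string> ListThreadNames(pid_t pid) {
  std::vector<std::string> names;
  const std::string task_dir = "/proc/" + std::to_string(pid) + "/task";
  DIR* dir = opendir(task_dir.c_str());
  if (dir == nullptr) {
    return names;
  }
  while (dirent* entry = readdir(dir)) {
    const std::string tid = entry->d_name;
    if (tid == "." || tid == "..") continue;
    std::ifstream comm(task_dir + "/" + tid + "/comm");
    std::string name;
    if (std::getline(comm, name)) names.push_back(name);
  }
  closedir(dir);
  return names;
}

// Returns true if some thread of `pid` is named "Signal Catcher".
bool HasSignalCatcher(pid_t pid) {
  for (const std::string& name : ListThreadNames(pid)) {
    if (name == "Signal Catcher") return true;
  }
  return false;
}
```

    If HasSignalCatcher(pid) returns false for a process that should be able to dump traces, the missing thread is the first thing to investigate.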

   

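    Starting the "Signal Catcher" thread. The startup flow is simple, as shown in the figure below; you can walk through the code yourself by following it (based on Android 5.1).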


    In the sequence diagram above, the main logic lives in art/runtime/signal_catcher.cc. The three functions from the diagram, Run(), HandleSigQuit(), and Output(), are analyzed in detail below.

    1. Run()

void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != NULL);

  Runtime* runtime = Runtime::Current();
  // Attach this thread to the runtime and name it "Signal Catcher". For a more detailed
  // explanation of this function, see: http://www.th7.cn/Program/java/201405/195472.shtml
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
                                     !runtime->IsCompiler()));
  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);             
  }

  // Set up mask with signals we want to handle.
  SignalSet signals;
  // The signals to receive are SIGQUIT and SIGUSR1. SIGQUIT prints the trace;
  // SIGUSR1 (-10) triggers a forced GC.
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);

  while (true) {
    // Wait for SIGQUIT or SIGUSR1; the call blocks until one of them arrives.
    // WaitForSignal() calls SignalSet::Wait() (implemented in art/runtime/signal_set.h),
    // which in turn calls sigwait() to block until SIGQUIT or SIGUSR1 is received.
    int signal_number = signal_catcher->WaitForSignal(self, signals);

    // If the SignalCatcher destructor has already been invoked, just call
    // DetachCurrentThread() and exit. Normally this condition is false.
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return NULL;
    }

    switch (signal_number) {
    case SIGQUIT:  // kill -3 pid: call HandleSigQuit() to print the call stacks of all threads.
      signal_catcher->HandleSigQuit();
      break;
    case SIGUSR1:  // kill -10 pid: call HandleSigUsr1() to trigger a forced GC.
      signal_catcher->HandleSigUsr1();
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}
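
    Stripped of the runtime details, Run() follows a standard POSIX pattern: the signals are blocked in every thread, and one dedicated thread collects them synchronously with sigwait(). Here is a minimal sketch of that pattern in a plain pthread program. MiniCatcher, StartCatcher, and StopCatcher are made-up names, the counters stand in for HandleSigQuit() and HandleSigUsr1(), and the shutdown sequence is an assumption modeled on the ShouldHalt() check rather than a copy of ART's code.

```cpp
#include <pthread.h>
#include <signal.h>

#include <atomic>
#include <cassert>

// Hypothetical stand-in for SignalCatcher; only the signal plumbing is modeled.
struct MiniCatcher {
  std::atomic<int> quit_count{0};
  std::atomic<int> usr1_count{0};
  std::atomic<bool> halt{false};
  pthread_t thread{};
};

// Builds the set of signals the catcher thread waits for.
static sigset_t CatcherSignals() {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGQUIT);
  sigaddset(&set, SIGUSR1);
  return set;
}

static void* CatcherLoop(void* arg) {
  auto* catcher = static_cast<MiniCatcher*>(arg);
  const sigset_t set = CatcherSignals();
  while (true) {
    int signal_number = 0;
    // Blocks until one of the signals in `set` is pending.
    if (sigwait(&set, &signal_number) != 0) continue;
    if (catcher->halt.load()) return nullptr;
    switch (signal_number) {
      case SIGQUIT: catcher->quit_count++; break;  // ART dumps stacks here.
      case SIGUSR1: catcher->usr1_count++; break;  // ART forces a GC here.
      default: break;
    }
  }
}

// Blocks the signals in the calling thread (threads created afterwards
// inherit the mask), then starts the catcher thread.
bool StartCatcher(MiniCatcher* catcher) {
  const sigset_t set = CatcherSignals();
  if (pthread_sigmask(SIG_BLOCK, &set, nullptr) != 0) return false;
  return pthread_create(&catcher->thread, nullptr, CatcherLoop, catcher) == 0;
}

// Sets the halt flag, wakes the thread out of sigwait(), and joins it.
void StopCatcher(MiniCatcher* catcher) {
  catcher->halt.store(true);
  pthread_kill(catcher->thread, SIGQUIT);
  pthread_join(catcher->thread, nullptr);
}
```

    Because the signals stay blocked everywhere else, no asynchronous signal handler ever runs, and the work done for SIGQUIT is ordinary thread code that can take locks and allocate memory. StartCatcher() has to be called before any other thread is created so that every thread inherits the blocked mask.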

    2. HandleSigQuit()

void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  ThreadList* thread_list = runtime->GetThreadList();  // Get the list of all threads.

  // Grab exclusively the mutator lock, set state to Runnable without checking for a pending
  // suspend request as we're going to suspend soon anyway. We set the state to Runnable to avoid
  // giving away the mutator lock.
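  // Suspend all threads. The comment above appears to describe the Signal Catcher thread
  // itself rather than the suspended threads: after SuspendAll() it holds the mutator lock
  // exclusively and marks itself Runnable (see SetStateUnsafe(kRunnable) below), which would
  // explain why a Runnable thread shows up in the trace.
  thread_list->SuspendAll();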
  Thread* self = Thread::Current();
  Locks::mutator_lock_->AssertExclusiveHeld(self);
  const char* old_cause = self->StartAssertNoThreadSuspension("Handling SIGQUIT");
  ThreadState old_state = self->SetStateUnsafe(kRunnable);

  std::ostringstream os;  // String stream used to assemble and format the output.
  os << "\n"
      << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";

  DumpCmdLine(os);  // Print the contents of cmdline.

  // Note: The string "ABI:" is chosen to match the format used by debuggerd.
  os << "ABI: " << GetInstructionSetString(runtime->GetInstructionSet()) << "\n";

  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";

  runtime->DumpForSigQuit(os);        

  if (false) {
    std::string maps;
    if (ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
    }
  }
  os << "----- end " << getpid() << " -----\n";  // End-of-trace marker.
  CHECK_EQ(self->SetStateUnsafe(old_state), kRunnable);
  self->EndAssertNoThreadSuspension(old_cause);
  thread_list->ResumeAll();  // Resume all suspended threads.
  // Run the checkpoints after resuming the threads to prevent deadlocks if the checkpoint function
  // acquires the mutator lock.
  if (self->ReadFlag(kCheckpointRequest)) {
    self->RunCheckpointFunction();
  }
  Output(os.str());  // Call Output() to write the contents of the string stream to traces.txt.
}
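
    Note that the whole dump is assembled in memory between SuspendAll() and ResumeAll(), and the file is only written afterwards. The begin and end markers also make it easy to check whether a dump in the log or trace file is complete. The sketch below reproduces just the framing; WrapDump and IsCompleteDump are made-up helper names, and the body is whatever text the caller supplies.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: wraps an already formatted body in the same
// "----- pid ... -----" and "----- end ... -----" markers that
// HandleSigQuit() emits.
std::string WrapDump(int pid, const std::string& iso_date, const std::string& body) {
  std::ostringstream os;
  os << "\n"
     << "----- pid " << pid << " at " << iso_date << " -----\n";
  os << body;
  os << "----- end " << pid << " -----\n";
  return os.str();
}

// Hypothetical helper: returns true if `text` contains a dump for `pid`
// with both its begin marker and a matching end marker after it.
bool IsCompleteDump(const std::string& text, int pid) {
  const std::string begin = "----- pid " + std::to_string(pid) + " at ";
  const std::string end = "----- end " + std::to_string(pid) + " -----\n";
  const std::string::size_type b = text.find(begin);
  if (b == std::string::npos) return false;
  return text.find(end, b) != std::string::npos;
}
```

    A begin marker without its end marker suggests the write was cut short, while no marker at all points further upstream, for example to a signal that never reached the Signal Catcher thread.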

    3. Output()

void SignalCatcher::Output(const std::string& s) {
  if (stack_trace_file_.empty()) {
    LOG(INFO) << s;
    return;
  }

  ScopedThreadStateChange tsc(Thread::Current(), kWaitingForSignalCatcherOutput);
  // Open /data/anr/traces.txt write-only in append mode, creating it if it does not exist.
  int fd = open(stack_trace_file_.c_str(), O_APPEND | O_CREAT | O_WRONLY, 0666);
  if (fd == -1) {
    PLOG(ERROR) << "Unable to open stack trace file '" << stack_trace_file_ << "'";
    return;
  }
  std::unique_ptr<File> file(new File(fd, stack_trace_file_));
  if (!file->WriteFully(s.data(), s.size())) {  // Write the string to /data/anr/traces.txt.
    PLOG(ERROR) << "Failed to write stack traces to '" << stack_trace_file_ << "'";
  } else {
    LOG(INFO) << "Wrote stack traces to '" << stack_trace_file_ << "'";
  }
}
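
    The open-append-write sequence in Output() can be tried outside ART with plain POSIX calls. This is a sketch under stated assumptions: AppendToFile is a made-up helper, and its retry loop is one common way to implement a "write fully" operation, not necessarily how ART's File::WriteFully() is written.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper: appends `s` to the file at `path`, creating the file
// if needed. Returns false if the file cannot be opened or a write fails.
bool AppendToFile(const std::string& path, const std::string& s) {
  int fd = open(path.c_str(), O_APPEND | O_CREAT | O_WRONLY, 0666);
  if (fd == -1) return false;
  const char* p = s.data();
  size_t remaining = s.size();
  bool ok = true;
  // write() may accept fewer bytes than requested, so loop until done.
  while (remaining > 0) {
    ssize_t n = write(fd, p, remaining);
    if (n == -1) {
      if (errno == EINTR) continue;  // Interrupted before writing; retry.
      ok = false;
      break;
    }
    p += n;
    remaining -= static_cast<size_t>(n);
  }
  if (close(fd) == -1) ok = false;
  return ok;
}
```

    With O_APPEND each dump lands at the end of the file, so earlier dumps stay in place. The two failure points here (open fails, write fails) correspond to the two PLOG(ERROR) messages in Output(), which are the log lines to look for when no trace shows up.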

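    Summary: once you are familiar with this flow, the logs will let you roughly locate the problem the next time a trace fails to print. One last note on signal handling: SIGQUIT (kill -3 pid) prints the trace of a Java process, and SIGUSR1 (kill -10 pid) triggers a forced GC in the process.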
