系统死机重启调试分析

本文分析了一起由Watchdog超时导致的Android系统重启问题,涉及Kernel 3.10.0和Android 6.0。在RK3368平台上,系统在16分钟后因BinderThreadMonitor阻塞而重启,主要发生在调用WifiManager.enableNetwork(-1)时。调试过程中,发现多个Binder线程等待,导致无法处理新消息。通过代码分析,问题出在CMD_ENABLE_NETWORK命令处理时未正确响应AsyncChannel,提出了解决方案。

Platform: RK3368

OS: Android 6.0

Kernel: 3.10.0


现象

logcat Watchdog系统重启日志:

    10-13 17:17:23.106   571   614 W IPCThreadState: Waiting for thread to be free. mExecutingThreadsCount=15 mMaxThreads=15
    10-13 17:17:34.195   571   922 I Process : Sending signal. PID: 571 SIG: 3
    10-13 17:17:34.196   571   576 I art     : Thread[2,tid=576,WaitingInMainSignalCatcherLoop,Thread*=0x55840f6430,peer=0x12c010a0,"Signal Catcher"]: reacting to signal 3
    10-13 17:17:34.875   571   625 E SensorsHal: poll() failed (Interrupted system call)
    10-13 17:17:36.779   571   656 D WifiConfigStore: Retrieve network priorities after PNO.
    10-13 17:17:36.781   571   576 I art     : Wrote stack traces to '/data/anr/traces.txt'
    10-13 17:17:40.746   571   922 I Watchdog_N: dumpKernelStacks
    10-13 17:17:40.859   571   922 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.Watchdog$BinderThreadMonitor on foreground thread (android.fg)
    10-13 17:17:40.859   571   922 W Watchdog: foreground thread stack trace:
    10-13 17:17:40.860   571   922 W Watchdog:     at android.os.Binder.blockUntilThreadAvailable(Native Method)
    10-13 17:17:40.860   571   922 W Watchdog:     at com.android.server.Watchdog$BinderThreadMonitor.monitor(Watchdog.java:200)
    10-13 17:17:40.860   571   922 W Watchdog:     at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:172)
    10-13 17:17:40.860   571   922 W Watchdog:     at android.os.Handler.handleCallback(Handler.java:739)
    10-13 17:17:40.860   571   922 W Watchdog:     at android.os.Handler.dispatchMessage(Handler.java:95)
    10-13 17:17:40.860   571   922 W Watchdog:     at android.os.Looper.loop(Looper.java:148)
    10-13 17:17:40.860   571   922 W Watchdog:     at android.os.HandlerThread.run(HandlerThread.java:61)
    10-13 17:17:40.860   571   922 W Watchdog:     at com.android.server.ServiceThread.run(ServiceThread.java:46)
    10-13 17:17:40.860   571   922 W Watchdog: *** GOODBYE!
    10-13 17:17:40.860   571   922 I Process : Sending signal. PID: 571 SIG: 9
    10-13 17:17:40.977   199   199 E installd: eof
    10-13 17:17:40.977   199   199 E installd: failed to read size
    10-13 17:17:40.977   199   199 I installd: closing connection
    10-13 17:17:40.978   188   188 I lowmemorykiller: ActivityManager disconnected
    10-13 17:17:40.978   188   188 I lowmemorykiller: Closing Activity Manager data connection
    10-13 17:17:40.988   189   189 I ServiceManager: service 'lock_settings' died
    10-13 17:17:40.988   189   189 I ServiceManager: service 'network_management' died
    10-13 17:17:40.988   189   189 I ServiceManager: service 'clipboard' died
    10-13 17:17:40.988   189   189 I ServiceManager: service 'batterystats' died

kernel 日志:

    [ 1012.641269]  [3:       watchdog:  922] SysRq : Show Blocked State
    [ 1012.641365]  [3:       watchdog:  922]   task                        PC stack   pid father
    [ 1012.641535]  [3:       watchdog:  922] rknand_bg       D ffffffc0000856c0     0   153      2 0x00000000
    [ 1012.641570]  [3:       watchdog:  922] Call trace:
    [ 1012.641613]  [3:       watchdog:  922] [<ffffffc0000856c0>] __switch_to+0x80/0x8c
    [ 1012.641650]  [3:       watchdog:  922] [<ffffffc0008347dc>] __schedule+0x400/0x608
    [ 1012.641682]  [3:       watchdog:  922] [<ffffffc000834a48>] schedule+0x64/0x70
    [ 1012.641714]  [3:       watchdog:  922] [<ffffffc000832e34>] schedule_timeout+0x1cc/0x1fc
    [ 1012.641770]  [3:       watchdog:  922] [<ffffffbffc000ca0>] nand_blktrans_bg_thread+0x28c/0x2fc [rk30xxnand_ko]
    [ 1012.641807]  [3:       watchdog:  922] [<ffffffc0000c8024>] kthread+0xac/0xb8
    [ 1012.641835]  [3:       watchdog:  922] rknand          D ffffffc0000856c0     0   154      2 0x00000000
    [ 1012.641867]  [3:       watchdog:  922] Call trace:
    [ 1012.641897]  [3:       watchdog:  922] [<ffffffc0000856c0>] __switch_to+0x80/0x8c
    [ 1012.641928]  [3:       watchdog:  922] [<ffffffc0008347dc>] __schedule+0x400/0x608
    [ 1012.641959]  [3:       watchdog:  922] [<ffffffc000834a48>] schedule+0x64/0x70
    [ 1012.641989]  [3:       watchdog:  922] [<ffffffc000832e34>] schedule_timeout+0x1cc/0x1fc
    [ 1012.642041]  [3:       watchdog:  922] [<ffffffbffc001854>] nand_blktrans_thread+0x1a8/0x61c [rk30xxnand_ko]
    [ 1012.642074]  [3:       watchdog:  922] [<ffffffc0000c8024>] kthread+0xac/0xb8
    [ 1012.642632]  [3:       watchdog:  922] Sched Debug Version: v0.10, 3.10.0 #441
    ...
    [ 1012.836914]  [1:  kworker/u16:0:    6] binder: release 571:604 transaction 20644 in, still active
    [ 1012.837026]  [1:  kworker/u16:0:    6] binder: send failed reply for transaction 20644 to 711:1430
    [ 1012.837147]  [1:  kworker/u16:0:    6] binder: release 571:605 transaction 38506 in, still activell active

调试分析

看日志只能知道是MonitorChecker超时了,BinderThreadMonitor阻塞导致的。此时只看logcat日志是无法定位到更深层次的原因,就要看traces文件日志了;

触发ANR,dump trace文件的场景如下:

  • Service Timeout:比如前台服务在20s内未执行完成
  • BroadcastQueue Timeout:比如前台广播在10s内未执行完成
  • ContentProvider Timeout:内容提供者,在publish过超时10s
  • InputDispatching Timeout: 输入事件分发超时5s,包括按键和触摸事件

其实还有一种情况也会生成trace文件,那就是触发Framework的Watchdog的时候。

ANR traces 文件默认是由SystemProperties dalvik.vm.stack-trace-file指定的:
dalvik.vm.stack-trace-file = /data/anr/traces.txt

根据日志10-13 17:17:36.781 571 576 I art : Wrote stack traces to ‘/data/anr/traces.txt’,我们查找trace文件对应时间段的日志,发现多个binder在Native Waiting阻塞,是APP在调用WifiManager.enableNetwork,达到32个:

        "Binder_1" prio=5 tid=8 Waiting
          | group="main" sCount=1 dsCount=0 obj=0x12ca00a0 self=0x55841284e0
          | sysTid=604 nice=0 cgrp=default sched=0/0 handle=0x7f829f4450
          | state=S schedstat=( 0 0 0 ) utm=33 stm=17 core=0 HZ=100
          | stack=0x7f828f8000-0x7f828fa000 stackSize=1013KB
          | held mutexes=
          at java.lang.Object.wait!(Native method)
          - waiting on <0x091b4b3e> (a java.lang.Object)
          at com.android.internal.util.AsyncChannel$SyncMessenger.sendMessageSynchronously(AsyncChannel.java:820)
          - locked <0x091b4b3e> (a java.lang.Object)
          at com.android.internal.util.AsyncChannel$SyncMessenger.-wrap0(AsyncChannel.java:-1)
          at com.android.internal.util.AsyncChannel.sendMessageSynchronously(AsyncChannel.java:654)
          at com.android.internal.util.AsyncChannel.sendMessageSynchronously(AsyncChannel.java:699)
          at com.android.server.wifi.WifiStateMachine.syncEnableNetwork(WifiStateMachine.java:2328)
          at com.android.server.wifi.WifiServiceImpl.enableNetwork(WifiServiceImpl.java:1034)
          at android.net.wifi.IWifiManager$Stub.onTransact(IWifiManager.java:141)
          at android.os.Binder.execTransact(Binder.java:453)

经过多次测试分析发现一些:

  1. 每次都是开机16分钟后出现"Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.Watchdog$BinderThreadMonitor on foreground thread (android.fg)";
  2. 都是出现"W IPCThreadState: Waiting for thread to be free. mExecutingThreadsCount=15 mMaxThreads=15",即Binder线程达到最大值,已经没有线程去处理新的binder消息,导致binder transaction failed.
  3. 结合trace文件分析,发现确实有多个Binder处于Waiting状态,都是阻塞在com.android.server.wifi.WifiServiceImpl.enableNetwork;
  4. 发现有个APP会频繁的调用enableNetwork,而network id是-1,问题可能是由此引发的.

代码分析Wifif enableNetwork流程:

frameworks/base/wifi/java/android/net/wifi/WifiManager.java

     * @return {@code true} if the operation succeeded
     */
    public boolean enableNetwork(int netId, boolean disableOthers) {
        final boolean pin = disableOthers && mTargetSdkVersion < Build.VERSION_CODES.LOLLIPOP;
        if (pin) {
            registerPinningNetworkCallback();
        }

        boolean success;
        try {
            success = mService.enableNetwork(netId, disableOthers);
        } catch (RemoteException e) {
            success = false;
        }

        if (pin && !success) {
            unregisterPinningNetworkCallback();
        }

        return success;
    }

APP 调用WifiManager.enableNetwork,通过AIDL与服务端进行跨进程通信。

frameworks/base/wifi/java/android/net/wifi/IWifiManager.aidl

    interface IWifiManager
    {
        int getSupportedFeatures();

        WifiActivityEnergyInfo reportActivityInfo();

        List<WifiConfiguration> getConfiguredNetworks();

        List<WifiConfiguration> getPrivilegedConfiguredNetworks();

        WifiConfiguration getMatchingWifiConfig(in ScanResult scanResult);

        int addOrUpdateNetwork(in WifiConfiguration config);

        boolean removeNetwork(int netId);

        boolean enableNetwork(int netId, boolean disableOthers);

服务端:

frameworks/opt/net/wifi/service/java/com/android/server/wifi/WifiServiceImpl.java

        /**
         * See {@link android.net.wifi.WifiManager#enableNetwork(int, boolean)}
         * @param netId the integer that identifies the network configuration
         * to the supplicant
         * @param disableOthers if true, disable all other networks.
         * @return {@code true} if the operation succeeded
         */
        public boolean enableNetwork(int netId, boolean disableOthers) {
            enforceChangePermission();
            if (mWifiStateMachineChannel != null) {
                return mWifiStateMachine.syncEnableNetwork(mWifiStateMachineChannel, netId,
                        disableOthers);
            } else {
                Slog.e(TAG, "mWifiStateMachineChannel is not initialized");
                return false;
            }
        }

frameworks/opt/net/wifi/service/java/com/android/server/wifi/WifiStateMachine.java

        /**
         * Enable a network
         *
         * @param netId         network id of the network
         * @param disableOthers true, if all other networks have to be disabled
         * @return {@code true} if the operation succeeds, {@code false} otherwise
         */
        public boolean syncEnableNetwork(AsyncChannel channel, int netId, boolean disableOthers) {
            Log.d(TAG, "syncEnableNetwork "+channel);
            Message resultMsg = channel.sendMessageSynchronously(CMD_ENABLE_NETWORK, netId,
                    disableOthers ? 1 : 0);
            boolean result = (resultMsg.arg1 != FAILURE);
            resultMsg.recycle();
            Log.d(TAG, "syncEnableNetwork "+channel +" result="+result);
            return result;
        }
    class ConnectModeState extends State {

        @Override
        public void enter() {
            connectScanningService();
        }

        @Override
        public boolean processMessage(Message message) {
            WifiConfiguration config;
            int netId;
            boolean ok;
            boolean didDisconnect;
            String bssid;
            String ssid;
            NetworkUpdateResult result;
            logStateAndMessage(message, getClass().getSimpleName());

            switch (message.what) {
                case WifiMonitor.ASSOCIATION_REJECTION_EVENT:
 ......
                case CMD_ENABLE_NETWORK:
                    boolean disableOthers = message.arg2 == 1;
                    netId = message.arg1;
                    config = mWifiConfigStore.getWifiConfiguration(netId);
                    if (config == null) {
                        loge("No network with id = " + netId);
                        messageHandlingStatus = MESSAGE_HANDLING_STATUS_FAIL;
                        break;
                    }

                    // Tell autojoin the user did try to select to that network
                    // However, do NOT persist the choice by bumping the priority of the network
                    if (disableOthers) {
                        mWifiAutoJoinController.
                                updateConfigurationHistory(netId, true, false);
                        // Set the last selected configuration so as to allow the system to
                        // stick the last user choice without persisting the choice
                        mWifiConfigStore.setLastSelectedConfiguration(netId);

                        // Remember time of last connection attempt
                        lastConnectAttemptTimestamp = System.currentTimeMillis();

                        mWifiConnectionStatistics.numWifiManagerJoinAttempt++;
                    }
                    // Cancel auto roam requests
                    autoRoamSetBSSID(netId, "any");

                    int uid = message.sendingUid;
                    ok = mWifiConfigStore.enableNetwork(netId, disableOthers, uid);
                    if (!ok) {
                        messageHandlingStatus = MESSAGE_HANDLING_STATUS_FAIL;
                    }

                    replyToMessage(message, message.what, ok ? SUCCESS : FAILURE);
                    break;
                case CMD_ENABLE_ALL_NETWORKS:

看到问题了,ConnectModeState处理CMD_ENABLE_NETWORK命令时,如果没有找到netId对应的网络,直接break了,而AsyncChannel的客户端还一直在等应答。如果APP频繁的调用enableNetwork,而且network id是-1,就会导致“W IPCThreadState: Waiting for thread to be free. mExecutingThreadsCount=15 mMaxThreads=15”.

修改后的代码:

    --- a/service/java/com/android/server/wifi/WifiStateMachine.java
    +++ b/service/java/com/android/server/wifi/WifiStateMachine.java
    @@ -7356,6 +7358,7 @@ public class WifiStateMachine extends StateMachine implements WifiNative.WifiPno
                         if (config == null) {
                             loge("No network with id = " + netId);
                             messageHandlingStatus = MESSAGE_HANDLING_STATUS_FAIL;
    +                        replyToMessage(message, message.what, FAILURE);
                             break;
                         }

关于AsyncChannel

AsyncChannel为两个Handler之间的异步通道,Handler可能在同一个进程中,也可能在另一个进程中。AysncChannel可以使用两种协议样式。

  • 第一种是简单的请求/应答协议,其中服务器不需要知道是哪个客户机发出请求。在简单的请求/应答协议中,客户端/源端向服务器/目的地发送请求。服务器使用replyToMessage方法。

  • 第二种是服务器/目的地也需要知道它连接的是哪个客户端。例如,服务器需要将未经请求的消息发送回客户机。或者服务器为每个客户机保留不同的状态。在这个模型中,服务器还将使用连接方法。

frameworks/base/core/java/com/android/internal/util/AsyncChannel.java

    private static Message sendMessageSynchronously(Messenger dstMessenger, Message msg) {
                SyncMessenger sm = SyncMessenger.obtain();
                try {
                    if (dstMessenger != null && msg != null) {
                        msg.replyTo = sm.mMessenger;
                        synchronized (sm.mHandler.mLockObject) {
                            dstMessenger.send(msg);
                            sm.mHandler.mLockObject.wait();
                        }
                    } else {
                        sm.mHandler.mResultMsg = null;
                    }
                } catch (InterruptedException e) {
                    sm.mHandler.mResultMsg = null;
                } catch (RemoteException e) {
                    sm.mHandler.mResultMsg = null;
                }
                Message resultMsg = sm.mHandler.mResultMsg;
                sm.recycle();
                return resultMsg;
            }
<think>嗯,用户提到的高通平台在关机充电时出现死机重启的问题,我需要先理解可能的原因。首先,关机充电模式下,系统通常会进入低功耗状态,但部分硬件模块如充电管理、电池检测等仍需工作。可能的问题点包括电源管理固件或驱动异常、电池检测逻辑错误、硬件冲突等。 根据用户提供的引用[1],提到的是HD-audio的补丁加载,这可能与音频驱动有关,但不确定是否直接关联充电问题。不过,有时候系统在低功耗模式下,未正确挂载的驱动模块可能导致异常。需要检查充电相关的驱动模块,比如电池管理芯片的驱动、PMIC(电源管理集成电路)的配置,以及充电过程中的中断处理是否正常。 接下来,可能的解决方案包括更新或修复电源管理相关的驱动和固件,检查充电电路的硬件连接,以及调整低功耗模式下的系统配置。此外,查看内核日志(dmesg)和死机前的最后日志,可以定位到具体的错误信息,如空指针引用或资源竞争等问题。 用户可能需要逐步排查,比如先更新驱动,然后检查硬件,再调整系统配置。同时,参考高通官方文档或社区中类似问题的解决方案也会有帮助。需要确保建议的步骤清晰,并提示用户备份数据以防万一。</think>### 高通平台关机充电死机重启问题解决方案 #### 1. 初步排查硬件与固件 - **检查电池与充电电路**:使用万用表测量电池接口电压,确认是否存在短路或接触不良。高通平台充电管理依赖PMIC(电源管理集成电路),需验证其工作状态[^1] - **更新固件与驱动**:从高通开发者平台获取最新版PMIC驱动和充电管理固件,例如: ```c // PMIC驱动关键函数示例(电源状态检测) static int pmic_check_power_state(struct device *dev) { return regmap_read(pmic->regmap, REG_PWR_STATE, &val); } ``` #### 2. 系统日志分析 - **获取内核日志**: ```bash adb shell dmesg | grep -E "battery|charger|pmic" ``` - **常见错误模式**: - `kernel panic - not syncing: PMIC over-voltage protection` - `charger: Error: battery thermistor reading timeout` #### 3. 软件配置优化 - **调整充电策略**:修改`/sys/class/power_supply/battery/`下的参数文件: ```bash echo 1500000 > constant_charge_current_max # 限制最大充电电流为1.5A ``` - **禁用冲突模块**:在关机充电模式配置文件中注释可能冲突的驱动: ```c // drivers/power/supply/qcom/qpnp-qg.c // module_init(qpnp_qg_init); // 测试时暂时禁用 ``` #### 4. 低功耗模式调试 - **修改CPU休眠状态**: ```device-tree &qcom,pmic { qcom,s3-devs = <&pm8998_s3>; qcom,s3-mode = <1>; // 修改为深度休眠模式 }; ``` - **电压稳定性测试**:使用QPST工具进行PMIC电压波形捕获,确保充电过程中无异常电压波动 #### 5. 安全机制增强 - **增加看门狗检测**: ```c void charging_monitor_task(void) { while (1) { if (read_charger_status() == FAULT) { emergency_shutdown(); } mdelay(1000); } } ```
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值