Android In-Depth: Handler, Handler.Callback, and Message

This post looks at how Handler is used in Android development and what to watch out for, explains why a non-static Handler can cause a memory leak, and covers two ways to deal with the warning: declaring the Handler as a static member, or constructing it with a Handler.Callback.


Lately I've developed a bit of an obsession: yellow warnings in my code really bother me. They don't affect how the code runs, but they make it look sloppy.


Handler is used all the time for thread communication. Today I wrote the usual:

	private Handler smsHandler = new Handler() {
		@Override
		public void handleMessage(Message msg) {
			// handle the message here
		}
	};
Nothing looked wrong to me, so why the yellow warning?

This Handler class should be static or leaks might occur 

Roughly, it says the Handler should be static, otherwise a memory leak may occur.

I marked the Handler as static, and the warning disappeared.


But that raised a question: a non-static Handler causes a memory leak? That shook my assumptions a bit, so I went looking for answers online:

http://www.cnblogs.com/jevan/p/3168828.html

I found the answer there. The messages sitting in the MessageQueue keep a reference to the Handler, and a Handler declared as a non-static inner class keeps an implicit reference to its outer class, so the outer class (typically the Activity) cannot be garbage collected. When you send a message with a long delay, a leak is very likely. That's why the Handler should be static: it is then tied to the class rather than to an instance of the class, and it no longer holds an implicit reference to an outer-class object.

If that problem is all you care about, the article above covers it well.
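For reference, below is a minimal sketch of that static-Handler pattern (MyActivity, SmsHandler, and the field names are hypothetical, not from the original code; it needs android.os.Handler, android.os.Message, and java.lang.ref.WeakReference). The static inner class holds no implicit reference to the Activity, and the Activity is reached only through a WeakReference, so a long-delayed message no longer prevents it from being collected:

    // Static inner class: no implicit reference to the enclosing Activity.
    private static class SmsHandler extends Handler {
        private final WeakReference<MyActivity> activityRef;

        SmsHandler(MyActivity activity) {
            activityRef = new WeakReference<>(activity);
        }

        @Override
        public void handleMessage(Message msg) {
            MyActivity activity = activityRef.get();
            if (activity == null) {
                return; // the Activity has already been collected; drop the message
            }
            // safe to use the Activity here
        }
    }

    private final SmsHandler smsHandler = new SmsHandler(this);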


Searching around, I also found another way to make the yellow warning go away: pass a Handler.Callback object when constructing the Handler.

	private Handler mHandler = new Handler(new Handler.Callback() {

		@Override
		public boolean handleMessage(Message msg) {
			return false;
		}
	});
I had never paid attention to this parameter before. What are this method and its return value actually for?

So I went and looked at the relevant source code:

    /**
     * Callback interface you can use when instantiating a Handler to avoid
     * having to implement your own subclass of Handler.
     *
     * @param msg A {@link android.os.Message Message} object
     * @return True if no further handling is desired
     */
    public interface Callback { // the Callback interface
        public boolean handleMessage(Message msg);
    }
    
    /**
     * Subclasses must implement this to receive messages.
     */
    public void handleMessage(Message msg) {
    }
    
    /**
     * Handle system messages here.
     */
    public void dispatchMessage(Message msg) {
        if (msg.callback != null) { // callback is a Runnable field on the Message; when you send a Runnable via handler.post(...), it is wrapped into a Message with this field set
            handleCallback(msg); // a static method in Handler whose entire body is message.callback.run(); it calls run() directly without starting a new thread -- otherwise this would not be thread communication at all
        } else {
            if (mCallback != null) { // mCallback is the instance of the Callback interface shown above
                if (mCallback.handleMessage(msg)) { // if it returns true, return immediately and do not call the Handler's own handleMessage
                    return;
                }
            }
            handleMessage(msg); // the Handler's own message-handling method
        }
    }

When the Looper pulls a msg from the MessageQueue, it calls the Handler's dispatchMessage method, which then dispatches the message as shown above. Does that remind you of touch event dispatching?
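To make the dispatch order concrete, here is a small sketch assuming the Callback-based mHandler from the snippet above (the what value 1 is arbitrary):

    // 1. A posted Runnable is wrapped into msg.callback, so dispatchMessage()
    //    executes it directly via handleCallback(msg).
    mHandler.post(new Runnable() {
        @Override
        public void run() {
            // runs on the thread that owns the Handler's Looper
        }
    });

    // 2. A plain Message has no callback, so dispatchMessage() first offers it to
    //    mCallback.handleMessage(msg); only if that returns false does the
    //    Handler's own handleMessage(msg) run.
    mHandler.sendEmptyMessage(1);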

Finally, a note on obtaining a Message object. I had always used Message.obtain(); today I stumbled onto handler.obtainMessage(), which works as well.

Looking at the source, handler.obtainMessage ultimately calls one of the obtain overloads in Message anyway.
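A quick sketch of the two equivalent routes (the what values are arbitrary); both reuse instances from the Message pool instead of calling new Message():

    // Obtain from the pool directly, then send it through the Handler
    Message msg1 = Message.obtain();
    msg1.what = 1;
    smsHandler.sendMessage(msg1);

    // Obtain through the Handler: internally this calls Message.obtain(this, what),
    // so the returned Message already has its target set to smsHandler
    Message msg2 = smsHandler.obtainMessage(2);
    msg2.sendToTarget();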


