[6773] INTERNAL ERROR: cannot create temporary directory!

当执行$sudodocker-composeup--build命令时遇到DockerCompose的内部错误,可能是由于权限问题、存储空间不足或Docker设置配置导致。建议检查文件和文件夹权限、可用存储空间以及Docker的设置。如果问题持续,可以考虑重新安装Docker。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

  • When executing the command: $ sudo docker-compose up --build

  It seems like you’re having an issue with Docker Compose which is causing an internal error and preventing you from creating a temporary directory. This error may have several possible causes, like permission issues, lack of storage space, or Docker’s settings/configuration. Here are a few things you could try:

  • Check File and Folder Permissions: Ensure that the Docker daemon has
    the required permissions to create and modify the directories it
    needs to. If the folder you’re working in doesn’t have the correct
    permissions, Docker won’t be able to operate correctly. You can use
    the chmod command to modify permissions, and chown to change
    ownership if necessary.

  • Check Available Storage Space: This error can also occur if you’re running low on storage space. You ca

/home/shuo/VLA/openpi/.venv/lib/python3.11/site-packages/tyro/_parsers.py:332: UserWarning: The field `model.action-expert-variant` is annotated with type `typing.Literal['dummy', 'gemma_300m', 'gemma_2b', 'gemma_2b_lora']`, but the default value `gemma_300m_lora` has type `<class 'str'>`. We'll try to handle this gracefully, but it may cause unexpected behavior. warnings.warn(message) 19:07:30.004 [I] Running on: shuo-hp (10287:train.py:195) INFO:2025-05-12 19:07:30,228:jax._src.xla_bridge:945: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' 19:07:30.228 [I] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' (10287:xla_bridge.py:945) INFO:2025-05-12 19:07:30,228:jax._src.xla_bridge:945: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory 19:07:30.228 [I] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory (10287:xla_bridge.py:945) 19:07:30.500 [I] Wiped checkpoint directory /home/shuo/VLA/openpi/checkpoints/pi0_ours_aloha/your_experiment_name (10287:checkpoints.py:25) 19:07:30.500 [I] Created BasePyTreeCheckpointHandler: pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=None (10287:base_pytree_checkpoint_handler.py:332) 19:07:30.500 [I] Created BasePyTreeCheckpointHandler: pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=None (10287:base_pytree_checkpoint_handler.py:332) 19:07:30.500 [I] [thread=MainThread] Failed to get flag value for EXPERIMENTAL_ORBAX_USE_DISTRIBUTED_PROCESS_ID. (10287:multihost.py:375) 19:07:30.500 [I] [process=0][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'assets': <openpi.training.checkpoints.CallbackHandler object at 0x72e5cae0ff50>, 'train_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x72e5cafa0e90>, 'params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x72e5cafa05d0>}, handler_registry=None (10287:checkpoint_manager.py:622) 19:07:30.501 [I] Deferred registration for item: "assets". Adding handler `<openpi.training.checkpoints.CallbackHandler object at 0x72e5cae0ff50>` for item "assets" and save args `<class 'openpi.training.checkpoints.CallbackSave'>` and restore args `<class 'openpi.training.checkpoints.CallbackRestore'>` to `_handler_registry`. (10287:composite_checkpoint_handler.py:239) 19:07:30.501 [I] Deferred registration for item: "train_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x72e5cafa0e90>` for item "train_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. (10287:composite_checkpoint_handler.py:239) 19:07:30.501 [I] Deferred registration for item: "params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x72e5cafa05d0>` for item "params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. (10287:composite_checkpoint_handler.py:239) 19:07:30.501 [I] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x72e5cad7fd10>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. (10287:composite_checkpoint_handler.py:239) 19:07:30.501 [I] Initialized registry DefaultCheckpointHandlerRegistry({('assets', <class 'openpi.training.checkpoints.CallbackSave'>): <openpi.training.checkpoints.CallbackHandler object at 0x72e5cae0ff50>, ('assets', <class 'openpi.training.checkpoints.CallbackRestore'>): <openpi.training.checkpoints.CallbackHandler object at 0x72e5cae0ff50>, ('train_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x72e5cafa0e90>, ('train_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x72e5cafa0e90>, ('params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x72e5cafa05d0>, ('params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x72e5cafa05d0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x72e5cad7fd10>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x72e5cad7fd10>}). (10287:composite_checkpoint_handler.py:508) 19:07:30.501 [I] orbax-checkpoint version: 0.11.1 (10287:abstract_checkpointer.py:35) 19:07:30.501 [I] [process=0][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>.<lambda> at 0x72e5cacb85e0> timeout: 7200 secs and primary_host=0 for async checkpoint writes (10287:async_checkpointer.py:80) 19:07:30.501 [I] Found 0 checkpoint steps in /home/shuo/VLA/openpi/checkpoints/pi0_ours_aloha/your_experiment_name (10287:checkpoint_manager.py:1528) 19:07:30.501 [I] Saving root metadata (10287:checkpoint_manager.py:1569) 19:07:30.501 [I] [process=0][thread=MainThread] Skipping global process sync, barrier name: CheckpointManager:save_metadata (10287:multihost.py:293) 19:07:30.501 [I] [process=0][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=1, max_to_keep=1, keep_time_interval=None, keep_period=5000, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=False, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=AsyncOptions(timeout_secs=7200, barrier_sync_fn=None, post_finalization_callback=None, create_directories_asynchronously=False), multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None), root_directory=/home/shuo/VLA/openpi/checkpoints/pi0_ours_aloha/your_experiment_name: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x72e5cadffd10> (10287:checkpoint_manager.py:797) 19:07:30.553 [I] Loaded norm stats from s3://openpi-assets/checkpoints/pi0_base/assets/trossen (10287:config.py:166) Returning existing local_dir `/home/shuo/VLA/lerobot/aloha-real-data` as remote repo cannot be accessed in `snapshot_download` (None). 19:07:30.553 [W] Returning existing local_dir `/home/shuo/VLA/lerobot/aloha-real-data` as remote repo cannot be accessed in `snapshot_download` (None). (10287:_snapshot_download.py:213) Returning existing local_dir `/home/shuo/VLA/lerobot/aloha-real-data` as remote repo cannot be accessed in `snapshot_download` (None). 19:07:30.554 [W] Returning existing local_dir `/home/shuo/VLA/lerobot/aloha-real-data` as remote repo cannot be accessed in `snapshot_download` (None). (10287:_snapshot_download.py:213) Returning existing local_dir `/home/shuo/VLA/lerobot/aloha-real-data` as remote repo cannot be accessed in `snapshot_download` (None). 19:07:30.555 [W] Returning existing local_dir `/home/shuo/VLA/lerobot/aloha-real-data` as remote repo cannot be accessed in `snapshot_download` (None). (10287:_snapshot_download.py:213) Traceback (most recent call last): File "/home/shuo/VLA/openpi/scripts/train.py", line 273, in <module> main(_config.cli()) File "/home/shuo/VLA/openpi/scripts/train.py", line 226, in main batch = next(data_iter) ^^^^^^^^^^^^^^^ File "/home/shuo/VLA/openpi/src/openpi/training/data_loader.py", line 177, in __iter__ for batch in self._data_loader: File "/home/shuo/VLA/openpi/src/openpi/training/data_loader.py", line 257, in __iter__ batch = next(data_iter) ^^^^^^^^^^^^^^^ File "/home/shuo/VLA/openpi/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 708, in __next__ data = self._next_data() ^^^^^^^^^^^^^^^^^ File "/home/shuo/VLA/openpi/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1480, in _next_data return self._process_data(data) ^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/shuo/VLA/openpi/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1505, in _process_data data.reraise() File "/home/shuo/VLA/openpi/.venv/lib/python3.11/site-packages/torch/_utils.py", line 733, in reraise raise exception KeyError: Caught KeyError in DataLoader worker process 0. Original Traceback (most recent call last): File "/home/shuo/VLA/openpi/.venv/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 349, in _worker_loop data = fetcher.fetch(index) # type: ignore[possibly-undefined] ^^^^^^^^^^^^^^^^^^^^ File "/home/shuo/VLA/openpi/.venv/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch data = [self.dataset[idx] for idx in possibly_batched_index] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/shuo/VLA/openpi/.venv/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp> data = [self.dataset[idx] for idx in possibly_batched_index] ~~~~~~~~~~~~^^^^^ File "/home/shuo/VLA/openpi/src/openpi/training/data_loader.py", line 47, in __getitem__ return self._transform(self._dataset[index]) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/shuo/VLA/openpi/src/openpi/transforms.py", line 70, in __call__ data = transform(data) ^^^^^^^^^^^^^^^ File "/home/shuo/VLA/openpi/src/openpi/transforms.py", line 101, in __call__ return jax.tree.map(lambda k: flat_item[k], self.structure) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/shuo/VLA/openpi/.venv/lib/python3.11/site-packages/jax/_src/tree.py", line 155, in map return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/shuo/VLA/openpi/.venv/lib/python3.11/site-packages/jax/_src/tree_util.py", line 358, in tree_map return treedef.unflatten(f(*xs) for xs in zip(*all_leaves)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/shuo/VLA/openpi/.venv/lib/python3.11/site-packages/jax/_src/tree_util.py", line 358, in <genexpr> return treedef.unflatten(f(*xs) for xs in zip(*all_leaves)) ^^^^^^ File "/home/shuo/VLA/openpi/src/openpi/transforms.py", line 101, in <lambda> return jax.tree.map(lambda k: flat_item[k], self.structure) ~~~~~~~~~^^^ KeyError: 'observation.images.cam_low'
最新发布
05-13
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值