root@b82c95bfc43f:/workspace/data-zj/test_code_v1# . train.sh
/usr/local/lib/python3.10/dist-packages/torch/utils/_pytree.py:185: FutureWarning: optree is installed but the version is too old to support PyTorch Dynamo in C++ pytree. C++ pytree support is disabled. Please consider upgrading optree using `python3 -m pip install --upgrade 'optree>=0.13.0'`.
warnings.warn(
W0529 02:52:35.084000 938 torch/distributed/run.py:792]
W0529 02:52:35.084000 938 torch/distributed/run.py:792] *****************************************
W0529 02:52:35.084000 938 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0529 02:52:35.084000 938 torch/distributed/run.py:792] *****************************************
[W529 02:52:39.311003207 socket.cpp:204] [c10d] The hostname of the client socket cannot be retrieved. err=-3
[W529 02:52:43.313869930 socket.cpp:204] [c10d] The hostname of the client socket cannot be retrieved. err=-3
[the same optree FutureWarning is emitted again by each spawned worker process]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/axolotl/cli/train.py", line 17, in <module>
    from axolotl.cli.config import load_cfg
  File "/usr/local/lib/python3.10/dist-packages/axolotl/cli/config.py", line 19, in <module>
    from axolotl.utils.config import (
  File "/usr/local/lib/python3.10/dist-packages/axolotl/utils/config/__init__.py", line 16, in <module>
    from axolotl.utils.models import MULTIMODAL_AUTO_MODEL_MAPPING, load_model_config
  File "/usr/local/lib/python3.10/dist-packages/axolotl/utils/models.py", line 17, in <module>
    import transformers.modeling_utils
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 62, in <module>
    from .integrations.flash_attention import flash_attention_forward
  File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/flash_attention.py", line 5, in <module>
    from ..modeling_flash_attention_utils import _flash_attention_forward, flash_attn_supports_top_left_mask
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_flash_attention_utils.py", line 36, in <module>
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
[the identical ImportError traceback is raised by each of the remaining worker processes]
W0529 02:52:48.206000 938 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1003 closing signal SIGTERM
W0529 02:52:48.207000 938 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1004 closing signal SIGTERM
W0529 02:52:48.207000 938 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1005 closing signal SIGTERM
W0529 02:52:48.207000 938 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1006 closing signal SIGTERM
W0529 02:52:48.207000 938 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1007 closing signal SIGTERM
W0529 02:52:48.207000 938 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1009 closing signal SIGTERM
W0529 02:52:48.208000 938 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1010 closing signal SIGTERM
E0529 02:52:48.385000 938 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 5 (pid: 1008) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 918, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 909, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
axolotl.cli.train FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-05-29_02:52:48
  host      : b82c95bfc43f
  rank      : 5 (local_rank: 5)
  exitcode  : 1 (pid: 1008)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
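
Note on the failure: the root cause is the ImportError from flash_attn_2_cuda, not the torchrun ChildFailedError summary, which only reports that rank 5 was the first child to exit. An undefined C++ symbol of the form _ZN3c105ErrorC2E... (a constructor of c10::Error exported by libtorch) almost always means the installed flash-attn wheel was built against a different PyTorch version or C++ ABI than the torch currently in the environment. A minimal diagnostic sketch, assuming only that torch and flash-attn are importable from the same interpreter (the commented values are hypothetical, not taken from this log):

    # check_flash_attn_abi.py - compare the torch / flash-attn pairing in this environment
    import importlib.metadata as md
    import torch

    print("torch      :", torch.__version__)        # e.g. 2.6.0+cu124 (hypothetical)
    print("torch CUDA :", torch.version.cuda)        # CUDA toolkit torch was built against
    print("flash-attn :", md.version("flash-attn"))  # version of the installed wheel

    try:
        # this is the compiled extension whose import fails in the log above
        import flash_attn_2_cuda  # noqa: F401
        print("flash_attn_2_cuda imported OK - ABI looks consistent")
    except ImportError as exc:
        # an "undefined symbol" here means the wheel and libtorch disagree,
        # not that a package is missing
        print("flash_attn_2_cuda failed to import:", exc)

If the two packages turn out to be mismatched, the usual remedy is to rebuild or reinstall flash-attn against the torch that is already installed (for example "pip install --no-build-isolation --force-reinstall flash-attn"), or to install a prebuilt flash-attn wheel matching the torch/CUDA/Python combination; treat that as the generic fix for this symbol error rather than something confirmed by this log alone.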