(vllm-deepseek) [root@lmogpu63(15.95) ~]# CUDA_VISIBLE_DEVICES=7 swift sft --torch_dtype 'bfloat16' --model '/opt/tmp/lora/20250918_181334/model' --dataset '/opt/tmp/lora/20250918_181334/dataset/train.jsonl' --split_dataset_ratio '0.2' --max_length '2048' --use_galore 'True' --galore_rank '64' --task_type 'causal_lm' --per_device_train_batch_size '14' --per_device_eval_batch_size '28' --learning_rate '1e-4' --num_train_epochs '2' --gradient_accumulation_steps '4' --eval_steps '500' --output_dir '/opt/tmp/lora/20250918_181334/model_lora' --neftune_noise_alpha '0' --report_to 'tensorboard' --add_version False
run sh: `/opt/tpapp/conda/envs/vllm-deepseek/bin/python /opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/swift/cli/sft.py --torch_dtype bfloat16 --model /opt/tmp/lora/20250918_181334/model --dataset /opt/tmp/lora/20250918_181334/dataset/train.jsonl --split_dataset_ratio 0.2 --max_length 2048 --use_galore True --galore_rank 64 --task_type causal_lm --per_device_train_batch_size 14 --per_device_eval_batch_size 28 --learning_rate 1e-4 --num_train_epochs 2 --gradient_accumulation_steps 4 --eval_steps 500 --output_dir /opt/tmp/lora/20250918_181334/model_lora --neftune_noise_alpha 0 --report_to tensorboard --add_version False`
[INFO:swift] Successfully registered `/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/swift/llm/dataset/data/dataset_info.json`.
Could not find the bitsandbytes CUDA binary at PosixPath('/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda128.so')
Could not load bitsandbytes native library: /opt/tpapp/conda/envs/vllm-deepseek/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so)
Traceback (most recent call last):
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 85, in <module>
    lib = get_native_library()
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 72, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/tpapp/conda/envs/vllm-deepseek/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so)
CUDA Setup failed despite CUDA being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues
Traceback (most recent call last):
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/transformers/hf_argparser.py", line 258, in _add_dataclass_arguments
    type_hints: dict[str, type] = get_type_hints(dtype)
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/typing.py", line 1833, in get_type_hints
    value = _eval_type(value, base_globals, base_locals)
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/typing.py", line 329, in _eval_type
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/typing.py", line 329, in <genexpr>
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.__args__)
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/typing.py", line 327, in _eval_type
    return t._evaluate(globalns, localns, recursive_guard)
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/typing.py", line 694, in _evaluate
    eval(self.__forward_code__, globalns, localns),
  File "<string>", line 1, in <module>
NameError: name 'ParallelismConfig' is not defined. Did you mean: 'parallelism_config'?
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/swift/cli/sft.py", line 10, in <module>
    sft_main()
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/swift/llm/train/sft.py", line 331, in sft_main
    return SwiftSft(args).main()
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/swift/llm/train/sft.py", line 27, in __init__
    super().__init__(args)
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/swift/llm/base.py", line 19, in __init__
    self.args = self._parse_args(args)
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/swift/llm/base.py", line 31, in _parse_args
    args, remaining_argv = parse_args(self.args_class, args)
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/swift/utils/utils.py", line 144, in parse_args
    parser = HfArgumentParser([class_type])
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/transformers/hf_argparser.py", line 143, in __init__
    self._add_dataclass_arguments(dtype)
  File "/opt/tpapp/conda/envs/vllm-deepseek/lib/python3.10/site-packages/transformers/hf_argparser.py", line 260, in _add_dataclass_arguments
    raise RuntimeError(
RuntimeError: Type resolution failed for <class 'swift.llm.argument.train_args.TrainArguments'>. Try declaring the class in global scope or removing line of `from __future__ import annotations` which opts in Postponed Evaluation of Annotations (PEP 563)
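
The final RuntimeError wraps the NameError above it: `get_type_hints` tried to evaluate a string annotation naming `ParallelismConfig`, and that name was not defined in the namespace used for the lookup. The sketch below reproduces the mechanism with the standard library only and then prints the installed versions of the packages that appear in the log. `DemoArguments`, `resolve_hints`, and `installed_version` are hypothetical names for illustration, not part of swift or transformers, and the distribution name `ms-swift` is an assumption. A version mismatch between `transformers` and `accelerate` is one plausible cause, but the log itself does not confirm it.

```python
from dataclasses import dataclass
from importlib import metadata
from typing import Optional, get_type_hints


@dataclass
class DemoArguments:
    # Hypothetical stand-in for a training-arguments dataclass: the annotation
    # names a class that was never imported into this module's namespace.
    parallelism_config: Optional["ParallelismConfig"] = None


def resolve_hints(cls):
    """Return (hints, None) on success, or (None, message) on NameError."""
    try:
        return get_type_hints(cls), None
    except NameError as exc:
        return None, str(exc)


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    hints, error = resolve_hints(DemoArguments)
    print("type hint resolution error:", error)
    # Distribution names are assumptions; adjust them to the environment.
    for name in ("transformers", "accelerate", "ms-swift", "bitsandbytes"):
        print(name, installed_version(name))
```

Running this inside the same conda environment shows which versions are actually installed, which can then be compared against the requirements stated by each package.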