Loading checkpoint shards: 100%|███████████████████████████████████████| 4/4 [00:00<00:00, 31.35it/s]
0%| | 0/1000 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/generation/utils.py:2479: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on npu, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cpu') before running `.generate()`.
warnings.warn(
Traceback (most recent call last):
File "/models/z50051264/vllm-0.10.0/benchmarks/benchmark_throughput.py", line 769, in <module>
main(args)
File "/models/z50051264/vllm-0.10.0/benchmarks/benchmark_throughput.py", line 449, in main
elapsed_time = run_hf(
File "/models/z50051264/vllm-0.10.0/benchmarks/benchmark_throughput.py", line 300, in run_hf
llm_outputs = llm.generate(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/generation/utils.py", line 2597, in generate
result = self._sample(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/generation/utils.py", line 3557, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/utils/generic.py", line 969, in wrapper
output = func(self, *args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 703, in forward
outputs: BaseModelOutputWithPast = self.model(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/utils/generic.py", line 969, in wrapper
output = func(self, *args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 405, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 190, in forward
return F.embedding(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/nn/functional.py", line 2551, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and npu:0! (when checking argument for argument indices in method wrapper_NPU__embedding)
[ERROR] 2025-07-28-08:30:59 (PID:10729, Device:0, RankID:-1) ERR99999 UNKNOWN applicaiton exception
0%| | 0/1000 [00:07<?, ?it/s]
[root@190f3c453709 benchmarks]#
分析错误原因
最新发布