```
[root@80101f8eab9f Medusa]# ll -h /models/z50051264/medusa-1.0-zephyr-7b-beta
total 15G
-rwxr-x--- 1 root root 42 Aug 6 02:54 added_tokens.json
-rwxr-x--- 1 root root 688 Aug 6 02:54 config.json
-rwxr-x--- 1 root root 111 Aug 6 02:54 generation_config.json
-rwxr-x--- 1 root root 1.5K Aug 6 02:54 gitattributes
-rwxr-x--- 1 root root 9.3G Aug 6 02:52 pytorch_model-00001-of-00002.bin
-rwxr-x--- 1 root root 5.7G Aug 6 02:49 pytorch_model-00002-of-00002.bin
-rwxr-x--- 1 root root 25K Aug 6 02:54 pytorch_model.bin.index.json
-rwxr-x--- 1 root root 168 Aug 6 02:54 special_tokens_map.json
-rwxr-x--- 1 root root 482K Aug 6 02:54 tokenizer.model
-rwxr-x--- 1 root root 1.7K Aug 6 02:54 tokenizer_config.json
[root@80101f8eab9f Medusa]# python -m medusa.inference.cli --model /models/z50051264/medusa-1.0-zephyr-7b-beta
/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:292: ImportWarning:
*************************************************************************************************************
The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now..
The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now..
The backend in torch.distributed.init_process_group set to hccl now..
The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now..
The device parameters have been replaced with npu in the function below:
torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Generator, torch.set_default_device, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.Tensor.pin_memory, torch.nn.Module.to, torch.nn.Module.to_empty
*************************************************************************************************************
warnings.warn(msg, ImportWarning)
/usr/local/python3.10.17/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:247: RuntimeWarning: torch.jit.script and torch.jit.script_method will be disabled by transfer_to_npu, which currently does not support them, if you need to enable them, please do not use transfer_to_npu.
warnings.warn(msg, RuntimeWarning)
MistralForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly defined. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
- If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
MedusaModelMistral has generative capabilities, as `prepare_inputs_for_generation` is explicitly defined. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
- If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
You are using a model of type mistral to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Traceback (most recent call last):
File "/models/z50051264/Medusa/medusa/model/medusa_model.py", line 134, in from_pretrained
return super().from_pretrained(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/modeling_utils.py", line 309, in _wrapper
return func(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4574, in from_pretrained
) = cls._load_pretrained_model(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/modeling_utils.py", line 5020, in _load_pretrained_model
state_dict = load_state_dict(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/modeling_utils.py", line 554, in load_state_dict
check_torch_load_is_safe()
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1417, in check_torch_load_is_safe
raise ValueError(
ValueError: Due to a serious vulnerability issue in `torch.load`, even with `weights_only=True`, we now require users to upgrade torch to at least v2.6 in order to use the function. This version restriction does not apply when loading files with safetensors.
See the vulnerability report here https://nvd.nist.gov/vuln/detail/CVE-2025-32434
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/python3.10.17/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
sock = connection.create_connection(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/usr/local/python3.10.17/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
TimeoutError: timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/python3.10.17/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
response = self._make_request(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/urllib3/connectionpool.py", line 488, in _make_request
raise new_e
File "/usr/local/python3.10.17/lib/python3.10/site-packages/urllib3/connectionpool.py", line 464, in _make_request
self._validate_conn(conn)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1093, in _validate_conn
conn.connect()
File "/usr/local/python3.10.17/lib/python3.10/site-packages/urllib3/connection.py", line 704, in connect
self.sock = sock = self._new_conn()
File "/usr/local/python3.10.17/lib/python3.10/site-packages/urllib3/connection.py", line 207, in _new_conn
raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0xfffc9cba7400>, 'Connection to huggingface.co timed out. (connect timeout=10)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/python3.10.17/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/urllib3/connectionpool.py", line 841, in urlopen
retries = retries.increment(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /lmsys/vicuna-7b-v1.3/resolve/main/config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0xfffc9cba7400>, 'Connection to huggingface.co timed out. (connect timeout=10)'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/python3.10.17/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1533, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1450, in get_hf_file_metadata
r = _request_wrapper(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 286, in _request_wrapper
response = _request_wrapper(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 309, in _request_wrapper
response = http_backoff(method=method, url=url, **params, retry_on_exceptions=(), retry_on_status_codes=(429,))
File "/usr/local/python3.10.17/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 310, in http_backoff
response = session.request(method=method, url=url, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 96, in send
return super().send(request, *args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/requests/adapters.py", line 688, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /lmsys/vicuna-7b-v1.3/resolve/main/config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0xfffc9cba7400>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: a3fce9e4-f367-4359-b08d-f05c0687e591)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/utils/hub.py", line 470, in cached_files
hf_hub_download(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1008, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1115, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1648, in _raise_on_head_call_error
raise LocalEntryNotFoundError(
huggingface_hub.errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/python3.10.17/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/python3.10.17/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/models/z50051264/Medusa/medusa/inference/cli.py", line 228, in <module>
main(args)
File "/models/z50051264/Medusa/medusa/inference/cli.py", line 39, in main
model = MedusaModel.from_pretrained(
File "/models/z50051264/Medusa/medusa/model/medusa_model.py", line 403, in from_pretrained
return MedusaModelMistral.from_pretrained(
File "/models/z50051264/Medusa/medusa/model/medusa_model.py", line 142, in from_pretrained
base_model_config = AutoConfig.from_pretrained(config.base_model_name_or_path)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1153, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/configuration_utils.py", line 595, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/configuration_utils.py", line 654, in _get_config_dict
resolved_config_file = cached_file(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/utils/hub.py", line 312, in cached_file
file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/utils/hub.py", line 543, in cached_files
raise OSError(
OSError: We couldn't connect to 'https://huggingface.co' to load the files, and couldn't find them in the cached files.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
[ERROR] 2025-08-06-03:25:59 (PID:2566, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
/usr/local/python3.10.17/lib/python3.10/tempfile.py:869: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpac5jw6b6'>
_warnings.warn(warn_message, ResourceWarning)
[root@80101f8eab9f Medusa]#
```
Analysis of the error

The log shows two chained failures:

1. Root cause: the checkpoint is stored as pickle-based `pytorch_model-*.bin` shards, and this version of transformers refuses to `torch.load` pickle files unless torch is at least v2.6, because of CVE-2025-32434. The `ValueError` raised from `check_torch_load_is_safe()` means the torch shipped with this torch_npu environment is older than 2.6, so loading the shards aborts. As the message itself notes, the restriction does not apply to safetensors files.

2. Secondary failure: while handling that exception, `medusa/model/medusa_model.py` falls back to `AutoConfig.from_pretrained(config.base_model_name_or_path)`, which resolves to `lmsys/vicuna-7b-v1.3` (apparently a default value, since the model on disk is zephyr-based) and tries to download its `config.json` from huggingface.co. The container has no outbound connectivity, the connection times out after the 10-second connect timeout, and the run ends with `OSError: We couldn't connect to 'https://huggingface.co' ...`.

This suggests three ways to clear the first error: convert the weights to safetensors, upgrade torch to v2.6 or later (only viable if a matching torch_npu build exists), or pin transformers to a release that predates the check. Independently, the config fallback must be kept off the network. Sketches of the conversion and the offline fix follow.
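First, a minimal conversion sketch, assuming `safetensors` is installed and each shard fits in host memory. The output file names and index follow the sharded-safetensors layout that transformers expects, but treat this as an illustration rather than a vetted migration script:

```python
import json
import torch
from safetensors.torch import save_file

model_dir = "/models/z50051264/medusa-1.0-zephyr-7b-beta"  # path taken from the log above

with open(f"{model_dir}/pytorch_model.bin.index.json") as f:
    index = json.load(f)

new_weight_map = {}
for shard in sorted(set(index["weight_map"].values())):
    # weights_only=True restricts the unpickler to tensors; available since torch 1.13
    state_dict = torch.load(f"{model_dir}/{shard}", map_location="cpu", weights_only=True)
    new_shard = shard.replace("pytorch_model", "model").replace(".bin", ".safetensors")
    # save_file rejects tensors that share storage (tied weights); zephyr does not
    # tie embeddings, but verify this before reusing the script on other checkpoints
    save_file({k: v.contiguous() for k, v in state_dict.items()},
              f"{model_dir}/{new_shard}")
    for key, src in index["weight_map"].items():
        if src == shard:
            new_weight_map[key] = new_shard

index["weight_map"] = new_weight_map
with open(f"{model_dir}/model.safetensors.index.json", "w") as f:
    json.dump(index, f, indent=2)
```

Recent transformers releases prefer `model.safetensors.index.json` when both index files are present, so after the conversion `from_pretrained` loads the safetensors shards and the torch-version check is skipped entirely.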
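Second, the offline fix for the fallback network failure. Assuming a local copy of the base model exists, writing `base_model_name_or_path` into the Medusa head's `config.json` keeps `AutoConfig.from_pretrained` off the network; the fact that the traceback requests `lmsys/vicuna-7b-v1.3` for a zephyr-based model suggests the key may currently be missing from the shipped config, in which case a library default is kicking in, so setting it explicitly covers both cases. The local base-model path below is hypothetical:

```python
import json

medusa_config = "/models/z50051264/medusa-1.0-zephyr-7b-beta/config.json"
local_base = "/models/z50051264/zephyr-7b-beta"  # hypothetical local copy of the base model

with open(medusa_config) as f:
    cfg = json.load(f)

# Point the base-model reference at a local directory instead of a Hub repo id
cfg["base_model_name_or_path"] = local_base

with open(medusa_config, "w") as f:
    json.dump(cfg, f, indent=2)
```

Exporting `HF_HUB_OFFLINE=1` (or `TRANSFORMERS_OFFLINE=1`) in the container additionally makes any remaining Hub lookup fail fast instead of waiting out connect timeouts.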