I have already installed the tiktoken and protobuf libraries, but the script still fails:
D:\PythonProject\deepseekai.venv\Scripts\python.exe D:\PythonProject\deepseekai\train_weather_model.py
PyTorch 版本: 2.3.1+cu118
CUDA 可用: True
GPU 名称: NVIDIA GeForce GTX 1650 Ti
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Traceback (most recent call last):
File "D:\PythonProject\deepseekai.venv\Lib\site-packages\transformers\convert_slow_tokenizer.py", line 1737, in convert_slow_tokenizer
).converted()
^^^^^^^^^^^
File "D:\PythonProject\deepseekai.venv\Lib\site-packages\transformers\convert_slow_tokenizer.py", line 1631, in converted
tokenizer = self.tokenizer()
^^^^^^^^^^^^^^^^
File "D:\PythonProject\deepseekai.venv\Lib\site-packages\transformers\convert_slow_tokenizer.py", line 1624, in tokenizer
vocab_scores, merges = self.extract_vocab_merges_from_model(self.vocab_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PythonProject\deepseekai.venv\Lib\site-packages\transformers\convert_slow_tokenizer.py", line 1600, in extract_vocab_merges_from_model
bpe_ranks = load_tiktoken_bpe(tiktoken_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PythonProject\deepseekai.venv\Lib\site-packages\tiktoken\load.py", line 148, in load_tiktoken_bpe
contents = read_file_cached(tiktoken_bpe_file, expected_hash)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PythonProject\deepseekai.venv\Lib\site-packages\tiktoken\load.py", line 48, in read_file_cached
cache_key = hashlib.sha1(blobpath.encode()).hexdigest()
^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'encode'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\PythonProject\deepseekai\train_weather_model.py", line 31, in <module>
tokenizer = AutoTokenizer.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PythonProject\deepseekai.venv\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 1032, in from_pretrained
return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PythonProject\deepseekai.venv\Lib\site-packages\transformers\tokenization_utils_base.py", line 2025, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "D:\PythonProject\deepseekai.venv\Lib\site-packages\transformers\tokenization_utils_base.py", line 2278, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PythonProject\deepseekai.venv\Lib\site-packages\transformers\models\llama\tokenization_llama_fast.py", line 154, in __init__
super().__init__(
File "D:\PythonProject\deepseekai.venv\Lib\site-packages\transformers\tokenization_utils_fast.py", line 139, in __init__
fast_tokenizer = convert_slow_tokenizer(self, from_tiktoken=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\PythonProject\deepseekai.venv\Lib\site-packages\transformers\convert_slow_tokenizer.py", line 1739, in convert_slow_tokenizer
raise ValueError(
ValueError: Converting from SentencePiece and Tiktoken failed, if a converter for SentencePiece is available, provide a model path with a SentencePiece tokenizer.model file.Currently available slow->fast converters: ['AlbertTokenizer', 'BartTokenizer', 'BarthezTokenizer', 'BertTokenizer', 'BigBirdTokenizer', 'BlenderbotTokenizer', 'CamembertTokenizer', 'CLIPTokenizer', 'CodeGenTokenizer', 'ConvBertTokenizer', 'DebertaTokenizer', 'DebertaV2Tokenizer', 'DistilBertTokenizer', 'DPRReaderTokenizer', 'DPRQuestionEncoderTokenizer', 'DPRContextEncoderTokenizer', 'ElectraTokenizer', 'FNetTokenizer', 'FunnelTokenizer', 'GPT2Tokenizer', 'HerbertTokenizer', 'LayoutLMTokenizer', 'LayoutLMv2Tokenizer', 'LayoutLMv3Tokenizer', 'LayoutXLMTokenizer', 'LongformerTokenizer', 'LEDTokenizer', 'LxmertTokenizer', 'MarkupLMTokenizer', 'MBartTokenizer', 'MBart50Tokenizer', 'MPNetTokenizer', 'MobileBertTokenizer', 'MvpTokenizer', 'NllbTokenizer', 'OpenAIGPTTokenizer', 'PegasusTokenizer', 'Qwen2Tokenizer', 'RealmTokenizer', 'ReformerTokenizer', 'RemBertTokenizer', 'RetriBertTokenizer', 'RobertaTokenizer', 'RoFormerTokenizer', 'SeamlessM4TTokenizer', 'SqueezeBertTokenizer', 'T5Tokenizer', 'UdopTokenizer', 'WhisperTokenizer', 'XLMRobertaTokenizer', 'XLNetTokenizer', 'SplinterTokenizer', 'XGLMTokenizer', 'LlamaTokenizer', 'CodeLlamaTokenizer', 'GemmaTokenizer', 'Phi3Tokenizer']
Process finished with exit code 1
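
The ValueError above asks for a model path that contains a SentencePiece tokenizer.model file. A minimal sketch for checking which tokenizer files a local model directory actually contains follows. The function name find_tokenizer_files and the list of file names are assumptions made for illustration; they are not part of transformers, and the list is not exhaustive.

```python
import sys
from pathlib import Path

# File names commonly shipped alongside Hugging Face tokenizers.
# Assumed list for illustration; a given model may use other names.
TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer.model",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def find_tokenizer_files(model_dir):
    """Map each expected tokenizer file name to True if it exists in model_dir."""
    root = Path(model_dir)
    return {name: (root / name).is_file() for name in TOKENIZER_FILES}


if __name__ == "__main__":
    # Hypothetical usage: pass the local model directory as the first argument.
    model_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, present in find_tokenizer_files(model_dir).items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```

In the traceback, the value that reaches blobpath.encode() is None, which is consistent with no tokenizer.model path having been resolved, so a MISSING entry for that file would be the first thing to look at.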