(fastchat) D:\code\transformers-main>python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir D:/code/model/LLaMA --model_size 7B --output_dir D:/code/model/transformer_model_7b
Fetching all parameters from the checkpoint at D:/code/model/LLaMA\7B.
Loading the checkpoint in a Llama model.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████|33/33 [00:04<00:00, 7.76it/s]
Saving in the Transformers format.
Fetching the tokenizer from D:/code/model/LLaMA\tokenizer.model.
(fastchat) D:\code\transformers-main>
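With the conversion done, the weights in the output directory load through the standard Transformers API. The generate_stream function below assumes such a model and tokenizer are already in hand; a minimal loading sketch (the path is the --output_dir used above):

from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the converted checkpoint produced by the conversion script above.
tokenizer = LlamaTokenizer.from_pretrained("D:/code/model/transformer_model_7b")
model = LlamaForCausalLM.from_pretrained("D:/code/model/transformer_model_7b")
model.eval()  # inference only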
import torch

def generate_stream(model, tokenizer, params, device, context_len=2048, stream_interval=2):
    prompt = params["prompt"]
    l_prompt = len(prompt)
    temperature = float(params.get("temperature", 1.0))
    max_new_tokens = int(params.get("max_new_tokens", 256))
    stop_str = params.get("stop", None)

    input_ids = tokenizer(prompt).input_ids
    output_ids = list(input_ids)

    # Keep only as many prompt tokens as fit in the context window,
    # leaving room for the new tokens plus a small safety margin.
    max_src_len = context_len - max_new_tokens - 8
    input_ids = input_ids[-max_src_len:]

    for i in range(max_new_tokens):
        if i == 0:
            # First step: run the whole prompt to build the KV cache.
            out = model(
                torch.as_tensor([input_ids], device=device), use_cache=True)
            logits = out.logits
            past_key_values = out.past_key_values
        else:
            # Later steps: feed only the newest token and reuse the cache.
            attention_mask = torch.ones(
                1, past_key_values[0][0].shape[-2] + 1, device=device)
            out = model(input_ids=torch.as_tensor([[token]], device=device),
                        use_cache=True,
                        attention_mask=attention_mask,
                        past_key_values=past_key_values)
            logits = out.logits
            past_key_values = out.past_key_values

        last_token_logits = logits[0][-1]
        if device == "mps":
            # Switch to CPU to avoid some bugs in the mps backend.
            last_token_logits = last_token_logits.float().to("cpu")

        if temperature < 1e-4:
            # Near-zero temperature: greedy decoding.
            token = int(torch.argmax(last_token_logits))
        else:
            # Otherwise sample from the temperature-scaled distribution.
            probs = torch.softmax(last_token_logits / temperature, dim=-1)
            token = int(torch.multinomial(probs, num_samples=1))

        output_ids.append(token)

        stopped = token == tokenizer.eos_token_id

        if i % stream_interval == 0 or i == max_new_tokens - 1 or stopped:
            output = tokenizer.decode(output_ids, skip_special_tokens=True)
            if stop_str is not None:  # guard: rfind(None) would raise TypeError
                pos = output.rfind(stop_str, l_prompt)
                if pos != -1:
                    output = output[:pos]
                    stopped = True
            yield output

        if stopped:
            break

    del past_key_values
Give the model a prompt and have the AI reply:
params = {
    "prompt": "你好!",       # what you say to the model
    "temperature": 0.7,      # how random the model's output is
    "max_new_tokens": 100,   # maximum length of the AI's reply
    "stop": "###",
}
iter = generate_stream(model, tokenizer, params, 'cpu',
                       context_len=2048, stream_interval=2)

skip_echo_len = len(params["prompt"]) + 1
pre = 0
for outputs in iter:
    # Drop the echoed prompt, then print only the words that are
    # new since the previous iteration.
    outputs = outputs[skip_echo_len:].strip()
    outputs = outputs.split(" ")
    now = len(outputs)
    if now - 1 > pre:
        print(" ".join(outputs[pre:now - 1]), end=" ", flush=True)
        pre = now - 1
print(" ".join(outputs[pre:]), flush=True)
Below is a sample screenshot of the console output.
Summary of exceptions encountered during installation
Problem 1
Running the conversion command

python convert_llama_weights_to_hf.py --input_dir D:\Xunlei\LLaMA7\LLaMAOriginalWeights\LLaMA --model_size 7B --output_dir D:\fastChat\transformer_model_7b

raises the following exception:

Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
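The simplest workaround here is usually the first one the message lists: pin protobuf to a 3.20.x release before rerunning the conversion (the exact patch version is an assumption; any release at or below 3.20.x per the message should do):

pip install protobuf==3.20.3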
Hi, when I was generating the Vicuna model for FastChat, the first error message was: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like lmsys/vicuna-7b-delta-v1.1 is not the path to a directory containing a file named config.json.
After retrying a few times, the error became: ProxyError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /lmsys/vicuna-7b-delta-v1.1/resolve/main/tokenizer_config.json (Caused by ProxyError('Cannot connect to proxy.', OSError(0, 'Error')))
Could you help take a look?
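One hedged workaround, not from the original thread: the second traceback says the client cannot even reach the configured proxy, so clearing stale proxy settings before loading may help, or the repo can be fetched once by hand and loaded from a local path:

import os

# Assumption: a broken proxy is configured in the environment; remove it
# so requests from transformers go out directly.
for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
    os.environ.pop(var, None)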
In the "provide a Web GUI" step: after the web server starts it serves http://127.0.0.1:7860 and prints "To create a public link, set `share=True` in `launch()`.". I am running this on a remote server, so 127.0.0.1 is obviously unreachable from my machine; I need an external link to access it.
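For reference, that hint refers to Gradio's launch() options. A minimal sketch with a plain Gradio app (the echo function is a placeholder, not the FastChat app itself):

import gradio as gr

# Placeholder echo demo, just to show the launch flags.
demo = gr.Interface(fn=lambda x: x, inputs="text", outputs="text")

# share=True asks Gradio for a temporary public *.gradio.live URL;
# server_name="0.0.0.0" also binds to all interfaces so the server's own
# IP address becomes reachable from outside.
demo.launch(server_name="0.0.0.0", server_port=7860, share=True)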
Hello,
When I ran the conversion, the output did not say "Fetching the tokenizer from D:/code/model/LLaMA\tokenizer.model."
but instead "Saving a LlamaTokenizerFast to D:/Model/transformer_model_7b."
How can I resolve this?
Hi, when generating Vicuna afterwards I got the following errors:
OSError: Not found:
"C:\Users\阿宇不想努力/.cache\huggingface\hub\models--lmsys--vicuna-7b-delta-v1.1\snapshots\981921c2f3815acee666973b05620bc7a4\tokenizer.model": No such file or directory Error #2
OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.
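One hedged guess: the cached snapshot is incomplete (tokenizer.model never finished downloading). Forcing a clean re-download of the delta repo with huggingface_hub may fix it:

from huggingface_hub import snapshot_download

# Assumption: the cache under ~/.cache/huggingface is partially downloaded;
# force_download re-fetches every file in the repo.
snapshot_download("lmsys/vicuna-7b-delta-v1.1", force_download=True)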
Hello, how should I handle this exception during fine-tuning?
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`
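The fix is the one the message itself suggests: give the tokenizer a padding token before tokenizing with padding. A minimal sketch, reusing the EOS token (the model path is a placeholder):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("D:/code/model/transformer_model_7b")  # placeholder path
# Reuse EOS as the padding token, as the error message suggests...
tokenizer.pad_token = tokenizer.eos_token
# ...or register a dedicated [PAD] token instead:
# tokenizer.add_special_tokens({"pad_token": "[PAD]"})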