一、启用大模型服务
nohup ./llama-server -m models/Qwen2-0.5B-Instruct/Qwen2-0.5B-Instruct-F16.gguf --host 0.0.0.0 --port 2001 > log.txt 2>&1 &
二、openai调用
from openai import OpenAI
client = OpenAI(api_key='xx', base_url='http://localhost:2001/v1')
completion = client.chat.completions.create(
model='qwen2',
messages=[{'role': 'user', 'content': '为什么天空是蓝色的'}],
stream=False
)
print(completion.choices[-1].message.content)
三、性能测试
import time
for i in range(5):
start_time = time.time()
text = test_ollama()
end_time = time.time()
print(f"第{i+1}次调用:{end_time-start_time}秒, token/s:{len(text)/(end_time-start_time)}")
参考链接:
1、https://mp.weixin.qq.com/s/majDONtuAUzN2SAaYWxH1Q
2、https://mp.weixin.qq.com/s/YuTHDfEzK8wV33Bifubc5A
3、https://mp.weixin.qq.com/s/9hUkDiEVM6mehkaHxU6VVw