InternLM (Shusheng·Puyu) Large Model Camp, Season 4 - Fine-tuning a Personal Assistant's Self-Cognition with XTuner
- Tutorial: https://github.com/InternLM/Tutorial/blob/camp4/docs/L1/XTuner/README.md
- Task: https://github.com/InternLM/Tutorial/blob/camp4/docs/L1/XTuner/task.md
- Submission: https://aicarrier.feishu.cn/share/base/form/shrcnUqshYPt7MdtYRTRpkiOFJd
Task Description
Basic task (completing it clears the stage and earns 100 compute points)
- Use XTuner to fine-tune InternLM2-Chat-7B to give it your own assistant persona; record the reproduction process and take screenshots.
Advanced tasks (not required to clear the stage)
- Upload the self-cognition model to HuggingFace/ModelScope/Modelers (魔乐) and deploy the application on HuggingFace/ModelScope/Modelers
- Take part in community building: use the Puyu API to create your own data for fine-tuning (creative results have a chance of being nominated as outstanding students)
Basic Task: Fine-tuning the Personal Assistant's Self-Cognition
Environment Setup
Create a folder `L1_xtuner`; everything related to this assignment goes in there:
mkdir L1_xtuner
cd L1_xtuner
Set up the environment:
conda create -n xtuner python=3.10 -y
conda activate xtuner
git clone https://github.com/InternLM/xtuner.git
cd xtuner
pip install -e '.[all]'
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.39.0
`-e` installs the project in editable mode, so any local changes to the code take effect immediately.
Run `xtuner list-cfg` to print the available config files and check that the installation works:
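The full list is fairly long; `xtuner list-cfg` also accepts a pattern filter. A minimal sketch, assuming the `-p/--pattern` option behaves as in current XTuner releases:
# list only the InternLM2.5 configs; the one we copy later should show up here
xtuner list-cfg -p internlm2_5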
Preparing the Fine-tuning Data
Under `L1_xtuner`, create a folder `datas` to hold the fine-tuning data (you could also keep it somewhere else for unified management and symlink it back):
mkdir datas
Under the `L1_xtuner` folder, create a script `xtuner_generate_assistant.py` that generates the mock fine-tuning data:
import json

# The user name that will appear in the assistant's answers
name = '1911-David'
# Number of times to duplicate the two seed samples
n = 8000

# Seed conversations
data = [
    {"conversation": [{"input": "请介绍一下你自己", "output": "我是{}的小助手,内在是上海AI实验室书生·浦语的7B大模型哦".format(name)}]},
    {"conversation": [{"input": "你在实战营做什么", "output": "我在这里帮助{}完成XTuner微调个人小助手的任务".format(name)}]}
]

# Repeat the two seed conversations n times
for i in range(n):
    data.append(data[0])
    data.append(data[1])

# Write the data to 'datas/assistant.json'
with open('datas/assistant.json', 'w', encoding='utf-8') as f:
    # ensure_ascii=False keeps the Chinese characters readable
    # indent=4 pretty-prints the file
    json.dump(data, f, ensure_ascii=False, indent=4)
Run the script and take a look at the result:
OK, that is the data we will fine-tune on. For a real project you would need to do your own data cleaning and similar work, and that is actually the more important part.
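As a quick sanity check (a small snippet I added for illustration, assuming the paths from the script above), you can load the file back and confirm its size and structure:
import json

with open('datas/assistant.json', encoding='utf-8') as f:
    samples = json.load(f)

# 2 seed conversations + 2 * 8000 duplicates = 16002 entries
print(len(samples))
print(samples[0]['conversation'][0]['input'])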
Preparing the Base Model
Download `internlm2_5-7b-chat` and symlink it into the `L1_xtuner` directory; this was already covered on the entry island:
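If the model is not already on your machine, one possible way to fetch it is via the ModelScope SDK. A minimal sketch, where the model id `Shanghai_AI_Laboratory/internlm2_5-7b-chat` and the cache directory are assumptions you should adjust to your own setup:
# pip install modelscope
from modelscope import snapshot_download

# downloads (or reuses) a local copy and prints its path
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm2_5-7b-chat',
                              cache_dir='/root/models')
print(model_dir)
Afterwards, from inside `L1_xtuner`, link it with something like `ln -s <model_dir> ./internlm2_5-7b-chat` so the relative path used in the config below resolves.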
Modifying the Config
Under the `L1_xtuner` folder, create a `config` folder and start by copying one of the stock configuration files:
mkdir ./config
cd config
xtuner copy-cfg internlm2_5_chat_7b_qlora_alpaca_e3 ./
Following the tutorial, modify the model and dataset settings in the config file; after my changes it looks like this:
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import alpaca_map_fn, template_map_fn_factory
from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
                                 VarlenAttnArgsToMessageHubHook)
from xtuner.engine.runner import TrainLoop
from xtuner.model import SupervisedFinetune
from xtuner.parallel.sequence import SequenceParallelSampler
from xtuner.utils import PROMPT_TEMPLATE, SYSTEM_TEMPLATE

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
pretrained_model_name_or_path = 'internlm2_5-7b-chat'
use_varlen_attn = False

# Data
# alpaca_en_path = 'tatsu-lab/alpaca'
alpaca_en_path = 'datas/assistant.json'
prompt_template = PROMPT_TEMPLATE.internlm2_chat
max_length = 2048
pack_to_max_length = True

# parallel
sequence_parallel_size = 1

# Scheduler & Optimizer
batch_size = 1  # per_device
accumulative_counts = 1
accumulative_counts *= sequence_parallel_size
dataloader_num_workers = 0
max_epochs = 3
optim_type = AdamW
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 500
save_total_limit = 2  # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training
evaluation_freq = 500
SYSTEM = SYSTEM_TEMPLATE.alpaca
evaluation_inputs = [
    # '请给我介绍五个上海的景点', 'Please tell me five scenic spots in Shanghai'
    '请介绍下你自己', 'Please introduce yourself'
]

#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        quantization_config=dict(
            type=BitsAndBytesConfig,
            load_in_4bit=True,
            load_in_8bit=False,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))

#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
alpaca_en = dict(
    type=process_hf_dataset,
    # dataset=dict(type=load_dataset, path=alpaca_en_path),
    dataset=dict(
        type=load_dataset,
        path='json',
        data_files=dict(train=alpaca_en_path)
    ),
    tokenizer=tokenizer,
    max_length=max_length,
    # dataset_map_fn=alpaca_map_fn,
    dataset_map_fn=None,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length,
    use_varlen_attn=use_varlen_attn)

sampler = SequenceParallelSampler \
    if sequence_parallel_size > 1 else DefaultSampler

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=alpaca_en,
    sampler=dict(type=sampler, shuffle=True),
    collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]

# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        system=SYSTEM,
        prompt_template=prompt_template)
]

if use_varlen_attn:
    custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed environment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = None

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

# set log processor
log_processor = dict(by_epoch=False)
Launching the Fine-tuning
Write a script `run_xtuner_test.sh` and paste in the content below, so it is easy to tweak later (it has to be run from the `L1_xtuner` folder):
xtuner train config/internlm2_5_chat_7b_qlora_alpaca_e3_copy.py \
--deepspeed deepspeed_zero2 \
--work-dir work_dirs/internlm2_chat_7b_qlora_alpaca_e3_copy
Run `bash run_xtuner_test.sh` and training starts:
Check the GPU memory usage:
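To keep an eye on it live, a plain NVIDIA tool is enough (nothing XTuner-specific is assumed here):
watch -n 1 nvidia-smi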
With `max_epochs = 3` in the config, training finished in roughly 40-odd minutes on an A10.
Judging from the evaluation outputs printed during training, it looks like the persona was learned.
Weight Conversion
Model conversion essentially turns the weight files produced by PyTorch training into the now-standard HuggingFace format, and it can be done in one step with the command below.
The `xtuner convert pth_to_hf` command performs this format conversion. It takes three arguments: `CONFIG`, the config file used for fine-tuning; `PATH_TO_PTH_MODEL`, the path to the fine-tuned weight file to convert; and `SAVE_PATH_TO_HF_MODEL`, the path where the converted HuggingFace-format files are saved.
On top of that, a few optional flags can be added to the conversion command:
Flag | Description |
---|---|
--fp32 | Convert with fp32 precision; defaults to fp16 if omitted |
--max-shard-size {GB} | Maximum size of each weight shard (default 2GB) |
Following the tutorial, convert the model with the commands below, written into a script `run_xtuner_convert.sh`:
conda activate xtuner
# grab the most recently saved .pth file
pth_file=`ls -t work_dirs/internlm2_chat_7b_qlora_alpaca_e3_copy/*.pth | head -n 1`
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU
xtuner convert pth_to_hf ./config/internlm2_5_chat_7b_qlora_alpaca_e3_copy.py ${pth_file} ./work_dirs/internlm2_chat_7b_qlora_alpaca_e3_copy/hf
The run looks like this:
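Before merging, you can optionally chat with the base model plus the converted adapter to verify it; a sketch assuming the `--adapter` and `--prompt-template` options of `xtuner chat` (check `xtuner chat --help` on your version):
xtuner chat ./internlm2_5-7b-chat \
    --adapter ./work_dirs/internlm2_chat_7b_qlora_alpaca_e3_copy/hf \
    --prompt-template internlm2_chat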
Model Merging
A model fine-tuned with LoRA or QLoRA is not a complete model on its own but an extra layer (an Adapter); after training, that adapter still has to be merged with the original model before it can be used normally.
Fully fine-tuned (full) models do not need this step, because full fine-tuning changes the original model's weights rather than training a separate Adapter.
XTuner provides a one-step merge command, `xtuner convert merge`. Before using it, prepare three paths: `LLM`, the original model; `ADAPTER`, the trained (and format-converted) Adapter; and `SAVE_PATH`, where the merged model will be saved.
The merge step also accepts quite a few optional flags, including:
Flag | Description |
---|---|
--max-shard-size {GB} | Maximum size of each weight shard (default 2GB) |
--device {device_name} | Device to run on: cuda, cpu, or auto; defaults to cuda, i.e. the GPU |
--is-clip | Add this flag only if the model is a CLIP model; otherwise leave it out |
Following the tutorial, merge the model with the commands below, pasted into a script `run_xtuner_merge.sh`:
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU
xtuner convert merge internlm2_5-7b-chat work_dirs/internlm2_chat_7b_qlora_alpaca_e3_copy/hf ./work_dirs/internlm2_chat_7b_qlora_alpaca_e3_copy/merged --max-shard-size 2GB
Once merging finishes, the final model folder looks very similar to the original model's, with the tokenizer, weight files, configuration, and so on.
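A quick way to sanity-check the merged model before wiring up the WebUI; a sketch that assumes the InternLM2 remote code exposes its usual `chat()` helper and that a single GPU is available:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = 'work_dirs/internlm2_chat_7b_qlora_alpaca_e3_copy/merged'
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True,
                                             torch_dtype=torch.float16).cuda().eval()
# ask the same question used in the training data
response, _ = model.chat(tokenizer, '请介绍一下你自己')
print(response)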
### Checking the Result / Chatting with the Model via a WebUI
Create a Python script `xtuner_streamlit_demo.py` and paste in the following content:
import copy
import warnings
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional

import streamlit as st
import torch
from torch import nn
from transformers.generation.utils import (LogitsProcessorList,
                                            StoppingCriteriaList)
from transformers.utils import logging

from transformers import AutoTokenizer, AutoModelForCausalLM  # isort: skip

logger = logging.get_logger(__name__)

model_name_or_path = "internlm2_5-7b-chat"


@dataclass
class GenerationConfig:
    # this config is used for chat to provide more diversity
    max_length: int = 2048
    top_p: float = 0.75
    temperature: float = 0.1
    do_sample: bool = True
    repetition_penalty: float = 1.000


@torch.inference_mode()
def generate_interactive(
    model,
    tokenizer,
    prompt,
    generation_config: Optional[GenerationConfig] = None,
    logits_processor: Optional[LogitsProcessorList] = None,
    stopping_criteria: Optional[StoppingCriteriaList] = None,
    prefix_allowed_tokens_fn: Optional[Callable[[int, torch.Tensor],
                                                List[int]]] = None,
    additional_eos_token_id: Optional[int] = None,
    **kwargs,
):
    inputs = tokenizer([prompt], padding=True, return_tensors='pt')
    input_length = len(inputs['input_ids'][0])
    for k, v in inputs.items():
        inputs[k] = v.cuda()
    input_ids = inputs['input_ids']
    _, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]
    if generation_config is None:
        generation_config = model.generation_config
    generation_config = copy.deepcopy(generation_config)
    model_kwargs = generation_config.update(**kwargs)
    bos_token_id, eos_token_id = (  # noqa: F841  # pylint: disable=W0612
        generation_config.bos_token_id,
        generation_config.eos_token_id,
    )
    if isinstance(eos_token_id, int):
        eos_token_id = [eos_token_id]
    if additional_eos_token_id is not None:
        eos_token_id.append(additional_eos_token_id)
    has_default_max_length = kwargs.get(
        'max_length') is None and generation_config.max_length is not None
    if has_default_max_length and generation_config.max_new_tokens is None:
        warnings.warn(
            f"Using 'max_length''s default ({repr(generation_config.max_length)}) \
                to control the generation length. "
            'This behaviour is deprecated and will be removed from the \
                config in v5 of Transformers -- we'
            ' recommend using `max_new_tokens` to control the maximum \
                length of the generation.',
            UserWarning,
        )
    elif generation_config.max_new_tokens is not None:
        generation_config.max_length = generation_config.max_new_tokens + \
            input_ids_seq_length
        if not has_default_max_length:
            logger.warn(  # pylint: disable=W4902
                f"Both 'max_new_tokens' (={generation_config.max_new_tokens}) "
                f"and 'max_length'(={generation_config.max_length}) seem to "
                "have been set. 'max_new_tokens' will take precedence. "
                'Please refer to the documentation for more information. '
                '(https://huggingface.co/docs/transformers/main/'
                'en/main_classes/text_generation)',
                UserWarning,
            )

    if input_ids_seq_length >= generation_config.max_length:
        input_ids_string = 'input_ids'
        logger.warning(
            f"Input length of {input_ids_string} is {input_ids_seq_length}, "
            f"but 'max_length' is set to {generation_config.max_length}. "
            'This can lead to unexpected behavior. You should consider'
            " increasing 'max_new_tokens'.")

    # 2. Set generation parameters if not already defined
    logits_processor = logits_processor if logits_processor is not None \
        else LogitsProcessorList()
    stopping_criteria = stopping_criteria if stopping_criteria is not None \
        else StoppingCriteriaList()

    logits_processor = model._get_logits_processor(
        generation_config=generation_config,
        input_ids_seq_length=input_ids_seq_length,
        encoder_input_ids=input_ids,
        prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
        logits_processor=logits_processor,
    )

    stopping_criteria = model._get_stopping_criteria(
        generation_config=generation_config,
        stopping_criteria=stopping_criteria)
    logits_warper = model._get_logits_warper(generation_config)

    unfinished_sequences = input_ids.new(input_ids.shape[0]).fill_(1)
    scores = None
    while True:
        model_inputs = model.prepare_inputs_for_generation(
            input_ids, **model_kwargs)
        # forward pass to get next token
        outputs = model(
            **model_inputs,
            return_dict=True,
            output_attentions=False,
            output_hidden_states=False,
        )

        next_token_logits = outputs.logits[:, -1, :]

        # pre-process distribution
        next_token_scores = logits_processor(input_ids, next_token_logits)
        next_token_scores = logits_warper(input_ids, next_token_scores)

        # sample
        probs = nn.functional.softmax(next_token_scores, dim=-1)
        if generation_config.do_sample:
            next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
        else:
            next_tokens = torch.argmax(probs, dim=-1)

        # update generated ids, model inputs, and length for next step
        input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
        model_kwargs = model._update_model_kwargs_for_generation(
            outputs, model_kwargs, is_encoder_decoder=False)
        unfinished_sequences = unfinished_sequences.mul(
            (min(next_tokens != i for i in eos_token_id)).long())
        output_token_ids = input_ids[0].cpu().tolist()
        output_token_ids = output_token_ids[input_length:]
        for each_eos_token_id in eos_token_id:
            if output_token_ids[-1] == each_eos_token_id:
                output_token_ids = output_token_ids[:-1]
        response = tokenizer.decode(output_token_ids)

        yield response
        # stop when each sentence is finished
        # or if we exceed the maximum length
        if unfinished_sequences.max() == 0 or stopping_criteria(
                input_ids, scores):
            break


def on_btn_click():
    del st.session_state.messages


@st.cache_resource
def load_model():
    model = (AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                                  trust_remote_code=True).to(
                                                      torch.bfloat16).cuda())
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path,
                                              trust_remote_code=True)
    return model, tokenizer


def prepare_generation_config():
    with st.sidebar:
        max_length = st.slider('Max Length',
                               min_value=8,
                               max_value=32768,
                               value=2048)
        top_p = st.slider('Top P', 0.0, 1.0, 0.75, step=0.01)
        temperature = st.slider('Temperature', 0.0, 1.0, 0.1, step=0.01)
        st.button('Clear Chat History', on_click=on_btn_click)

    generation_config = GenerationConfig(max_length=max_length,
                                         top_p=top_p,
                                         temperature=temperature)

    return generation_config


user_prompt = '<|im_start|>user\n{user}<|im_end|>\n'
robot_prompt = '<|im_start|>assistant\n{robot}<|im_end|>\n'
cur_query_prompt = '<|im_start|>user\n{user}<|im_end|>\n\
    <|im_start|>assistant\n'


def combine_history(prompt):
    messages = st.session_state.messages
    meta_instruction = ('')
    total_prompt = f"<s><|im_start|>system\n{meta_instruction}<|im_end|>\n"
    for message in messages:
        cur_content = message['content']
        if message['role'] == 'user':
            cur_prompt = user_prompt.format(user=cur_content)
        elif message['role'] == 'robot':
            cur_prompt = robot_prompt.format(robot=cur_content)
        else:
            raise RuntimeError
        total_prompt += cur_prompt
    total_prompt = total_prompt + cur_query_prompt.format(user=prompt)
    return total_prompt


def main():
    # torch.cuda.empty_cache()
    print('load model begin.')
    model, tokenizer = load_model()
    print('load model end.')

    st.title('InternLM2-Chat-1.8B')

    generation_config = prepare_generation_config()

    # Initialize chat history
    if 'messages' not in st.session_state:
        st.session_state.messages = []

    # Display chat messages from history on app rerun
    for message in st.session_state.messages:
        with st.chat_message(message['role'], avatar=message.get('avatar')):
            st.markdown(message['content'])

    # Accept user input
    if prompt := st.chat_input('What is up?'):
        # Display user message in chat message container
        with st.chat_message('user'):
            st.markdown(prompt)
        real_prompt = combine_history(prompt)
        # Add user message to chat history
        st.session_state.messages.append({
            'role': 'user',
            'content': prompt,
        })

        with st.chat_message('robot'):
            message_placeholder = st.empty()
            for cur_response in generate_interactive(
                    model=model,
                    tokenizer=tokenizer,
                    prompt=real_prompt,
                    additional_eos_token_id=92542,
                    **asdict(generation_config),
            ):
                # Display robot response in chat message container
                message_placeholder.markdown(cur_response + '▌')
            message_placeholder.markdown(cur_response)
        # Add robot response to chat history
        st.session_state.messages.append({
            'role': 'robot',
            'content': cur_response,  # pylint: disable=undefined-loop-variable
        })
        torch.cuda.empty_cache()


if __name__ == '__main__':
    main()
First test the original model with `streamlit run xtuner_streamlit_demo.py`; it looks like this:
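If the demo is running on a remote development machine, you may need to forward Streamlit's default port to your local browser first; a sketch assuming port 8501 and an InternStudio-style SSH endpoint (replace <port> with your machine's SSH port):
ssh -CNg -L 8501:127.0.0.1:8501 root@ssh.intern-ai.org.cn -p <port>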
Then switch the model path to the fine-tuned and merged model from earlier, i.e. `work_dirs/internlm2_chat_7b_qlora_alpaca_e3_copy/merged`:
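Concretely, that is just the one assignment near the top of the script:
model_name_or_path = "work_dirs/internlm2_chat_7b_qlora_alpaca_e3_copy/merged"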
Rerun it and test again; done:
The task is basically complete, but since the training corpus is so homogeneous, the model is clearly overfitted.
Model Upload & App Deployment
Uploading the model to ModelScope
Uploading a model to HuggingFace was already done on the entry island and eats a lot of bandwidth, so this time let's try ModelScope; see its docs:
First create an access token on your profile page:
Then log in with the token via the SDK:
from modelscope.hub.api import HubApi
YOUR_ACCESS_TOKEN = 'get this from ModelScope: Profile -> Access Tokens'
api = HubApi()
api.login(YOUR_ACCESS_TOKEN)
Create the model repo, either via the SDK or manually on the web page; I won't go into detail here, but a sketch of the SDK route follows.
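A minimal sketch of the SDK route, assuming the `create_model` API and the `ModelVisibility`/`Licenses` constants of current modelscope releases (the username, repo name, and license here are placeholders):
from modelscope.hub.api import HubApi
from modelscope.hub.constants import Licenses, ModelVisibility

YOUR_ACCESS_TOKEN = 'get this from ModelScope: Profile -> Access Tokens'
api = HubApi()
api.login(YOUR_ACCESS_TOKEN)
# creates the repo if it does not exist yet
api.create_model(model_id='yourname/Assitant_of_David_based_on_InternLM',
                 visibility=ModelVisibility.PUBLIC,
                 license=Licenses.APACHE_V2)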
Once that's done, clone the model repo locally first (mine is `Assitant_of_David_based_on_InternLM`) and copy the merged model from above into it:
Then, following the docs, write a small script to do the upload:
from modelscope.hub.api import HubApi
YOUR_ACCESS_TOKEN = 'get this from ModelScope: Profile -> Access Tokens'
api = HubApi()
api.login(YOUR_ACCESS_TOKEN)
# upload the model
api.push_model(
    model_id="yourname/your_model_id",  # if the repo for model_id does not exist, it will be created automatically
    model_dir="my_model_dir"  # local model directory; it must contain a configuration.json file
)
Then the upload starts; it was fairly slow for me, probably because the files are large:
This is the model after the upload finished:
Deploying the app on ModelScope
Just follow the Studio (创空间) build guide, or use the one-click model deployment, which requires authorizing Alibaba Cloud and is a paid service:
The result could use some polish, but it'll do for now.
Calling the Puyu API
Documentation: https://internlm.intern-ai.org.cn/api/document
The docs contain very detailed examples:
You do need to create an API token beforehand, here:
Just swap your own API key generated above into the script below:
import requests
import json

url = 'https://internlm-chat.intern-ai.org.cn/puyu/api/v1/chat/completions'
header = {
    'Content-Type': 'application/json',
    "Authorization": "Bearer eyJ0eXBlIjoiSl...fill in your real token!"
}
data = {
    "model": "internlm2.5-latest",
    "messages": [{
        "role": "user",
        "content": "你好~"
    }],
    "n": 1,
    "temperature": 0.8,
    "top_p": 0.9
}

res = requests.post(url, headers=header, data=json.dumps(data))
print(res.status_code)
print(res.json())
print(res.json()["data"]["choices"][0]["content"])