Student handbook: https://aicarrier.feishu.cn/wiki/QtJnweAW1iFl8LkoMKGcsUS9nld
Course video: https://www.bilibili.com/video/BV13U1VYmEUr/
Course docs: https://github.com/InternLM/Tutorial/tree/camp4/docs/L0/Python
Assignment: https://github.com/InternLM/Tutorial/blob/camp4/docs/L0/Python/task.md
Dev machine platform: https://studio.intern-ai.org.cn/
Dev machine platform introduction: https://aicarrier.feishu.cn/wiki/GQ1Qwxb3UiQuewk8BVLcuyiEnHe
InternLM official site: https://internlm.intern-ai.org.cn/
GitHub: https://github.com/internLM/
InternThinker: https://internlm-chat.intern-ai.org.cn/internthinker
Feishu docs quick start: https://www.feishu.cn/hc/zh-CN/articles/945900971706-%E5%BF%AB%E9%80%9F%E4%B8%8A%E6%89%8B%E6%96%87%E6%A1%A3
Assignment submission: https://aicarrier.feishu.cn/share/base/form/shrcnUqshYPt7MdtYRTRpkiOFJd
Assignment grading results: https://aicarrier.feishu.cn/share/base/query/shrcnkNtOS9gPPnC9skiBLlao2c
InternLM-Chat agent: https://github.com/InternLM/InternLM/blob/main/agent/README_zh-CN.md
Lagent: https://lagent.readthedocs.io/zh-cn/latest/tutorials/action.html#id2
Web search API: https://serper.dev/, https://serper.dev/login
HuixiangDou: https://github.com/InternLM/HuixiangDou/
HuixiangDou
HuixiangDou features:
Three-stage pipeline (preprocess, rejection, response), which improves response accuracy and safety
Integrates with WeChat and Feishu group chats, well suited to knowledge Q&A scenarios in China
HuixiangDou is an LLM-based RAG application framework, covering multi-source knowledge retrieval, hybrid LLM backends, a multi-score rejection workflow, and full-chain security checks
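The three-stage pipeline above can be sketched as follows. This is a simplified illustration under stated assumptions, not HuixiangDou's actual code: the scoring function, threshold value, and all names here are hypothetical.

```python
# Toy three-stage pipeline: preprocess -> rejection -> response.
# Everything here is illustrative; HuixiangDou's real rejection uses
# learned embeddings and thresholds tuned on positive/negative examples.

def preprocess(query):
    """Stage 1: normalize the incoming group-chat message."""
    return query.strip().lower()

def relevance_score(query, knowledge_base):
    """Toy relevance score: fraction of query words found in the knowledge base."""
    words = query.split()
    if not words:
        return 0.0
    hits = sum(any(w in doc for doc in knowledge_base) for w in words)
    return hits / len(words)

def answer(query, knowledge_base, threshold=0.5):
    query = preprocess(query)                      # stage 1: preprocess
    if relevance_score(query, knowledge_base) < threshold:
        return "REJECT"                            # stage 2: rejection (refuse off-topic chatter)
    return f"Answer grounded in retrieved docs for: {query}"  # stage 3: response

kb = ["how to install mmpose", "huixiangdou deployment guide"]
print(answer("How to install mmpose ?", kb))   # on-topic -> answered
print(answer("what's for lunch today", kb))    # off-topic -> REJECT
```

The rejection stage is what makes the assistant usable in a noisy group chat: most messages are chatter, and refusing them is as important as answering the rest.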
Web version: https://openxlab.org.cn/apps/detail/tpoisonooo/huixiangdou-web
Self-hosted deployment: https://github.com/InternLM/HuixiangDou
Web version HuixiangDou features
Add/remove documents: supports pdf, word, markdown, excel, ppt, html and txt files
Edit positive/negative examples
Connect WeChat and Feishu groups: pip install -r requirements-lark-group.txt, tutorial: https://github.com/InternLM/HuixiangDou/blob/main/docs/add_lark_group_zh.md
Enable web search
Chat testing
Set up your own web version of HuixiangDou
Tutorial: https://github.com/InternLM/HuixiangDou/blob/main/web/README.md
Image: Cuda11.7-conda; resource type: 30% A100
# 1. Create the environment, download HuixiangDou, install dependencies, download model files
studio-conda -o internlm-base -t huixiangdou
conda activate huixiangdou
cd /root
# clone the repository
git clone https://github.com/internlm/huixiangdou && cd huixiangdou
git checkout 79fa810
# parsing `word` format requirements
apt update
apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev
# python requirements
pip install BCEmbedding==0.1.5 cmake==3.30.2 lit==18.1.8 sentencepiece==0.2.0 protobuf==5.27.3 accelerate==0.33.0
pip install -r requirements.txt
# for python3.8, install faiss-gpu instead of faiss
# create the model directory
cd /root && mkdir models
# link the BCE models
ln -s /root/share/new_models/maidalun1020/bce-embedding-base_v1 /root/models/bce-embedding-base_v1
ln -s /root/share/new_models/maidalun1020/bce-reranker-base_v1 /root/models/bce-reranker-base_v1
# link the LLM weights (pick **one** of the models below, depending on your assignment progress and task)
ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-7b /root/models/internlm2-chat-7b
# edit the config file so HuixiangDou uses the local models
sed -i '9s#.*#embedding_model_path = "/root/models/bce-embedding-base_v1"#' /root/huixiangdou/config.ini
sed -i '15s#.*#reranker_model_path = "/root/models/bce-reranker-base_v1"#' /root/huixiangdou/config.ini
sed -i '43s#.*#local_llm_path = "/root/models/internlm2-chat-7b"#' /root/huixiangdou/config.ini
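The sed commands above overwrite an entire line by its number: `9s#.*#NEW#` means "on line 9, replace everything with NEW". The same edit can be sketched in Python; this version operates on a throwaway file rather than the real config.ini, and the helper name is ours, not part of HuixiangDou.

```python
# Python equivalent of `sed -i 'Ns#.*#NEW#' file`: replace line N in place.
import os
import tempfile
from pathlib import Path

def replace_line(path, lineno, new_text):
    lines = Path(path).read_text().splitlines()
    lines[lineno - 1] = new_text          # sed line numbers are 1-based
    Path(path).write_text("\n".join(lines) + "\n")

# Try it on a throwaway ini fragment
fd, tmp = tempfile.mkstemp(suffix=".ini")
os.close(fd)
Path(tmp).write_text('embedding_model_path = ""\nreranker_model_path = ""\n')
replace_line(tmp, 1, 'embedding_model_path = "/root/models/bce-embedding-base_v1"')
print(Path(tmp).read_text())
os.remove(tmp)
```

Note the sed edits are keyed to exact line numbers (9, 15, 43), so they only work against the pinned commit `79fa810`; after a `git pull` the lines may shift and the paths should be edited by hand instead.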
# 2. Build the knowledge base
conda activate huixiangdou
cd /root/huixiangdou && mkdir repodir
git clone https://github.com/internlm/huixiangdou --depth=1 repodir/huixiangdou
git clone https://github.com/open-mmlab/mmpose --depth=1 repodir/mmpose
# Save the features of repodir to workdir, and update the positive and negative example thresholds into `config.ini`
mkdir workdir
python3 -m huixiangdou.service.feature_store
# edit positive/negative examples: positive examples live in /root/huixiangdou/resource/good_questions.json, negative examples in /root/huixiangdou/resource/bad_questions.json
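Positive/negative examples can also be edited programmatically. This sketch assumes the files hold a flat JSON array of question strings; inspect /root/huixiangdou/resource/good_questions.json to confirm the format before pointing it at the real file. The helper below is ours and runs against a demo file.

```python
# Append a question to a good_questions.json-style file (assumed: flat JSON
# array of strings -- verify against the real resource file before use).
import json
from pathlib import Path

def add_question(path, question):
    p = Path(path)
    questions = json.loads(p.read_text(encoding="utf-8")) if p.exists() else []
    if question not in questions:      # avoid duplicate examples
        questions.append(question)
    p.write_text(json.dumps(questions, ensure_ascii=False, indent=2),
                 encoding="utf-8")

add_question("good_questions_demo.json", "How do I install mmpose?")
print(Path("good_questions_demo.json").read_text(encoding="utf-8"))
```

After changing the examples, rerun `python3 -m huixiangdou.service.feature_store` so the rejection thresholds are recomputed.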
# 3.1 Run the knowledge assistant test from the command line
conda activate huixiangdou
cd /root/huixiangdou
python3 -m huixiangdou.main --standalone
# 3.2 Test with the Gradio UI
# port forwarding: ssh -CNg -L 7860:127.0.0.1:7860 root@ssh.intern-ai.org.cn -p <your ssh port>
conda activate huixiangdou
cd /root/huixiangdou
python3 -m huixiangdou.gradio
HuixiangDou advanced usage
Enable web search
# web search uses an API from Serper: https://serper.dev/, https://serper.dev/login. Copy your API key and replace ${YOUR-API-KEY} in /huixiangdou/config.ini with it
[web_search]
# check https://serper.dev/api-key to get a free API key
x_api_key = "${YOUR-API-KEY}"
domain_partial_order = ["openai.com", "pytorch.org", "readthedocs.io", "nvidia.com", "stackoverflow.com", "juejin.cn", "zhuanlan.zhihu.com", "www.cnblogs.com"]
save_dir = "logs/web_search_result"
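To sanity-check a key outside HuixiangDou, the Serper search endpoint can be called directly. To my knowledge this is a POST to https://google.serper.dev/search with the key in an `X-API-KEY` header; verify the details against the serper.dev docs, as the endpoint and field names here are based on that assumption.

```python
# Minimal Serper web-search call (endpoint and headers per serper.dev docs;
# verify there -- this sketch is not part of HuixiangDou).
import json
import os
import urllib.request

def serper_search(query, api_key):
    req = urllib.request.Request(
        "https://google.serper.dev/search",
        data=json.dumps({"q": query}).encode(),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

api_key = os.environ.get("SERPER_API_KEY", "")
if api_key:  # only hit the network when a key is configured
    results = serper_search("huixiangdou RAG", api_key)
    print(results.get("organic", [])[:1])
```

If this returns results, the same key should work once pasted into `x_api_key` in config.ini.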
Remote models
HuixiangDou calls a model in 3 places: the embedding model, the reranker model, and the large language model (LLM)
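Where those three models fit in a single query can be shown with a toy sketch: the embedding model does coarse retrieval, the reranker re-orders the candidates, and the LLM writes the final answer. All three stand-ins below are dummies of our own, not HuixiangDou's code or the BCE/InternLM models.

```python
# Toy RAG flow: embedding -> coarse retrieval, reranker -> fine ordering,
# LLM -> final grounded answer. All three functions are stand-ins.
import math

def embed(text):
    """Stand-in for the embedding model: a crude character-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def rerank(query, docs):
    """Stand-in for the reranker: re-order by shared word count."""
    qwords = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(qwords & set(d.lower().split())))

def llm(prompt):
    """Stand-in for the chat LLM."""
    return f"[answer grounded in]: {prompt}"

docs = ["install mmpose with pip", "huixiangdou config guide", "feishu group setup"]
query = "how to install mmpose"
qv = embed(query)
candidates = sorted(docs, key=lambda d: -cosine(qv, embed(d)))[:2]  # embedding retrieval
best = rerank(query, candidates)[0]                                 # rerank
print(llm(f"{query}\ncontext: {best}"))                             # LLM response
```

Because the three roles are independent, each can be swapped for a remote API separately, which is exactly what the config switches below control.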
Remote embedding & reranker models
https://siliconflow.cn/zh-cn/, https://account.siliconflow.cn/zh/login?redirect=https%3A%2F%2Fcloud.siliconflow.cn%2Faccount%2Fak%3F
Fill the API key into api_token in /huixiangdou/config.ini, and update the embedding and reranker model endpoints (embedding_model_path, reranker_model_path) as shown in the figure
Remote LLM
enable_local = 0 # disable the local model
enable_remote = 1 # enable the remote model
Multimodal features
# 1. Download/update HuixiangDou
conda activate huixiangdou
cd huixiangdou
git stash # discard earlier changes; to keep them, save the conflicting files under new names first
git checkout main
git pull
git checkout bec2f6af9 # lowest version that supports multimodal
# 2. Install the multimodal models and dependencies
# set environment variables
export HF_ENDPOINT='https://hf-mirror.com' # use the Hugging Face China mirror to speed up downloads; skip this if you are outside China
# download the models
## the model files are large; if a download fails, just rerun the command
huggingface-cli download BAAI/bge-m3 --local-dir /root/models/bge-m3
huggingface-cli download BAAI/bge-visualized --local-dir /root/models/bge-visualized
huggingface-cli download BAAI/bge-reranker-v2-minicpm-layerwise --local-dir /root/models/bge-reranker-v2-minicpm-layerwise
# the visual model has to be moved into the bge-m3 folder manually
mv /root/models/bge-visualized/Visualized_m3.pth /root/models/bge-m3/
# 3. Install the latest FlagEmbedding
conda activate huixiangdou
cd /root/
# install the latest version from the official GitHub
git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding
pip install .
# copy the files missing from the FlagEmbedding install. Note: huixiangdou/lib/python3.10/site-packages belongs to the environment created at the start of this tutorial; if yours differs, substitute your own path
cp -r ~/FlagEmbedding/FlagEmbedding/visual/eva_clip/model_configs /root/.conda/envs/huixiangdou/lib/python3.10/site-packages/FlagEmbedding/visual/eva_clip/
cp ~/FlagEmbedding/FlagEmbedding/visual/eva_clip/bpe_simple_vocab_16e6.txt.gz /root/.conda/envs/huixiangdou/lib/python3.10/site-packages/FlagEmbedding/visual/eva_clip/
# other dependencies
pip install timm ftfy peft
# 4. Edit the config file
sed -i '6s#.*#embedding_model_path = "/root/models/bge-m3"#' /root/huixiangdou/config-multimodal.ini
sed -i '7s#.*#reranker_model_path = "/root/models/bge-reranker-v2-minicpm-layerwise"#' /root/huixiangdou/config-multimodal.ini
sed -i '31s#.*#local_llm_path = "/root/models/internlm2-chat-7b"#' /root/huixiangdou/config-multimodal.ini
sed -i '20s#.*#enable_local = 1#' /root/huixiangdou/config-multimodal.ini
sed -i '21s#.*#enable_remote = 0#' /root/huixiangdou/config-multimodal.ini
# change the location of the multimodal vector knowledge base
sed -i '8s#.*#work_dir = "workdir-multi"#' /root/huixiangdou/config-multimodal.ini
sed -i '61s#.*#enable_cr = 0#' /root/huixiangdou/config-multimodal.ini # disable coreference resolution
# 5. Build the multimodal knowledge base
# new vector knowledge base folder
mkdir workdir-multi
# extract the multimodal vector knowledge base
python3 -m huixiangdou.service.feature_store --config_path config-multimodal.ini
# 6. Launch the Gradio UI and try the multimodal features
conda activate huixiangdou
cd /root/huixiangdou
python3 -m huixiangdou.gradio --config_path config-multimodal.ini