FlagEmbedding in Practice: Building a RAG Question-Answering System with LlamaIndex

FlagEmbedding: Dense Retrieval and Retrieval-augmented LLMs. Project repository: https://gitcode.com/gh_mirrors/fl/FlagEmbedding

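Introduction

In the era of large models, retrieval-augmented generation (RAG) has become a key bridge between large language models (LLMs) and domain-specific knowledge bases. This article shows how to combine the BGE embedding models from the FlagEmbedding project with the LlamaIndex framework to build a RAG question-answering system.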

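Technology Choices

About the FlagEmbedding project

FlagEmbedding is an open-source embedding-model project developed by BAAI (Beijing Academy of Artificial Intelligence). Its BGE (BAAI General Embedding) models score well on benchmarks such as MTEB. This article uses bge-base-en-v1.5, an English model that produces 768-dimensional embeddings and is well suited to semantic search.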
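
To make the idea of semantic search over embeddings concrete, here is a minimal sketch that ranks passages by cosine similarity to a query. The 3-dimensional vectors and the passage names are hypothetical stand-ins chosen for illustration; real bge-base-en-v1.5 vectors have 768 dimensions and come from the model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (assumption: not produced by any real model)
query = [1.0, 0.0, 1.0]
passages = {
    "on_topic": [0.9, 0.1, 0.8],
    "off_topic": [0.0, 1.0, 0.0],
}

# Rank passage names from most to least similar to the query
ranked = sorted(
    passages,
    key=lambda name: cosine_similarity(query, passages[name]),
    reverse=True,
)
print(ranked)
```

The passage whose vector points in nearly the same direction as the query ranks first, which is the property a retriever relies on.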

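Advantages of the LlamaIndex framework

LlamaIndex is a framework for connecting LLMs to external data through indexing and retrieval. Its main features:

Support for many document formats (PDF, HTML, Markdown, and others)
Several built-in text chunking strategies
Integrations with common vector stores (FAISS, Pinecone, and others)
A flexible retrieval interface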

Building the System

1. Environment setup

First install the required Python packages:

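# Jupyter magics; drop the leading % when running in a shell.
# faiss-cpu provides the faiss module imported in step 4.
%pip install llama-index-llms-openai llama-index-embeddings-huggingface llama-index-vector-stores-faiss
%pip install llama-index faiss-cpu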

Set the OpenAI API key (used by the GPT model):

import os
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
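
Hardcoding the key is convenient in a notebook, but it is easy to leave the placeholder in place by mistake. The following is a small sketch of a guard that fails early with a clear message; `require_env` is a hypothetical helper introduced here, not part of LlamaIndex or the OpenAI client.

```python
import os

def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name, "")
    # Treat the tutorial placeholder the same as a missing key
    if not value or value == "YOUR_API_KEY":
        raise RuntimeError(f"{name} is not set; export it before running the notebook")
    return value

# Demonstrated with an explicit mapping so the sketch never touches real credentials
print(require_env("OPENAI_API_KEY", {"OPENAI_API_KEY": "sk-example"}))
```

In a real session the call would be `require_env("OPENAI_API_KEY")`, which reads from `os.environ`.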

2. Loading and processing documents

LlamaIndex's SimpleDirectoryReader loads local documents from a directory:

from llama_index.core import SimpleDirectoryReader

# Read the supported files found in the local "data" directory
reader = SimpleDirectoryReader("data")
documents = reader.load_data()
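
The reader needs a populated directory to work with. As a rough pre-flight check, the sketch below lists the files in a directory that match a set of suffixes. The helper name `list_loadable_files`, the suffix list, and the sample file names are all assumptions made for illustration; the sketch does not reproduce SimpleDirectoryReader's own file-selection rules.

```python
import tempfile
from pathlib import Path

def list_loadable_files(directory, suffixes=(".txt", ".md", ".pdf")):
    """Return top-level files in `directory` whose suffix is in `suffixes`, sorted by name."""
    root = Path(directory)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )

# Build a throwaway "data" directory so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "notes.txt").write_text("BGE models map text to dense vectors.", encoding="utf-8")
    (data_dir / "ignore.bin").write_bytes(b"\x00")
    found = [p.name for p in list_loadable_files(data_dir)]

print(found)
```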

3. Configuring the core components

Configure the text splitter, the embedding model, and the LLM:

from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai import OpenAI

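# Split documents into chunks of about 1000 tokens, with 150 tokens of overlap between neighbors
Settings.node_parser = SentenceSplitter(
    chunk_size=1000,
    chunk_overlap=150,
)

# Embed text locally with the BGE model from Hugging Face
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-base-en-v1.5"
)

# Generate answers with an OpenAI chat model
Settings.llm = OpenAI(model="gpt-4o-mini")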
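
To see what `chunk_size` and `chunk_overlap` mean, the sketch below splits a token list into fixed-size windows that share a few tokens with their neighbors. It is a deliberately simplified stand-in: SentenceSplitter also tries to respect sentence boundaries, which this hypothetical `split_with_overlap` helper does not attempt.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of `chunk_size` that share `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Ten placeholder tokens, small sizes so the overlap is easy to see
tokens = [f"t{i}" for i in range(10)]
chunks = split_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last token of the previous one, so a sentence that straddles a boundary is still seen in context by at least one chunk.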

4. Building the index

Use FAISS as the vector store:

import faiss
from llama_index.vector_stores.faiss import FaissVectorStore
from llama_index.core import StorageContext, VectorStoreIndex

# Get the embedding dimension by embedding a sample string
embedding = Settings.embed_model.get_text_embedding("Hello world")
dim = len(embedding)

# Initialize a FAISS index (exact, brute-force L2 search)
faiss_index = faiss.IndexFlatL2(dim)
vector_store = FaissVectorStore(faiss_index=faiss_index)

# Build the storage context around the FAISS vector store
storage_context = StorageContext.from_defaults(
    vector_store=vector_store
)

# Chunk and embed the documents, then store the vectors in the index
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
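
IndexFlatL2 performs an exhaustive search and reports squared Euclidean distances. The pure-Python sketch below mimics that behavior on toy 2-dimensional vectors (hypothetical values, not real embeddings) and checks a useful identity: for unit-length vectors, squared L2 distance equals 2 - 2 * cosine similarity, so ranking by L2 distance and ranking by cosine similarity give the same order.

```python
import math

def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def flat_l2_search(query, vectors, k):
    """Return (distance, position) pairs for the k stored vectors nearest to the query."""
    scored = sorted((l2_squared(query, v), i) for i, v in enumerate(vectors))
    return scored[:k]

# Toy unit vectors standing in for stored chunk embeddings
stored = [normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
query = normalize([0.9, 0.2])

hits = flat_l2_search(query, stored, k=2)
print([position for _, position in hits])

# Check the identity on the best hit: ||q - v||^2 == 2 - 2 * (q . v) for unit vectors
best_distance, best_position = hits[0]
dot = sum(x * y for x, y in zip(query, stored[best_position]))
print(abs(best_distance - (2 - 2 * dot)) < 1e-9)
```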

5. Configuring the query engine

Create a basic query engine:

query_engine = index.as_query_engine()

Customize the prompt template (optional):

from llama_index.core import PromptTemplate

template = """
You are a Q&A chat bot.
Use the given context only, answer the question.

<context>
{context_str}
</context>

Question: {query_str}
"""

new_template = PromptTemplate(template)
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": new_template}
)
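
Conceptually, the template is filled in at query time: the retrieved chunks replace `{context_str}` and the user question replaces `{query_str}`. The sketch below reproduces that substitution with plain `str.format`; the `build_prompt` helper, the chunk texts, and the way chunks are joined are assumptions for illustration, not LlamaIndex internals.

```python
template = (
    "You are a Q&A chat bot.\n"
    "Use the given context only, answer the question.\n\n"
    "<context>\n{context_str}\n</context>\n\n"
    "Question: {query_str}\n"
)

def build_prompt(template, chunks, question):
    """Join retrieved chunks and substitute them, with the question, into the template."""
    context_str = "\n\n".join(chunks)
    return template.format(context_str=context_str, query_str=question)

# Hypothetical retrieved chunks
chunks = ["hypothetical chunk A", "hypothetical chunk B"]
prompt = build_prompt(template, chunks, "What is BGE?")
print(prompt)
```

Printing the filled prompt like this is a quick way to check that the wording and delimiters read the way the LLM should see them.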

6. Testing the system

Test the RAG system with a question (this one assumes the loaded documents describe M3-Embedding):

response = query_engine.query("What does M3-Embedding stand for?")
print(response)
# Example output: M3-Embedding stands for Multi-Linguality, Multi-Functionality, and Multi-Granularity.
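
Because LLM output can vary in wording between runs, a simple smoke test can look for key terms rather than an exact string. The following sketch assumes a hypothetical `contains_all` helper and uses the example answer above as a fixed string; in a live session, `str(response)` could be passed instead.

```python
def contains_all(answer, expected_terms):
    """True when every expected term appears in the answer, ignoring case."""
    lowered = answer.lower()
    return all(term.lower() in lowered for term in expected_terms)

# The example answer from the test above, used here as a fixed string
answer = "M3-Embedding stands for Multi-Linguality, Multi-Functionality, and Multi-Granularity."
expected = ["multi-linguality", "multi-functionality", "multi-granularity"]

print(contains_all(answer, expected))
```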