A Simple RAG Implementation with LlamaIndex

License: CC 4.0 BY-SA

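1 Introduction

LlamaIndex is an open-source data management tool designed for large language models (LLMs). It simplifies and optimizes how an LLM queries external data sources, which makes it a good fit for building RAG on top of a data index.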

Reference links

# Official documentation
https://docs.llamaindex.ai/en/stable/

# Module guides
https://docs.llamaindex.ai/en/stable/module_guides/

# GitHub repository
https://github.com/run-llama/llama_index

Components used

# OpenAILike
https://docs.llamaindex.ai/en/stable/api_reference/llms/openai_like/
# OpenLLM (I have not tested this component; it inherits from OpenAILike)
https://docs.llamaindex.ai/en/stable/api_reference/llms/openllm/

# Custom embedding model
https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings/#custom-embedding-model

# Custom LLM
https://docs.llamaindex.ai/en/stable/module_guides/models/llms/usage_custom/

# ChromaVectorStore vector store
https://docs.llamaindex.ai/en/stable/api_reference/storage/vector_store/chroma/#llama_index.vector_stores.chroma.ChromaVectorStore

# Chroma database documentation
https://docs.trychroma.com/docs/overview/introduction

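Packages to install

⚠️ The llama-index version I used: 0.12.26

pip install llama-index
pip install llama-index-llms-openai-like
pip install llama-index-vector-stores-chroma
pip install chromadb
# Needed by the custom embedding model and the SimpleVectorStore example in section 3
pip install sentence-transformers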

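2 Building RAG with the official example

⚠️ This example is essentially unusable as-is from mainland China: by default it calls OpenAI models for both embedding and generation, so it has to be customized to your own needs.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load the data
documents = SimpleDirectoryReader("E:/data").load_data()
# Build a vector index from the documents
index = VectorStoreIndex.from_documents(documents)
# Build the query engine
query_engine = index.as_query_engine()
# Run a query
response = query_engine.query("Some question about the data should go here")
print(response)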

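3 Building a custom RAG

3.1 RAG construction approach

The flowchart below outlines the steps for building RAG with LlamaIndex. In this setup, both the embedding model and the OpenAI-like LLM component have to be custom implementations.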

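graph TD
	A["(1) Build the list of Document objects by reading the source documents"] --> B
	B["(2) Build the list of Node objects by splitting each Document with a splitter such as SentenceSplitter or TextSplitter"] --> C
	C["(3) Embed and store: embed with the custom embedding model and save to a vector store such as SimpleVectorStore or ChromaVectorStore"] --> D
	D["(4) Build the vector index with VectorStoreIndex"] --> E
	E["(5) Build the retriever, which looks up the nodes relevant to the user's prompt"] --> F
	F["(6) Build the response synthesizer, in which the custom LLM generates an answer to the user's prompt"] --> G
	G["(7) Build the query engine by combining the retriever and the response synthesizer"] --> H
	H["(8) Query with a prompt and generate the answer"]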
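
To make the eight steps concrete without any model or API key, here is a minimal pure-Python sketch of the same flow. It is not LlamaIndex code: the word-set `embed` function, the `ToyQueryEngine` class, and the `echo_llm` "model" are hypothetical stand-ins for the custom embedding model, the query engine, and the LLM.

```python
import re

# (1) Documents: plain strings stand in for Document objects.
documents = [
    "LlamaIndex builds indexes over external data. It helps LLMs query that data.",
    "Chroma is a vector database. It stores embeddings and text.",
]

# (2) Nodes: split each document into sentences (a crude stand-in for a splitter).
def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

nodes = [sentence for doc in documents for sentence in split_sentences(doc)]

# (3) "Embedding": the set of lowercase words (hypothetical, not a real model).
def embed(text):
    return frozenset(re.findall(r"[a-z]+", text.lower()))

# (3)+(4) Storage and index: a list of (node, embedding) pairs kept in memory.
store = [(node, embed(node)) for node in nodes]

# (5) Retriever: rank nodes by the number of words shared with the query.
def retrieve(query, top_k=1):
    query_words = embed(query)
    ranked = sorted(store, key=lambda item: len(query_words & item[1]), reverse=True)
    return [node for node, _ in ranked[:top_k]]

# (6) Response synthesizer: assemble the prompt and hand it to the LLM callable.
def synthesize(query, context_nodes, llm):
    prompt = "Context:\n" + "\n".join(context_nodes) + f"\nQuestion: {query}\nAnswer:"
    return llm(prompt)

# (7) Query engine: retriever + response synthesizer.
class ToyQueryEngine:
    def __init__(self, llm, top_k=1):
        self.llm = llm
        self.top_k = top_k

    def query(self, question):
        return synthesize(question, retrieve(question, self.top_k), self.llm)

def echo_llm(prompt):
    # Hypothetical "LLM" that simply returns the first context line of the prompt.
    return prompt.split("\n")[1]

# (8) Query with a prompt.
engine = ToyQueryEngine(echo_llm)
answer = engine.query("What is Chroma?")
print(answer)
```

Swapping `embed` for a real embedding model and `echo_llm` for a real chat model gives the shape of the custom pipeline built in section 3.2.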


3.2 Custom RAG

(1) my_document_custorm_engine.py

import chromadb
from llama_index.core import Document, VectorStoreIndex, get_response_synthesizer
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.vector_stores.chroma import ChromaVectorStore

from my_custom_rag.custom_embedding import CustomEmbeddings
from my_custom_rag.custom_like_openai import CustomLikeOpenAI

# Required packages:
# pip install llama-index
# pip install llama-index-llms-openai-like
# pip install llama-index-vector-stores-chroma

# Custom embedding model
my_embedding = CustomEmbeddings()

# Build the document list (sample Chinese passages about Henan University); replace with your own texts as needed
document_text = [
    "河南大学（Henan University），简称“河大”（HENU），位于中国河南省郑州市、开封市，是河南省人民政府与中华人民共和国教育部共建公办高校 [392]、国家“双一流”建设高校 [69]，入选国家“111计划” [2]、中西部高校基础能力建设工程 [313]、中国政府奖学金来华留学生接收院校 [388]。",
    "河南大学创立于1912年，始名河南留学欧美预备学校。后历经中州大学、国立开封中山大学、省立河南大学等阶段，1942年升格为国立河南大学 [153]。1952年院系调整 ，校本部更名为河南师范学院。后经开封师范学院、河南师范大学等阶段，1984年恢复河南大学校名 [153]。2000年6月，原河南大学、开封医学高等专科学校、开封师范高等专科学校合并组建新的河南大学 [154]。2012年，河南大学入选第一批卓越医生教育培养计划项目试点高校 [130]；入选国家级卓越法律人才教育培养基地 [390]；入选第一批国家卓越医生教育培养计划项目试点高校 [391]。",
    "截至2024年6月，学校设有40个学院、93个本科招生专业 、47个硕士学位授权一级学科 、39种硕士专业学位授权类别 、2种博士专业学位授权类别、24个博士学位授权一级学科 、20个博士后科研流动站、13个学科进入ESI世界排名前1% ；有全日制在校生5万余人、教职工4700余人，教师中有院士、学部委员6人，长江学者、国家杰青、“万人计划”领军人才等国家级领军人才26人，国家级青年人才15人；拥有3个国家重点实验室 ，1个国家野外科学观测研究站 ，3个国家地方联合工程研究中心 ，4个河南省实验室 ， 5个教育部和农业部重点实验室 [448]。",
    "河南大学软件学院（Henan University Software College）是全国较早、河南省最早成立的软件学院之一，位于中国河南省开封市。学院是国家示范性软件学院联盟成员单位，信息技术新工科产学研联盟首批会员单位，2020 年度获批河南省特色化示范性软件学院，河南省鲲鹏产业学院建设高校。学院设有软件工程系、网络工程系和公共计算机教学中心，拥有“河南省智能数据处理工程研究中心”、“河南省现代网络技术实验教学示范中心”、“河南省智能网络理论与关键技术国际联合实验室”等省级科研教学平台、“河南省高等学校学科（软件工程）引智基地”、“河南省本科高校大学生校外实践教育基地”和“河南省高校优秀基层教学组织”称号。学院独立承建软件工程和网络工程两个本科专业，软件工程专业为国家一流本科专业， 网络工程专业为河南省一流本科专业。同时，拥有“软件工程技术”二级博士学位授权点，电子信息专业具有硕士学位授予权，建有软件工程博士后科研流动站。"
]

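# Wrap each text in a LlamaIndex Document
documents = [Document(text=text) for text in document_text]
# print(documents)

# Split the Documents into Nodes
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# Embed each node with the custom embedding model
for node in nodes:
    node.embedding = my_embedding.get_text_embedding(node.get_content())

# Create the Chroma client
client = chromadb.Client()

# Create the collection and wrap it in a vector store
collection = client.create_collection("my-documents")
chroma_vector_store = ChromaVectorStore(chroma_collection=collection)

# Add the embedded nodes to the vector store
chroma_vector_store.add(nodes)

# Build the index on top of the vector store
# Note: from_vector_store needs a vector store that also stores the node text (as Chroma does);
# otherwise it raises:
# Cannot initialize from a vector store that does not store text.
vector_index = VectorStoreIndex.from_vector_store(chroma_vector_store, embed_model=my_embedding)


# In my tests, LlamaIndex's OpenAILike kept raising parameter errors with Qwen and Kimi,
# so I fell back to a custom LLM
llm = CustomLikeOpenAI(
    model="qwen2.5-14b-instruct",
    api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
    api_key="sk-XXX"
)

# For reference, the LlamaIndex OpenAILike version that did not work for me
# (untested guess: OpenAILike defaults to is_chat_model=False, so passing
# is_chat_model=True may be what these chat-only endpoints need)
# from llama_index.llms.openai_like import OpenAILike
# llm = OpenAILike(
#     model="qwen2.5-14b-instruct",
#     base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
#     api_key = "sk-XXX"
# )

"""
# Explicit construction: query engine = retriever + response synthesizer

# Create the retriever; similarity_top_k sets how many nodes are returned
query_retriever = vector_index.as_retriever(similarity_top_k=1)

# Response modes include compact (the default), refine, simple_summarize, and others
response_synthesizer = get_response_synthesizer(
    llm=llm,
    response_mode=ResponseMode.COMPACT,
    # streaming=True
)

query_engine = RetrieverQueryEngine(
    retriever=query_retriever,
    response_synthesizer=response_synthesizer
)

"""
# Implicit construction: the two steps above collapse into a single call
query_engine = vector_index.as_query_engine(
    llm=llm,
    # streaming=True
)

# Without streaming, print the response directly
# (the query asks for a 30-character introduction to Henan University)
query_data = query_engine.query("用30字介绍一下河南大学")
print(query_data)

# # With streaming=True, print the pieces as they arrive
# for text in query_data.response_gen:
#     print(text, end="", flush=True)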
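
The `compact` response mode mentioned in the commented-out block merges retrieved chunks before prompting, so fewer LLM calls are needed than with one call per chunk. The sketch below illustrates only that packing idea; it is not LlamaIndex's implementation. The `pack_chunks` helper and its character budget `max_chars` are hypothetical (a real implementation would budget by tokens against the context window).

```python
def pack_chunks(chunks, max_chars, separator="\n\n"):
    """Greedily merge consecutive chunks into as few prompts as fit the budget."""
    packed = []
    current = ""
    for chunk in chunks:
        candidate = chunk if not current else current + separator + chunk
        if len(candidate) <= max_chars or not current:
            # Either it fits, or it is a single oversized chunk that must go alone.
            current = candidate
        else:
            # Budget exceeded: close the current prompt and start a new one.
            packed.append(current)
            current = chunk
    if current:
        packed.append(current)
    return packed


chunks = ["aaaa", "bbbb", "cccc", "dddd"]
prompts = pack_chunks(chunks, max_chars=10, separator="|")
print(f"{len(chunks)} chunks packed into {len(prompts)} prompts: {prompts}")
```

With a budget of 10 characters, four 4-character chunks fit two per prompt, so the hypothetical LLM would be called twice instead of four times.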

(2) custom_embedding.py

Custom embedding model

from llama_index.core.base.embeddings.base import Embedding
from llama_index.core.embeddings import BaseEmbedding
from sentence_transformers import SentenceTransformer

# Load the embedding model; adapt this to call a hosted or a local model as needed
embedder = SentenceTransformer("E:/model/sentencetransformers/distiluse-base-multilingual-cased-v1")


class CustomEmbeddings(BaseEmbedding):
    def _get_query_embedding(self, query: str) -> Embedding:
        # Encode the query and return the vector as a plain list of floats
        return embedder.encode(query).tolist()

    async def _aget_query_embedding(self, query: str) -> Embedding:
        # Async variant; it reuses the same synchronous encoder
        return embedder.encode(query).tolist()

    def _get_text_embedding(self, text: str) -> Embedding:
        # Encode a document text and return the vector as a plain list of floats
        return embedder.encode(text).tolist()

(3) custom_like_openai.py

Custom LLM

from typing import Any

from llama_index.core.base.llms.types import CompletionResponseGen, LLMMetadata, CompletionResponse
from llama_index.core.llms import CustomLLM
from llama_index.core.llms.callbacks import llm_completion_callback
from openai import OpenAI
from pydantic import Field, PrivateAttr


class CustomLikeOpenAI(CustomLLM):
    model: str = Field(description="Model name")
    api_key: str = Field(description="API key")
    api_base: str = Field(description="Base URL of the OpenAI-compatible API")
    context_window: int = Field(default=32768, description="Context window size")
    temperature: float = Field(ge=0, le=1, default=0.3, description="Sampling temperature; must be within [0, 1]")
    num_output: int = Field(default=8192, description="Value passed as max_tokens")

    # The OpenAI client is not a model field, so declare it as a private attribute
    _client: OpenAI = PrivateAttr()

    def __init__(self, **data):
        # The parent initializer must be called first
        super().__init__(**data)

        # Create the OpenAI-compatible client
        self._client = OpenAI(
            api_key=self.api_key,
            base_url=self.api_base
        )

    @property
    def metadata(self) -> LLMMetadata:
        """Get LLM metadata."""
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model
        )

    # The official custom-LLM guide decorates both methods with llm_completion_callback
    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        """
        Generate text.

        :param prompt: the prompt
        :param kwargs: other parameters (ignored here)
        :return: CompletionResponse
        """
        # Send the prompt as a single user message to the chat completions endpoint
        completion = self._client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "user", "content": prompt}
            ],
            temperature=self.temperature,
            max_tokens=self.num_output
        )
        # Return the generated text (empty string if the provider returns no content)
        return CompletionResponse(text=completion.choices[0].message.content or "")

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        """
        Generate text as a stream.

        :param prompt: the prompt
        :param kwargs: other parameters (ignored here)
        :return: CompletionResponseGen iterator
        """

        # Streaming is optional; to skip it, raise instead:
        # raise NotImplementedError("Streaming not supported")

        # Open the stream
        stream = self._client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "user", "content": prompt}
            ],
            temperature=self.temperature,
            max_tokens=self.num_output,
            stream=True
        )
        # Iterate over the stream, accumulating the full text as it grows
        text = ""
        for chunk in stream:
            # Skip chunks that carry no choices
            if not chunk.choices:
                continue

            # The newly generated piece of text
            delta = chunk.choices[0].delta.content

            # Only yield when the chunk actually carries content
            if delta:
                text += delta
                yield CompletionResponse(text=text, delta=delta)
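
When a streamed completion is consumed, each item carries the newly arrived piece (the delta), and the custom-LLM example in the LlamaIndex documentation also carries the text accumulated so far. The sketch below shows that pattern with a plain generator so it runs without an API key; `fake_token_stream` and `stream_with_accumulation` are hypothetical stand-ins for the provider's chunk stream and for `stream_complete`.

```python
from typing import Iterable, Iterator, Optional, Tuple


def fake_token_stream() -> Iterator[Optional[str]]:
    # Stand-in for provider chunks; None and "" mimic chunks without content.
    yield from ["Henan ", None, "University ", "", "was founded in 1912."]


def stream_with_accumulation(pieces: Iterable[Optional[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (text_so_far, delta) pairs, skipping empty pieces."""
    text = ""
    for delta in pieces:
        if not delta:
            continue
        text += delta
        yield text, delta


final_text = ""
for text, delta in stream_with_accumulation(fake_token_stream()):
    # Print only the new piece so the output grows like a typed sentence.
    print(delta, end="", flush=True)
    final_text = text
print()
```

Printing the delta rather than the accumulated text avoids repeating earlier output on every iteration, which is why the streaming loop in the engine script iterates over the generated pieces.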

3.3 Further exploration

(1) Storing vectors with SimpleVectorStore

from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.vector_stores import SimpleVectorStore, VectorStoreQuery
from sentence_transformers import SentenceTransformer

# Load the embedding model
embedder = SentenceTransformer("E:/model/sentencetransformers/distiluse-base-multilingual-cased-v1")

# Build the document list; replace with your own texts as needed
document_text = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
    "Two men pushed carts through the woods.",
    "A man is riding a white horse on an enclosed ground.",
    "A monkey is playing drums.",
    "A cheetah is running behind its prey.",
]

# Wrap each text in a LlamaIndex Document
documents = [Document(text=text) for text in document_text]
# print(documents)

# Split the Documents into Nodes
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# Embed each node
for node in nodes:
    # Convert the NumPy array to a plain list of floats so it can be serialized when persisted
    node.embedding = embedder.encode(node.get_content()).tolist()
# print(nodes)

# SimpleVectorStore keeps vectors in memory; it is generally not used in production,
# where a real vector database is the usual choice
simple_vector_store = SimpleVectorStore()
simple_vector_store.add(nodes)

# Persist the store; by default it is written under "./storage"
# simple_vector_store.persist()
# Load the persisted data back
# simple_vector_store = SimpleVectorStore.from_persist_path("./storage/vector_store.json")

# Embed the query text
query_embed = embedder.encode("A man is eating pasta").tolist()
# Fetch the two most similar entries
target_data_embed = simple_vector_store.query(VectorStoreQuery(query_embedding=query_embed, similarity_top_k=2))

print(target_data_embed)
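
Conceptually, the `similarity_top_k=2` query above ranks the stored vectors by similarity to the query vector and keeps the two best. A minimal pure-Python sketch of that ranking with cosine similarity follows. The three-dimensional vectors, the node ids, and the `top_k_cosine` helper are made up for illustration, and this is not SimpleVectorStore's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_cosine(query, vectors, k):
    """Return the k (node_id, score) pairs most similar to the query."""
    scored = [(node_id, cosine_similarity(query, vec)) for node_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings; a real model would produce hundreds of dimensions.
vectors = {
    "eating_food": [0.9, 0.1, 0.0],
    "riding_horse": [0.1, 0.9, 0.0],
    "playing_violin": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]

results = top_k_cosine(query, vectors, k=2)
for node_id, score in results:
    print(f"{node_id}: {score:.3f}")
```

A vector database does the same ranking, typically with an approximate nearest-neighbor index instead of this exhaustive scan.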

(2) Indexing with VectorStoreIndex

import chromadb
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.vector_stores.chroma import ChromaVectorStore

from my_custom_rag.custom_embedding import CustomEmbeddings

# Custom embedding model
my_embedding = CustomEmbeddings()

# Build the document list; replace with your own texts as needed
document_text = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
    "Two men pushed carts through the woods.",
    "A man is riding a white horse on an enclosed ground.",
    "A monkey is playing drums.",
    "A cheetah is running behind its prey.",
]

# Wrap each text in a LlamaIndex Document
documents = [Document(text=text) for text in document_text]
# print(documents)

# Split the Documents into Nodes
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# Embed each node with the custom embedding model
for node in nodes:
    node.embedding = my_embedding.get_text_embedding(node.get_content())

# Create the Chroma client
client = chromadb.Client()

# Create the collection and wrap it in a vector store
collection = client.create_collection("my-documents")
chroma_vector_store = ChromaVectorStore(chroma_collection=collection)

# Add the embedded nodes to the vector store
chroma_vector_store.add(nodes)


# Build the index on top of the vector store
# Note: from_vector_store needs a vector store that also stores the node text (as Chroma does);
# otherwise it raises:
# Cannot initialize from a vector store that does not store text.
vector_index = VectorStoreIndex.from_vector_store(chroma_vector_store, embed_model=my_embedding)

# Create the retriever; similarity_top_k sets how many nodes are returned
query_retriever = vector_index.as_retriever(similarity_top_k=1)

# Retrieve the node most similar to the query
query_text = "A man is eating pasta"
retrieve_data = query_retriever.retrieve(query_text)

print(retrieve_data)