wow-rag Task 4: Document Management, the Dirtiest and Most Tiring Job

Article link: https://blog.youkuaiyun.com/qq_26249811/article/details/146327909

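Document management means creating, reading, updating, and deleting entries in an index that is saved on disk. Let's first look at how to manage an index backed by a faiss vector store.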

View all documents in the index

print(index.docstore.docs)
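
Printing the whole dictionary gets hard to read once the index holds more than a handful of nodes. Below is a minimal sketch of a more compact listing. `summarize_docstore` is a hypothetical helper, not part of LlamaIndex, and it assumes each stored node exposes a `text` attribute the way `TextNode` does. The stand-in objects exist only so the snippet runs without a real index.

```python
from types import SimpleNamespace


def summarize_docstore(docs, preview_chars=30):
    """Return (node_id, text preview) pairs for a mapping such as index.docstore.docs."""
    rows = []
    for node_id, node in docs.items():
        # Assumption: nodes carry their content in `.text`, as TextNode does.
        text = getattr(node, "text", "") or ""
        rows.append((node_id, text[:preview_chars].replace("\n", " ")))
    return rows


# Stand-in for index.docstore.docs so the sketch runs on its own.
fake_docs = {
    "node-1": SimpleNamespace(text="The Shawshank Redemption"),
    "node-2": SimpleNamespace(text="The Godfather\nPart II"),
}
rows = summarize_docstore(fake_docs)
for node_id, preview in rows:
    print(node_id, "|", preview)
```

With a real index the call would be `summarize_docstore(index.docstore.docs)`.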

View the ids of all nodes in the index

print(index.index_struct.nodes_dict)
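
With a faiss-backed index, the keys of `nodes_dict` are typically the positions of the vectors inside the faiss index (stored as strings) and the values are the node ids. That layout is an assumption worth checking against your own output. Under that assumption, the sketch below lists the node ids in insertion order; `ordered_node_ids` is a hypothetical helper and the dictionary is stand-in data.

```python
def ordered_node_ids(nodes_dict):
    """Return node ids sorted by key for a mapping such as index.index_struct.nodes_dict."""

    def sort_key(item):
        key = str(item[0])
        # Numeric keys sort by value ("2" before "10"); other keys come after, alphabetically.
        return (0, int(key), "") if key.isdecimal() else (1, 0, key)

    return [node_id for _, node_id in sorted(nodes_dict.items(), key=sort_key)]


# Stand-in for index.index_struct.nodes_dict.
fake_nodes_dict = {"10": "node-c", "2": "node-b", "0": "node-a"}
ordered = ordered_node_ids(fake_nodes_dict)
print(ordered)
```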

View information about every source document (ref doc) that nodes in the index refer to

print(index.ref_doc_info)
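
Each entry in `ref_doc_info` is expected to carry the list of node ids derived from one source document, plus that document's metadata. The sketch below counts nodes per source document. It assumes the entries expose a `node_ids` attribute, as the `RefDocInfo` objects in the LlamaIndex versions we have seen do; `nodes_per_ref_doc` is a hypothetical helper and the data is made up for illustration.

```python
from types import SimpleNamespace


def nodes_per_ref_doc(ref_doc_info):
    """Map each source document id to the number of nodes derived from it."""
    return {doc_id: len(info.node_ids) for doc_id, info in ref_doc_info.items()}


# Stand-in for index.ref_doc_info.
fake_ref_doc_info = {
    "doc-a": SimpleNamespace(node_ids=["n1", "n2", "n3"], metadata={"file_name": "a.txt"}),
    "doc-b": SimpleNamespace(node_ids=["n4"], metadata={"file_name": "b.txt"}),
}
counts = nodes_per_ref_doc(fake_ref_doc_info)
print(counts)
```

Nodes that were inserted directly, without a source document behind them, would not be expected to show up here.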

View the details of the node with a given id

index.docstore.get_node('51595901-ebe3-48b5-b57b-dc8794ef4556')
# or: index.docstore.docs['51595901-ebe3-48b5-b57b-dc8794ef4556']
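
Both lookups are expected to fail with an exception when the id is not in the docstore (a `KeyError` for the dictionary form; `get_node` raises by default as far as we know). When checking a batch of ids, it can be more convenient to split them into known and unknown ones first. A minimal sketch, with `split_known_ids` as a hypothetical helper and stand-in data in place of `index.docstore.docs`:

```python
from types import SimpleNamespace


def split_known_ids(docs, node_ids):
    """Split node_ids into those present in docs and those that are missing."""
    found = {node_id: docs[node_id] for node_id in node_ids if node_id in docs}
    missing = [node_id for node_id in node_ids if node_id not in docs]
    return found, missing


# Stand-in for index.docstore.docs.
fake_docs = {
    "node-1": SimpleNamespace(text="The Shawshank Redemption"),
    "node-2": SimpleNamespace(text="The Godfather"),
}
found, missing = split_known_ids(fake_docs, ["node-2", "node-9"])
print(sorted(found), missing)
```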

Delete a node. It is best not to run this deletion, because it may cause the code that follows to fail.

# index.docstore.delete_document('51595901-ebe3-48b5-b57b-dc8794ef4556')
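
One plausible reason the deletion causes trouble: removing a node from the docstore alone leaves its id in `index.index_struct.nodes_dict` (and its vector in faiss), so a later retrieval may try to load a node that no longer exists. The sketch below shows how such dangling references could be spotted. `find_dangling_ids` is a hypothetical helper, and plain dictionaries stand in for the real index.

```python
def find_dangling_ids(nodes_dict, docs):
    """Return node ids referenced by the index structure but absent from the docstore."""
    return sorted(node_id for node_id in nodes_dict.values() if node_id not in docs)


# Stand-ins for index.index_struct.nodes_dict and index.docstore.docs.
fake_nodes_dict = {"0": "node-a", "1": "node-b", "2": "node-c"}
fake_docs = {"node-a": "first node", "node-b": "second node", "node-c": "third node"}

intact = find_dangling_ids(fake_nodes_dict, fake_docs)
print(intact)

# Simulate deleting one node from the docstore only.
del fake_docs["node-b"]
dangling = find_dangling_ids(fake_nodes_dict, fake_docs)
print(dangling)
```

Running the same check against a real index before querying gives an early warning instead of a failure in the middle of retrieval.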

Add new nodes

index.insert_nodes([doc_single])

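Note that doc_single here must be a TextNode object, for example the one printed above when we looked up a node by id.
You can also construct TextNode objects yourself, like this: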

from llama_index.core.schema import TextNode
nodes = [
    TextNode(
        text="The Shawshank Redemption",
        metadata={
            "author": "Stephen King",
            "theme": "Friendship",
            "year": 1994,
        },
    ),
    TextNode(
        text="The Godfather",
        metadata={
            "director": "Francis Ford Coppola",
            "theme": "Mafia",
            "year": 1972,
        },
    )
]
index.insert_nodes(nodes)

Alternatively, build the nodes from documents, as in the previous lesson.

# Read from the specified files; input_files takes a list of paths
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion.pipeline import run_transformations
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader(input_files=['../docs/常用汉字大全.txt']).load_data()

# Split the documents into chunks of at most 512 tokens
transformations = [SentenceSplitter(chunk_size=512)]
nodes = run_transformations(documents, transformations=transformations)
index.insert_nodes(nodes)
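
To confirm that an insert actually landed, one option is to snapshot the docstore ids before the call and compare afterwards. The sketch below uses a plain dictionary in place of `index.docstore.docs` and a dictionary update in place of `index.insert_nodes(nodes)`; `newly_added_ids` is a hypothetical helper.

```python
def newly_added_ids(ids_before, ids_after):
    """Return the ids present after an insert that were not there before."""
    return sorted(set(ids_after) - set(ids_before))


# Stand-in for index.docstore.docs.
fake_docs = {"node-a": "existing node", "node-b": "existing node"}
ids_before = set(fake_docs)

# Stand-in for index.insert_nodes(nodes).
fake_docs.update({"node-c": "new node", "node-d": "new node"})

added = newly_added_ids(ids_before, fake_docs)
print(added)
```

With a real index, take `ids_before = set(index.docstore.docs)` before calling `index.insert_nodes(nodes)`. Also keep in mind that the insert changes the index in memory; assuming the index was saved with `index.storage_context.persist(...)`, it most likely needs to be persisted again for the new nodes to reach the disk.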
