LangChain4j Advanced RAG: Retrieval Augmentor

Licensed under CC 4.0 BY-SA

Tags:

#LangChain4j #AI-Agent-Implementation #Artificial-Intelligence #LLM-Applications

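Retrieval Augmentor

RetrievalAugmentor is the entry point into the RAG pipeline. It is responsible for augmenting a ChatMessage with relevant Content retrieved from various sources.

An instance of RetrievalAugmentor can be specified when creating an AiService: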

Assistant assistant = AiServices.builder(Assistant.class)
    ...
    .retrievalAugmentor(retrievalAugmentor)
    .build();

Each time the AiService is invoked, the specified RetrievalAugmentor is called to augment the current UserMessage.

You can use the default implementation that LangChain4j provides (DefaultRetrievalAugmentor) or write a custom implementation.
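
To make the augmentation step concrete, here is a minimal, self-contained sketch of the idea. It deliberately does not use LangChain4j types: the class SimpleAugmentor and its methods retrieve and augment are hypothetical names invented for this illustration, and simple keyword overlap stands in for real embedding-based retrieval.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration only; NOT the LangChain4j RetrievalAugmentor API.
public class SimpleAugmentor {

    private final List<String> knowledgeBase;

    public SimpleAugmentor(List<String> knowledgeBase) {
        this.knowledgeBase = knowledgeBase;
    }

    // Lowercases the text and splits it into a set of words.
    private static Set<String> words(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
    }

    // Returns every snippet that shares at least one word of 4+ letters with the query.
    public List<String> retrieve(String query) {
        Set<String> queryWords = words(query);
        List<String> hits = new ArrayList<>();
        for (String snippet : knowledgeBase) {
            for (String word : words(snippet)) {
                if (word.length() >= 4 && queryWords.contains(word)) {
                    hits.add(snippet);
                    break;
                }
            }
        }
        return hits;
    }

    // Appends the retrieved snippets to the user message;
    // returns the message unchanged when nothing matches.
    public String augment(String userMessage) {
        List<String> hits = retrieve(userMessage);
        if (hits.isEmpty()) {
            return userMessage;
        }
        return userMessage
                + "\n\nAnswer using the following information:\n"
                + String.join("\n", hits);
    }

    public static void main(String[] args) {
        SimpleAugmentor augmentor = new SimpleAugmentor(List.of(
                "Cancellation is free up to 7 days before pickup.",
                "Drivers must be at least 21 years old."));
        System.out.println(augmentor.augment("What is the cancellation policy?"));
    }
}
```

In LangChain4j itself, this role is played by the RetrievalAugmentor instance passed to the AiServices builder; the sketch only mirrors the overall shape of "retrieve relevant content, then attach it to the user message".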

Default Retrieval Augmentor

LangChain4j provides an out-of-the-box implementation of the RetrievalAugmentor interface, DefaultRetrievalAugmentor, which should be suitable for most RAG use cases.

Official usage example:

public class _04_Advanced_RAG_with_Metadata_Example {

    /**
     * Please refer to {@link Naive_RAG_Example} for a basic context.
     * <p>
     * Advanced RAG in LangChain4j is described here: https://github.com/langchain4j/langchain4j/pull/538
     * <p>
     * This example illustrates how to include document source and other metadata into the LLM prompt.
     */

    public static void main(String[] args) {

        Assistant assistant = createAssistant("documents/miles-of-smiles-terms-of-use.txt");

        // Ask "What is the name of the file where cancellation policy is defined?".
        // Observe how "file_name" metadata entry was injected into the prompt.
        startConversationWith(assistant);
    }

    private static Assistant createAssistant(String documentPath) {

        Document document = loadDocument(toPath(documentPath), new TextDocumentParser());

        EmbeddingModel embeddingModel = new BgeSmallEnV15QuantizedEmbeddingModel();

        EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(DocumentSplitters.recursive(300, 0))
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();

        ingestor.ingest(document);

        ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .build();

        // Each retrieved segment should inc