Using Weaviate Hybrid Search as a Powerful Data Retriever for LangChain

Original article: https://blog.youkuaiyun.com/fgayif/article/details/146466204

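Weaviate is an open-source vector database with built-in support for hybrid search. Hybrid search combines several search algorithms to make results more accurate and relevant: by pairing keyword search with vector search, Weaviate can account for both the exact terms and the meaning and context of queries and documents. This article walks step by step through using Weaviate's hybrid search as a data retriever in LangChain.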

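Technical Background

Weaviate's hybrid search represents queries and documents with both sparse and dense vectors. The sparse side drives keyword search (Weaviate scores it with BM25) and rewards documents that contain the query's exact terms; the dense side drives vector search over embeddings and rewards documents whose meaning is close to the query even when the wording differs. Using both lets a document rank well for either reason.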
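To make the sparse/dense distinction concrete, here is a toy sketch; it is not how Weaviate stores or scores data internally. The sparse representation is a term-count map, so it only rewards shared words (keyword_overlap is a crude stand-in for BM25), while the dense representation is a short list of floats compared by cosine similarity. The three-dimensional "embeddings" are made-up numbers for illustration, not output from any real model.

```python
import math
from collections import Counter


def sparse_vector(text: str) -> Counter:
    """Bag-of-words vector: only terms that appear in the text have an entry."""
    return Counter(text.lower().split())


def keyword_overlap(query: str, doc: str) -> int:
    """Number of query terms matched in the document (a crude stand-in for BM25)."""
    q, d = sparse_vector(query), sparse_vector(doc)
    # Counter returns 0 for missing terms, so unmatched words add nothing
    return sum(min(count, d[term]) for term, count in q.items())


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = "ethics of AI"
doc_literal = "AI ethics and risks"
doc_paraphrase = "moral questions raised by machine intelligence"

# Sparse side: the paraphrase shares no words with the query, so it scores 0
print(keyword_overlap(query, doc_literal), keyword_overlap(query, doc_paraphrase))

# Dense side: made-up 3-d vectors where the paraphrase points in nearly
# the same direction as the query and an unrelated text does not
query_vec = [0.9, 0.1, 0.3]
paraphrase_vec = [0.8, 0.2, 0.35]
unrelated_vec = [0.1, 0.9, 0.0]
print(round(cosine(query_vec, paraphrase_vec), 3), round(cosine(query_vec, unrelated_vec), 3))
```

The paraphrase is invisible to the keyword side but close on the dense side, which is exactly the gap hybrid search is meant to cover.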

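Core Principles

At its core, hybrid search runs a keyword search and a vector search for the same query and fuses the two result lists into a single ranking. The alpha parameter sets the balance: alpha = 1 is pure vector search, alpha = 0 is pure keyword search, and values in between weight the two. Combining both signals tends to improve relevance, because each method catches matches the other misses.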
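The fusion step described above can be sketched as follows. This is a simplified illustration of the relative-score-fusion idea (min-max normalize each result list, then weight the vector side by alpha and the keyword side by 1 - alpha); Weaviate's real implementation, and its alternative rank-based fusion, differ in details. The document IDs and raw scores are invented.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize raw scores into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a full match
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_fuse(
    keyword_scores: dict[str, float],
    vector_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Blend normalized scores: alpha=1.0 is pure vector, alpha=0.0 is pure keyword."""
    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    # A document missing from one list gets 0 from that side
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    # Highest fused score first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Invented raw scores: BM25-style for keywords, cosine-style for vectors
keyword_scores = {"doc_a": 2.1, "doc_b": 0.4}
vector_scores = {"doc_a": 0.71, "doc_b": 0.89, "doc_c": 0.80}

print(hybrid_fuse(keyword_scores, vector_scores, alpha=0.75))
print(hybrid_fuse(keyword_scores, vector_scores, alpha=0.25))
```

With alpha=0.75 the vector favourite doc_b ranks first; with alpha=0.25 the keyword hit doc_a does. doc_c, found only by vector search, sits in the middle at 0.75 and drops to last at 0.25.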

Code Walkthrough

The following end-to-end example sets up Weaviate hybrid search and uses it as a LangChain retriever.

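Setting Up the Retriever

First, install the Weaviate client and langchain-community (which provides the retriever), then configure the connection. WeaviateHybridSearchRetriever is built on the v3-style weaviate.Client API, so the install command pins the client to a version before 4.

%pip install --upgrade --quiet "weaviate-client<4" langchain-community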

import os
import weaviate

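# Read the Weaviate URL and API key from environment variables
WEAVIATE_URL = os.getenv("WEAVIATE_URL")
auth_client_secret = weaviate.AuthApiKey(api_key=os.getenv("WEAVIATE_API_KEY"))

# Initialize the Weaviate client (v3-style API).
# Drop auth_client_secret if your instance has no authentication.
# The OpenAI key header lets Weaviate call OpenAI to vectorize text,
# assuming the collection uses the text2vec-openai module.
client = weaviate.Client(
    url=WEAVIATE_URL,
    auth_client_secret=auth_client_secret,
    additional_headers={
        "X-Openai-Api-Key": os.getenv("OPENAI_API_KEY"),
    },
)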

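# Import the LangChain retriever and Document class
from langchain_community.retrievers import WeaviateHybridSearchRetriever
from langchain_core.documents import Document

# Create the hybrid search retriever
retriever = WeaviateHybridSearchRetriever(
    client=client,
    index_name="LangChain",
    text_key="text",
    attributes=[],
    create_schema_if_missing=True,
    alpha=0.5,  # 1.0 = pure vector search, 0.0 = pure keyword (BM25) search
    k=4,  # number of documents to return
)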
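In the setup code above, os.getenv returns None silently when a variable is unset, which only surfaces later as a confusing connection or authentication error. A small guard such as the hypothetical require_env helper below fails early with a clear message; it is a convenience sketch, not part of Weaviate or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Usage: WEAVIATE_URL = require_env("WEAVIATE_URL") in place of the plain os.getenv call.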

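Adding Data

Next, add some sample documents to the retriever. Each document's metadata fields (title and author here) are stored alongside its text, which is what makes them usable in filters later.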

docs = [
    Document(
        metadata={
            "title": "Embracing The Future: AI Unveiled",
            "author": "Dr. Rebecca Simmons",
        },
        page_content="A comprehensive analysis of the evolution of artificial intelligence, from its inception to its future prospects. Dr. Simmons covers ethical considerations, potentials, and threats posed by AI.",
    ),
    Document(
        metadata={
            "title": "Symbiosis: Harmonizing Humans and AI",
            "author": "Prof. Jonathan K. Sterling",
        },
        page_content="Prof. Sterling explores the potential for harmonious coexistence between humans and artificial intelligence. The book discusses how AI can be integrated into society in a beneficial and non-disruptive manner.",
    ),
    # more documents...
]

retriever.add_documents(docs)
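Conceptually, add_documents turns each Document into one Weaviate object whose properties are the page text (under text_key) plus every metadata key, and each object needs an ID. The sketch below imitates that mapping with a hypothetical stand-in dataclass SimpleDoc and a random UUID, so it runs without LangChain or a server; it reflects a reading of the retriever's behaviour, not its actual code.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for langchain_core.documents.Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def to_weaviate_object(doc: SimpleDoc, text_key: str = "text") -> tuple[str, dict]:
    """Return (object_id, properties): the text under text_key plus every metadata key."""
    properties = {text_key: doc.page_content, **doc.metadata}
    return str(uuid.uuid4()), properties


doc = SimpleDoc(
    page_content="Prof. Sterling explores the potential for harmonious coexistence "
    "between humans and artificial intelligence.",
    metadata={
        "title": "Symbiosis: Harmonizing Humans and AI",
        "author": "Prof. Jonathan K. Sterling",
    },
)
object_id, properties = to_weaviate_object(doc)
print(sorted(properties))  # ['author', 'text', 'title']
```

Because author ends up as an ordinary property, it can be referenced by path in a where filter.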

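Running a Hybrid Search

Query the retriever with a plain string. invoke returns a list of Document objects, at most k of them.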

results = retriever.invoke("the ethical implications of AI")
for doc in results:
    print(doc.page_content)

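Filtering Results

You can also narrow the results with a where_filter, which the retriever passes through to the Weaviate query. The filter below keeps only documents whose author property equals the given value.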

results_filtered = retriever.invoke(
    "AI integration in society",
    where_filter={
        "path": ["author"],
        "operator": "Equal",
        "valueString": "Prof. Jonathan K. Sterling",
    },
)
for doc in results_filtered:
    print(doc.page_content)
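The where_filter above is a plain dict in the Weaviate v3 (GraphQL) filter format, and several conditions can be combined under an "And" or "Or" operator whose "operands" list holds the individual conditions. The helper names equal and all_of below are hypothetical. The sketch uses "valueText" because recent Weaviate versions store auto-schema strings as text and deprecate "valueString"; check which key your Weaviate version expects.

```python
def equal(prop: str, value: str) -> dict:
    """Single equality condition on a text property (Weaviate v3 filter format)."""
    return {"path": [prop], "operator": "Equal", "valueText": value}


def all_of(*conditions: dict) -> dict:
    """Combine conditions so that every one of them must match."""
    return {"operator": "And", "operands": list(conditions)}


author_and_title = all_of(
    equal("author", "Prof. Jonathan K. Sterling"),
    equal("title", "Symbiosis: Harmonizing Humans and AI"),
)
print(author_and_title)
```

The combined dict is passed the same way as before, for example retriever.invoke("AI integration in society", where_filter=author_and_title).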

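Showing Search Scores

To see each result's hybrid score, pass score=True. The score is then included in each document's metadata under the _additional key: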

results_with_score = retriever.invoke(
    "AI integration in society",
    score=True,
)
for doc in results_with_score:
    print(doc.page_content, doc.metadata.get('_additional', {}).get('score'))
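To rank or threshold results by that score, it has to be read back as a number; assuming it may arrive as a string (as in the printed output), float() handles both strings and floats. The hypothetical helpers below work on plain metadata dicts shaped like the retriever's output; the sample values and the 0.02 threshold are made up.

```python
def score_of(metadata: dict) -> float:
    """Return the hybrid score stored under metadata["_additional"]["score"], or 0.0."""
    raw = metadata.get("_additional", {}).get("score")
    # float() accepts both "0.0163..." strings and numeric values
    return float(raw) if raw is not None else 0.0


def above_threshold(metadatas: list[dict], min_score: float) -> list[dict]:
    """Keep entries scoring at least min_score, sorted best first."""
    kept = [m for m in metadatas if score_of(m) >= min_score]
    return sorted(kept, key=score_of, reverse=True)


# Made-up metadata dicts shaped like the output when score=True
sample = [
    {"_additional": {"score": "0.016393442"}},
    {"_additional": {"score": "0.032786883"}},
    {"author": "no score requested"},
]
best = above_threshold(sample, min_score=0.02)
print(best)
```

With real results the same key function applies to Document objects: sorted(results_with_score, key=lambda d: score_of(d.metadata), reverse=True).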