retriv Search Engine Tutorial

松忆玮

Published 2024-09-03 07:47:54

Licensed under CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/gitblog_00180/article/details/141839497

retriv: A Python Search Engine for Humans 🥸. Project repository: https://gitcode.com/gh_mirrors/re/retriv

Project Introduction

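retriv is a fast search engine implemented in Python that uses Numba for high-speed vector operations and automatic parallelization. It offers a user-friendly interface for indexing and searching your document collection, and it lets you automatically tune the underlying retrieval model, BM25. retriv supports sparse retrieval (traditional search such as BM25 and TF-IDF), dense retrieval (semantic search), and hybrid retrieval (a mix of sparse and dense retrieval).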
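
To make the idea of sparse retrieval more concrete, here is a minimal pure-Python sketch of BM25 scoring. It is not retriv's implementation: the function name `bm25_scores`, the whitespace tokenization, the Lucene-style IDF variant, and the `k1` and `b` defaults are all assumptions chosen for illustration, so the scores it produces should not be expected to match retriv's output.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with a basic BM25.

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase whitespace split, with no stemming or stop-word removal.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Lucene-style IDF (assumed variant), always non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "Generals gathered in their masses",
    "Just like witches at black masses",
    "Evil minds that plot destruction",
    "Sorcerer of death's construction",
]
scores = bm25_scores("witches masses", docs)
for doc, score in sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{score:.4f}  {doc}")
```

Documents that contain neither query term score zero, and the document matching both terms ranks first, which is the behavior a BM25-based engine builds on.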

Quick Start

Installation

First, make sure your Python version is >= 3.8. Then install retriv with pip:

pip install retriv
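
If you are unsure which interpreter `pip` is bound to, a quick check along these lines can confirm the version requirement before installing. This is only a sketch; `MIN_PYTHON` and `python_is_supported` are hypothetical names introduced here, not part of retriv.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this tutorial


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version is sufficient for retriv")
else:
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer is required")
```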

Minimal Working Example

The following simple example shows how to index and search documents with retriv:

from retriv import SearchEngine

# Create the document collection
collection = [
    {"id": "doc_1", "text": "Generals gathered in their masses"},
    {"id": "doc_2", "text": "Just like witches at black masses"},
    {"id": "doc_3", "text": "Evil minds that plot destruction"},
    {"id": "doc_4", "text": "Sorcerer of death's construction"}
]

# Initialize the search engine with an index name
se = SearchEngine("new-index")

# Index the documents
se.index(collection)

# Run a search
results = se.search("witches masses")
print(results)

Output:

[
    { "id": "doc_2", "text": "Just like witches at black masses", "score": 1.7536403 },
    { "id": "doc_1", "text": "Generals gathered in their masses", "score": 0.6931472 }
]
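
Since the output shown above is a list of dictionaries with `id`, `text`, and `score` keys, it can be post-processed with ordinary Python. The sketch below hard-codes that output so it runs without retriv installed; `top_ids` and its `min_score` parameter are hypothetical helpers introduced here for illustration.

```python
# Results copied from the output shown in this tutorial.
results = [
    {"id": "doc_2", "text": "Just like witches at black masses", "score": 1.7536403},
    {"id": "doc_1", "text": "Generals gathered in their masses", "score": 0.6931472},
]


def top_ids(hits, min_score=0.0):
    """Return the ids of hits scoring at least min_score, best first."""
    ranked = sorted(hits, key=lambda hit: hit["score"], reverse=True)
    return [hit["id"] for hit in ranked if hit["score"] >= min_score]


# Print a compact ranked listing.
for rank, hit in enumerate(results, start=1):
    print(f"{rank}. {hit['id']} ({hit['score']:.4f}): {hit['text']}")

# Keep only the stronger matches.
print(top_ids(results, min_score=1.0))
```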