专题4:混合检索
检索代码如下所示:
from pymilvus import AnnSearchRequest
#### STEP1:构造检索模块request_1 和request_2
## 假设稠密数据
query_dense_vector = [0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592]
search_param_1 = {
"data": [query_dense_vector],
"anns_field": "dense",
"param": {
"metric_type": "IP",
"params": {"nprobe": 10}
},
"limit": 2
}
request_1 = AnnSearchRequest(**search_param_1)
## 假设稀疏数据
query_sparse_vector = {3573: 0.34701499565746674}, {5263: 0.2639375518635271}
search_param_2 = {
"data": [query_sparse_vector],
"anns_field": "sparse",
"param": {
"metric_type": "IP",
"params": {}
},
"limit": 2
}
request_2 = AnnSearchRequest(**search_param_2)
reqs = [request_1, request_2]
#### STEP2:利用检索模块融合进混合检索
from pymilvus import MilvusClient
res = client.hybrid_search(
collection_name="hybrid_search_collection",
reqs=reqs,
ranker=ranker,
limit=2
)
for hits in res:
print("TopK results:")
for hit in hits:
print(hit)
Rerankers策略主要有两种方法如下:
方法1:rrf加权策略
文档d在所有检索器的排名倒数和
from pymilvus import RRFRanker
ranker = RRFRanker(100)
方法2:对不同的检索空间的document先取交集,然后对score进行加权策略,策略如下:
图片来自于最佳实践论文
from pymilvus import WeightedRanker
rerank= WeightedRanker(0.8, 0.3)