whoosh2

License: CC 4.0 BY-SA


How to search

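# Open a searcher; its search() method takes a Query object and returns a Results object.
# (myindex is assumed to be an already opened whoosh index.)
from whoosh.qparser import QueryParser
qp = QueryParser("content", schema=myindex.schema)
q = qp.parse(u"hello world")
with myindex.searcher() as s:
    # By default the results hold at most 10 matching documents; limit=20 raises that to 20.
    results = s.search(q, limit=20)
    # Return only page 5 of the results (20 hits per page) as a ResultsPage object
    results = s.search_page(q, 5, pagelen=20)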
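
search_page() numbers pages starting from 1, so page 5 with pagelen=20 covers the 81st through 100th ranked hits. The stdlib-only sketch below shows that arithmetic on a plain list of ranked hits; page_of is a hypothetical helper written for illustration, not part of Whoosh.

```python
import math


def page_of(hits, pagenum, pagelen=10):
    """Return (page_hits, pagecount) for a 1-based page number.

    Hypothetical helper mirroring the idea behind search_page():
    page 1 holds hits 0..pagelen-1, page 2 the next pagelen hits, and so on.
    """
    if pagenum < 1:
        raise ValueError("pagenum must be >= 1")
    pagecount = math.ceil(len(hits) / pagelen)
    start = (pagenum - 1) * pagelen
    return hits[start:start + pagelen], pagecount


ranked = [f"doc{i}" for i in range(1, 96)]  # 95 made-up hits, best first
page, count = page_of(ranked, 5, pagelen=20)
print(count, page[0], page[-1])  # 5 doc81 doc95
```

With only 95 hits the last page is short: page 5 holds 15 hits instead of 20.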

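The Results object
A Results object acts like a list of the matching documents. You can use it to read the stored fields of each hit document.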
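
In Whoosh you read a stored field from a hit by name, for example hit["title"], and only fields declared with stored=True in the schema can be read back this way. The sketch below models that behaviour with plain Python objects; Hit here is a hypothetical stand-in, not Whoosh's own class, and the titles and paths are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """Hypothetical stand-in for a search hit: rank, score and stored fields."""
    rank: int
    score: float
    stored: dict = field(default_factory=dict)

    def __getitem__(self, name):
        # Only stored fields can be read back; anything else raises KeyError.
        return self.stored[name]

    def fields(self):
        # Copy of all stored fields for this hit
        return dict(self.stored)


results = [
    Hit(0, 3.2, {"title": "Rendering basics", "path": "/a"}),
    Hit(1, 1.7, {"title": "Shading tips", "path": "/b"}),
]

for hit in results:
    print(hit.rank, hit["title"], hit["path"])

try:
    results[0]["content"]  # "content" is not stored in this example
except KeyError:
    print("content is not stored")
```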
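
Scoring
The whoosh.scoring module contains implementations of several scoring algorithms; the default is BM25F. To use another one, pass it to searcher() through the weighting keyword: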

from whoosh import scoring

with myindex.searcher(weighting=scoring.TF_IDF()) as s:
    # Searches in this block are scored with TF-IDF instead of BM25F
    results = s.search(q)
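
The practical difference between the two models is how they treat term frequency. The sketch below uses textbook single-field formulas, not Whoosh's code: BM25F additionally weights several fields, and the idf and the parameter values k1=1.2, b=0.75 here are common choices assumed for illustration. Both scores share the same idf so the comparison is like for like.

```python
import math


def idf(n_docs, doc_freq):
    # Textbook inverse document frequency (assumed form; Whoosh's may differ)
    return math.log(n_docs / doc_freq) + 1.0


def tf_idf(tf, n_docs, doc_freq):
    # Score grows linearly with the term frequency
    return tf * idf(n_docs, doc_freq)


def bm25(tf, doc_len, avg_len, n_docs, doc_freq, k1=1.2, b=0.75):
    # BM25 term score: k1 makes tf saturate, b normalizes by document length
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return idf(n_docs, doc_freq) * tf * (k1 + 1) / (tf + norm)


for tf in (1, 4, 16):
    print(tf,
          round(tf_idf(tf, 1000, 10), 2),
          round(bm25(tf, doc_len=100, avg_len=100, n_docs=1000, doc_freq=10), 2))
```

Quadrupling tf quadruples the TF-IDF score but raises the BM25 score by well under 4x, and for the same tf a longer-than-average document scores lower under BM25.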

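Filtering results

The filter keyword of search() keeps only documents that also match the given query; the mask keyword removes documents that match it.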

from whoosh import qparser, query

with myindex.searcher() as s:
    qp = qparser.QueryParser("content", myindex.schema)
    user_q = qp.parse(query_string)  # query_string: the text the user typed

    # Only show documents in the "rendering" chapter
    allow_q = query.Term("chapter", "rendering")
    # Don't show any documents where the "tag" field contains "todo"
    restrict_q = query.Term("tag", "todo")

    results = s.search(user_q, filter=allow_q, mask=restrict_q)
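
Conceptually, filter acts as an allow-list and mask as a deny-list applied to the documents that match the user query. The stdlib sketch below shows that set logic on made-up document numbers; apply_filter_and_mask is a hypothetical helper, and in this sketch filtering only removes documents while keeping the ranked order of the rest.

```python
def apply_filter_and_mask(matches, allowed=None, masked=None):
    """Hypothetical helper: keep only allowed doc ids, then drop masked ones.

    matches: doc ids in ranked order (best first)
    allowed: set of ids the filter permits, or None for "no filter"
    masked:  set of ids the mask forbids, or None for "no mask"
    """
    out = []
    for docnum in matches:
        if allowed is not None and docnum not in allowed:
            continue  # not in the "rendering" chapter
        if masked is not None and docnum in masked:
            continue  # tagged "todo"
        out.append(docnum)
    return out


ranked = [7, 3, 9, 1, 4]      # documents matching user_q, best first
in_rendering = {3, 4, 7, 8}   # documents matching allow_q
tagged_todo = {4}             # documents matching restrict_q
print(apply_filter_and_mask(ranked, in_rendering, tagged_todo))  # [7, 3]
```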

Parsing user queries

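The job of a query parser is to convert the query string a user submits into query objects.
For example, the user query "rendering shading" might be parsed into And([Term("content", u"rendering"), Term("content", u"shading")]).
To create a QueryParser object, pass it the name of the default field to search and the schema of the index.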

from whoosh.qparser import QueryParser
parser = QueryParser("content", schema=myindex.schema)
# Call parse() to turn a query string into a query object
parser.parse(u"alpha OR beta gamma")
# -> And([Or([Term('content', u'alpha'), Term('content', u'beta')]), Term('content', u'gamma')])
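
The result above shows that an explicit OR binds more tightly than the implicit AND between words. The toy parser below reproduces only that precedence rule for plain words; it is a hypothetical sketch, not Whoosh's parser, which also handles AND, NOT, phrases, field prefixes and more. The Term, Or and And classes here are simple stand-ins named after Whoosh's query classes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Term:
    field: str
    text: str


@dataclass
class Or:
    subqueries: List["Node"]


@dataclass
class And:
    subqueries: List["Node"]


Node = Union[Term, Or, And]


def toy_parse(text, default_field="content"):
    """Hypothetical mini-parser: explicit OR binds tighter than implicit AND."""
    groups = []  # each group is a list of Terms joined by OR
    pending_or = False
    for word in text.split():
        if word == "OR" and groups:
            pending_or = True  # the next word joins the previous group
            continue
        term = Term(default_field, word)
        if pending_or:
            groups[-1].append(term)
            pending_or = False
        else:
            groups.append([term])
    nodes = [g[0] if len(g) == 1 else Or(g) for g in groups]
    if len(nodes) == 1:
        return nodes[0]
    return And(nodes)  # words not joined by OR are all required


print(toy_parse("alpha OR beta gamma"))
```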

Common customizations

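By default the parser joins the terms with AND (every term must appear in a matching document). You can make it use OR instead: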

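from whoosh import qparser

# fieldname: name of the default field to search.
# A plain OR group scores documents by how often the terms occur, so for the query
# "foo bar" a document containing "foo" four times can outscore one containing
# "foo" and "bar" once each.
parser = qparser.QueryParser(fieldname, schema=myindex.schema,
                             group=qparser.OrGroup)

# Users usually expect documents that contain more of the query words to score higher.
# OrGroup.factory() configures the parser to give that bonus; its argument is a
# scaling factor on the bonus, between 0 and 1.
og = qparser.OrGroup.factory(0.9)
parser = qparser.QueryParser(fieldname, schema=myindex.schema, group=og)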
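
To see why such a bonus can reorder results, the sketch below scores the "foo bar" example with and without a coordination factor. Both the 1 + ln(tf) term weight and the linear factor (1 - scale) + scale * matched / total are assumptions chosen for illustration; Whoosh's exact formula may differ. or_score is a hypothetical helper.

```python
import math


def term_weight(tf):
    # Assumed sublinear term-frequency weight; 0 when the term is absent
    return 1 + math.log(tf) if tf > 0 else 0.0


def or_score(doc, terms, scale=None):
    """Score doc (dict: term -> frequency) against an OR of terms.

    scale=None: plain OR, the sum of the term weights.
    0 < scale < 1: multiply by (1 - scale) + scale * matched / len(terms),
    an assumed coordination bonus favouring docs that match more terms.
    """
    base = sum(term_weight(doc.get(t, 0)) for t in terms)
    if scale is None:
        return base
    matched = sum(1 for t in terms if doc.get(t, 0) > 0)
    return base * ((1 - scale) + scale * matched / len(terms))


doc_a = {"foo": 4}            # "foo" four times, no "bar"
doc_b = {"foo": 1, "bar": 1}  # "foo" and "bar" once each
terms = ["foo", "bar"]
print(or_score(doc_a, terms), or_score(doc_b, terms))            # plain OR
print(or_score(doc_a, terms, 0.9), or_score(doc_b, terms, 0.9))  # with bonus
```

With the plain OR, doc_a ranks first (about 2.39 vs 2.0); with scale=0.9, doc_a only gets 55% of its score and doc_b, which matches both words, ranks first.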

Searching multiple fields

from whoosh.qparser import QueryParser, MultifieldParser

# By default the parser searches a single field: the query "three blind mice"
# becomes content:three content:blind content:mice
parser = QueryParser("content", schema=myschema)

# MultifieldParser searches each word in several fields, giving
# (title:three OR content:three) (title:blind OR content:blind) (title:mice OR content:mice)
mparser = MultifieldParser(["title", "content"], schema=myschema)
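
The rewrite MultifieldParser performs on plain words can be pictured as expanding each word into an OR over the listed fields. The stdlib sketch below builds that query string; expand_multifield is a hypothetical helper that only handles plain words, not operators, phrases or field prefixes, and it is not how Whoosh builds its query objects internally.

```python
def expand_multifield(query_text, fields):
    """Hypothetical helper: rewrite each word as an OR over all fields."""
    groups = []
    for word in query_text.split():
        alternatives = " OR ".join(f"{f}:{word}" for f in fields)
        groups.append(f"({alternatives})")
    return " ".join(groups)


print(expand_multifield("three blind mice", ["title", "content"]))
# (title:three OR content:three) (title:blind OR content:blind) (title:mice OR content:mice)
```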