Modifying Ranking in Lucene

Changing Similarity (Similarity Computation)

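DefaultSimilarity is sufficient for most search needs. In some applications, though, you may want to customize Similarity to suit your own requirements. For example, some users see no reason why a shorter document should score higher just because it is short (see a "fair" similarity).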

Changing Similarity affects both indexing and searching, so the change must be made before either of them takes place.

To use your own Similarity instead of DefaultSimilarity, call IndexWriter.setSimilarity before indexing and Searcher.setSimilarity before searching.

If you want to see how other people have modified Similarity, look up "Overriding Similarity" on the Lucene mailing lists. In summary, these are the common modifications:

  1. SweetSpotSimilarity -- SweetSpotSimilarity gives small increases as the frequency increases a small amount and then greater increases when you hit the "sweet spot", i.e. where you think the frequency of terms is more significant.
  2. Overriding tf -- In some applications, it doesn't matter what the score of a document is as long as a matching term occurs. In these cases people have overridden Similarity to return 1 from the tf() method.
  3. Changing Length Normalization -- By overriding lengthNorm, it is possible to discount how the length of a field contributes to a score. In DefaultSimilarity, lengthNorm = 1 / (numTerms in field)^0.5, but if one changes this to be 1 / (numTerms in field), all fields will be treated "fairly".
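
The two length normalization formulas and the constant tf override from the list above can be compared in a few lines of plain Java. This is only a sketch of the arithmetic: the class and method names are hypothetical, it does not depend on Lucene, and it does not subclass the real Similarity class.

```java
public class LengthNormSketch {
    // DefaultSimilarity-style norm as quoted above: 1 / (numTerms)^0.5.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The alternative quoted above: 1 / numTerms.
    static float fairLengthNorm(int numTerms) {
        return 1.0f / numTerms;
    }

    // The "Overriding tf" idea: every match contributes the same tf of 1,
    // regardless of how often the term occurs.
    static float flatTf(float freq) {
        return 1.0f;
    }

    public static void main(String[] args) {
        // Print both norms for a short, a medium, and a long field.
        for (int numTerms : new int[] {4, 100, 10000}) {
            System.out.printf("numTerms=%d default=%.4f fair=%.4f%n",
                    numTerms, defaultLengthNorm(numTerms), fairLengthNorm(numTerms));
        }
    }
}
```

Running main shows how much faster the 1 / numTerms variant shrinks as a field grows, which is the effect you would be choosing when overriding lengthNorm.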

Because you know your own data better than any general-purpose default can, it may make sense to override the Similarity methods yourself.

Customizing Your Own Scoring (Expert Level)

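Changing the scoring system is expert-level work, so proceed carefully and discuss your changes with others as you go. In Lucene, changing scoring affects the results more than changing Similarity does. Lucene's scoring is a complex mechanism, implemented mainly by the following three classes: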

  1. Query -- The abstract object representation of the user's information need.
  2. Weight -- The internal interface representation of the user's Query, so that Query objects may be reused.
  3. Scorer -- An abstract class containing common functionality for scoring. Provides both scoring and explanation capabilities.

The three classes are described in more detail below:

The Query Class

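In some sense, the Query class is where scoring begins. Without a Query, there is nothing to score. More importantly, it is the catalyst for the other scoring classes: it creates them and coordinates them. Query has several important methods that derived classes should override: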

  1. createWeight(Searcher searcher) -- A Weight is the internal representation of the Query, so each Query implementation must provide an implementation of Weight. See the subsection on The Weight Interface below for details on implementing the Weight interface.
  2. rewrite(IndexReader reader) -- Rewrites queries into primitive queries. Primitive queries are: TermQuery, BooleanQuery, OTHERS????

The Weight Interface

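The Weight interface provides an internal representation of the Query so that the Query can be reused. Any searcher-dependent state should be stored in the Weight implementation, not in the Query class. The interface defines 6 methods that must be implemented: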

  1. Weight#getQuery() -- Pointer to the Query that this Weight represents.
  2. Weight#getValue() -- The weight for this Query. For example, the TermQuery.TermWeight value is equal to the idf^2 * boost * queryNorm
  3. Weight#sumOfSquaredWeights() -- The sum of squared weights. For TermQuery, this is (idf * boost)^2
  4. Weight#normalize(float) -- Determine the query normalization factor. The query normalization may allow for comparing scores between queries.
  5. Weight#scorer(IndexReader) -- Construct a new Scorer for this Weight. See The Scorer Class below for help defining a Scorer. As the name implies, the Scorer is responsible for doing the actual scoring of documents given the Query.
  6. Weight#explain(IndexReader, int) -- Provide a means for explaining why a given document was scored the way it was.
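
The arithmetic behind getValue(), sumOfSquaredWeights() and normalize(float) for a single term can be sketched in plain Java. The class below is hypothetical and does not implement the real Weight interface. It also assumes a query norm of 1 / sqrt(sumOfSquaredWeights), which is what DefaultSimilarity uses but is not stated in the list above.

```java
public class TermWeightSketch {
    private final float idf;
    private final float boost;
    private float queryNorm = 1.0f; // stays 1 until normalize() is called

    TermWeightSketch(float idf, float boost) {
        this.idf = idf;
        this.boost = boost;
    }

    // (idf * boost)^2, as stated above for TermQuery.
    float sumOfSquaredWeights() {
        float queryWeight = idf * boost;
        return queryWeight * queryWeight;
    }

    // Stores the normalization factor computed by the caller.
    void normalize(float queryNorm) {
        this.queryNorm = queryNorm;
    }

    // idf^2 * boost * queryNorm, as stated above for TermQuery.TermWeight.
    float getValue() {
        return idf * idf * boost * queryNorm;
    }

    public static void main(String[] args) {
        TermWeightSketch weight = new TermWeightSketch(2.0f, 1.5f);
        float sum = weight.sumOfSquaredWeights();
        // Assumed norm: 1 / sqrt(sum of squared weights).
        weight.normalize((float) (1.0 / Math.sqrt(sum)));
        System.out.println("sumOfSquaredWeights=" + sum + " value=" + weight.getValue());
    }
}
```

The order of calls mirrors the list: the squared weights are summed first, the norm is derived from that sum, and only then is the final value meaningful.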

The Scorer Class

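Scorer is the abstract class for scoring. It provides the basic scoring functionality shared by all Scorer implementations and is the heart of Lucene's scoring mechanism. Scorer defines the following methods, which must be implemented: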

  1. Scorer#next() -- Advances to the next document that matches this Query, returning true if and only if there is another document that matches.
  2. Scorer#doc() -- Returns the id of the Document that contains the match. Is not valid until next() has been called at least once.
  3. Scorer#score() -- Return the score of the current document. This value can be determined in any appropriate way for an application. For instance, the TermScorer returns the tf * Weight.getValue() * fieldNorm.
  4. Scorer#skipTo(int) -- Skip ahead in the document matches to the document whose id is greater than or equal to the passed in value. In many instances, skipTo can be implemented more efficiently than simply looping through all the matching documents until the target document is identified.
  5. Scorer#explain(int) -- Provides details on why the score came about.
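
The iteration contract of next(), doc(), score() and skipTo(int) can be illustrated with a toy class over a sorted array of matching document ids. This is a hypothetical stand-in, not a subclass of the real Scorer: the precomputed scores are made-up inputs, and skipTo is written as the simple loop over next(), without the faster lookup a real implementation might use.

```java
public class ScorerSketch {
    private final int[] docs;     // ids of matching documents, ascending
    private final float[] scores; // precomputed score for each matching document
    private int pos = -1;         // position before the first match

    ScorerSketch(int[] docs, float[] scores) {
        this.docs = docs;
        this.scores = scores;
    }

    // Advances to the next match; true if and only if one exists.
    boolean next() {
        pos++;
        return pos < docs.length;
    }

    // Id of the current match; only valid after next() has returned true.
    int doc() {
        return docs[pos];
    }

    // Score of the current match.
    float score() {
        return scores[pos];
    }

    // Moves forward to the first match whose id is >= target.
    boolean skipTo(int target) {
        do {
            if (!next()) {
                return false;
            }
        } while (target > doc());
        return true;
    }

    public static void main(String[] args) {
        ScorerSketch scorer = new ScorerSketch(new int[] {2, 5, 9}, new float[] {0.3f, 0.8f, 0.1f});
        while (scorer.next()) {
            System.out.println("doc=" + scorer.doc() + " score=" + scorer.score());
        }
    }
}
```

A caller typically drives the scorer with the while loop shown in main, and uses skipTo when another clause of the query has already ruled out the documents in between.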
 