Lucene 2.4 Changes: A Preface

This article summarizes the major updates in Lucene 2.4, including API changes, runtime behavior changes, adjustments to the backward-compatibility policy, and new features. It highlights important changes such as the removal of Hits, the StandardAnalyzer improvements, and the change to IndexWriter's auto-commit behavior.

I upgraded Lucene to 2.4 only a few days ago, and the upgrade immediately showed that a lot has changed. The most obvious differences:

1. Hits is gone.

2. The copy function has changed.

I am sure there are many more; a look at the change log makes that clear. So, with considerable interest, I am going to dig into the source and see what has changed.
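
Since the disappearance of Hits is the first thing most upgraders run into, here is a minimal sketch of the replacement pattern: call the Searcher.search(Query, int n) convenience method added in 2.4 (see the API changes below) and iterate the returned TopDocs. The index path and the "title" field are made-up placeholders.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    public class TopDocsInsteadOfHits {
        public static void main(String[] args) throws Exception {
            // "/path/to/index" and the "title" field are placeholders for illustration.
            IndexSearcher searcher = new IndexSearcher("/path/to/index");

            // search(Query, int n) returns the top n hits as TopDocs,
            // replacing the deprecated Hits/HitIterator API.
            TopDocs top = searcher.search(new TermQuery(new Term("title", "lucene")), 10);

            System.out.println("total hits: " + top.totalHits);
            for (ScoreDoc sd : top.scoreDocs) {
                Document doc = searcher.doc(sd.doc);
                System.out.println(sd.score + "\t" + doc.get("title"));
            }
            searcher.close();
        }
    }

Below is the relevant part of the 2.4 change list.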

Changes in backwards compatibility policy   (1)
  1. LUCENE-1340: In a minor change to Lucene's backward compatibility policy, we are now allowing the Fieldable interface to have changes, within reason, and made on a case-by-case basis. If an application implements its own Fieldable, please be aware of this. Otherwise, no need to be concerned. This is in effect for all 2.X releases, starting with 2.4. Also note that, in all likelihood, Fieldable will be changed in 3.0.
Changes in runtime behavior   (4)
  1. LUCENE-1151: Fix StandardAnalyzer to not mis-identify host names (eg lucene.apache.org) as an ACRONYM. To get back to the pre-2.4 backwards compatible, but buggy, behavior, you can either call StandardAnalyzer.setDefaultReplaceInvalidAcronym(false) (static method), or, set system property org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym to "false" on JVM startup. All StandardAnalyzer instances created after that will then show the pre-2.4 behavior. Alternatively, you can call setReplaceInvalidAcronym(false) to change the behavior per instance of StandardAnalyzer. This backwards compatibility will be removed in 3.0 (hardwiring the value to true).
    (Mike McCandless)
  2. LUCENE-1044: IndexWriter with autoCommit=true now commits (such that a reader can see the changes) far less often than it used to. Previously, every flush was also a commit. You can always force a commit by calling IndexWriter.commit(). Furthermore, in 3.0, autoCommit will be hardwired to false (IndexWriter constructors that take an autoCommit argument have been deprecated). A short usage sketch follows this list.
    (Mike McCandless)
  3. LUCENE-1335: IndexWriter.addIndexes(Directory[]) and addIndexesNoOptimize no longer allow the same Directory instance to be passed in more than once. Internally, IndexWriter uses Directory and segment name to uniquely identify segments, so adding the same Directory more than once was causing duplicates which led to problems
    (Mike McCandless)
  4. LUCENE-1396: Improve PhraseQuery.toString() so that gaps in the positions are indicated with a ? and multiple terms at the same position are joined with a |.
    (Andrzej Bialecki via Mike McCandless)
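
A minimal sketch of how items 1 and 2 above show up in application code; the analyzer/directory setup is illustrative only, and the IndexWriter constructor with MaxFieldLength comes from LUCENE-1084 in the API changes list below:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class RuntimeBehaviorSketch {
        public static void main(String[] args) throws Exception {
            // Item 1: restore the pre-2.4 (buggy) ACRONYM handling only if you depend on it.
            StandardAnalyzer analyzer = new StandardAnalyzer();
            analyzer.setReplaceInvalidAcronym(false);                  // per instance
            // StandardAnalyzer.setDefaultReplaceInvalidAcronym(false); // or globally, before creating instances

            // Item 2: a flush no longer implies a commit; call commit() when readers must see changes.
            Directory dir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(dir, analyzer, true,
                    IndexWriter.MaxFieldLength.UNLIMITED);
            Document doc = new Document();
            doc.add(new Field("host", "lucene.apache.org", Field.Store.YES,
                    Field.Index.ANALYZED));
            writer.addDocument(doc);
            writer.commit();   // publish a new segments file so readers can see the change
            writer.close();
        }
    }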
API Changes   (26)
  1. LUCENE-1084: Changed all IndexWriter constructors to take an explicit parameter for maximum field size. Deprecated all the pre-existing constructors; these will be removed in release 3.0. NOTE: these new constructors set autoCommit to false.
    (Steven Rowe via Mike McCandless)
  2. LUCENE-584: Changed Filter API to return a DocIdSet instead of a java.util.BitSet. This allows using more efficient data structures for Filters and makes them more flexible. This deprecates Filter.bits(), so all filters that implement this outside the Lucene code base will need to be adapted. See also the javadocs of the Filter class. A sketch of a DocIdSet-based filter follows this list.
    (Paul Elschot, Michael Busch)
  3. LUCENE-1044: Added IndexWriter.commit() which flushes any buffered adds/deletes and then commits a new segments file so readers will see the changes. Deprecate IndexWriter.flush() in favor of IndexWriter.commit().
    (Mike McCandless)
  4. LUCENE-325: Added IndexWriter.expungeDeletes methods, which consult the MergePolicy to find merges necessary to merge away all deletes from the index. This should be a somewhat lower cost operation than optimize.
    (John Wang via Mike McCandless)
  5. LUCENE-1233: Return empty array instead of null when no fields match the specified name in these methods in Document: getFieldables, getFields, getValues, getBinaryValues.
    (Stefan Trcek via Mike McCandless)
  6. LUCENE-1234: Make BoostingSpanScorer protected.
    (Andi Vajda via Grant Ingersoll)
  7. LUCENE-510: The index now stores strings as true UTF-8 bytes (previously it was Java's modified UTF-8). If any text, either stored fields or a token, has illegal UTF-16 surrogate characters, these characters are now silently replaced with the Unicode replacement character U+FFFD. This is a change to the index file format.
    (Marvin Humphrey via Mike McCandless)
  8. LUCENE-852: Let the SpellChecker caller specify IndexWriter mergeFactor and RAM buffer size.
    (Otis Gospodnetic)
  9. LUCENE-1290: Deprecate org.apache.lucene.search.Hits, Hit and HitIterator and remove all references to these classes from the core. Also update demos and tutorials.
    (Michael Busch)
  10. LUCENE-1288: Add getVersion() and getGeneration() to IndexCommit. getVersion() returns the same value that IndexReader.getVersion() returns when the reader is opened on the same commit.
    (Jason Rutherglen via Mike McCandless)
  11. LUCENE-1311: Added IndexReader.listCommits(Directory) static method to list all commits in a Directory, plus IndexReader.open methods that accept an IndexCommit and open the index as of that commit. These methods are only useful if you implement a custom DeletionPolicy that keeps more than the last commit around.
    (Jason Rutherglen via Mike McCandless)
  12. LUCENE-1325: Added IndexCommit.isOptimized().
    (Shalin Shekhar Mangar via Mike McCandless)
  13. LUCENE-1324: Added TokenFilter.reset().
    (Shai Erera via Mike McCandless)
  14. LUCENE-1340: Added Fieldable.omitTf() method to skip indexing term frequency, positions and payloads. This saves index space, and indexing/searching time.
    (Eks Dev via Mike McCandless)
  15. LUCENE-1219: Add basic reuse API to Fieldable for binary fields: getBinaryValue/Offset/Length(); currently only lazy fields reuse the provided byte[] result to getBinaryValue.
    (Eks Dev via Mike McCandless)
  16. LUCENE-1334: Add new constructor for Term: Term(String fieldName) which defaults term text to "".
    (DM Smith via Mike McCandless)
  17. LUCENE-1333: Added Token.reinit(*) APIs to re-initialize (reuse) a Token. Also added term() method to return a String, with a performance penalty clearly documented. Also implemented hashCode() and equals() in Token, and fixed all core and contrib analyzers to use the re-use APIs.
    (DM Smith via Mike McCandless)
  18. LUCENE-1329: Add optional readOnly boolean when opening an IndexReader. A readOnly reader is not allowed to make changes (deletions, norms) to the index; in exchange, the isDeleted method, often a bottleneck when searching with many threads, is not synchronized. The default for readOnly is still false, but in 3.0 the default will become true.
    (Jason Rutherglen via Mike McCandless)
  19. LUCENE-1367: Add IndexCommit.isDeleted().
    (Shalin Shekhar Mangar via Mike McCandless)
  20. LUCENE-1061: Factored out all "new XXXQuery(...)" in QueryParser.java into protected methods newXXXQuery(...) so that subclasses can create their own subclasses of each Query type.
    (John Wang via Mike McCandless)
  21. LUCENE-753: Added new Directory implementation org.apache.lucene.store.NIOFSDirectory, which uses java.nio's FileChannel to do file reads. On most non-Windows platforms, with many threads sharing a single searcher, this may yield sizable improvement to query throughput when compared to FSDirectory, which only allows a single thread to read from an open file at a time.
    (Jason Rutherglen via Mike McCandless)
  22. LUCENE-1371: Added convenience method TopDocs Searcher.search(Query query, int n).
    (Mike McCandless)
  23. LUCENE-1356: Allow easy extensions of TopDocCollector by turning constructor and fields from package to protected.
    (Shai Erera via Doron Cohen)
  24. LUCENE-1375: Added convenience method IndexCommit.getTimestamp, which is equivalent to getDirectory().fileModified(getSegmentsFileName()).
    (Mike McCandless)
  25. LUCENE-1366: Rename Field.Index options to be more accurate: TOKENIZED becomes ANALYZED; UN_TOKENIZED becomes NOT_ANALYZED; NO_NORMS becomes NOT_ANALYZED_NO_NORMS and a new ANALYZED_NO_NORMS is added.
    (Mike McCandless)
  26. LUCENE-1131: Added numDeletedDocs method to IndexReader
    (Otis Gospodnetic)
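
To illustrate item 2 (LUCENE-584), here is a minimal sketch of a filter written against the new contract: it overrides getDocIdSet(IndexReader) and returns an OpenBitSet (which, per LUCENE-584 under New features, extends DocIdSet) instead of implementing the deprecated bits(). The single-term filter itself is a made-up example:

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.search.DocIdSet;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.util.OpenBitSet;

    // Illustrative filter: accepts every document containing the given term.
    public class SingleTermFilter extends Filter {
        private final Term term;

        public SingleTermFilter(Term term) {
            this.term = term;
        }

        public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
            OpenBitSet bits = new OpenBitSet(reader.maxDoc());
            TermDocs termDocs = reader.termDocs(term);
            try {
                while (termDocs.next()) {
                    bits.set(termDocs.doc());   // OpenBitSet can be returned as a DocIdSet directly
                }
            } finally {
                termDocs.close();
            }
            return bits;
        }
    }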
Bug fixes   (16)
  1. LUCENE-1134: Fixed BooleanQuery.rewrite to only optimize a single clause query if minNumShouldMatch<=0.
    (Shai Erera via Michael Busch)
  2. LUCENE-1169: Fixed bug in IndexSearcher.search(): searching with a filter might miss some hits because scorer.skipTo() is called without checking if the scorer is already at the right position. scorer.skipTo(scorer.doc()) is not a no-op; it behaves like scorer.next().
    (Eks Dev, Michael Busch)
  3. LUCENE-1182: Added scorePayload to SimilarityDelegator
    (Andi Vajda via Grant Ingersoll)
  4. LUCENE-1213: MultiFieldQueryParser was ignoring slop in case of a single field phrase.
    (Trejkaz via Doron Cohen)
  5. LUCENE-1228: IndexWriter.commit() was not updating the index version and as result IndexReader.reopen() failed to sense index changes.
    (Doron Cohen)
  6. LUCENE-1267: Added numDocs() and maxDoc() to IndexWriter; deprecated docCount().
    (Mike McCandless)
  7. LUCENE-1274: Added new prepareCommit() method to IndexWriter, which does phase 1 of a 2-phase commit (commit() does phase 2). This is needed when you want to update an index as part of a transaction involving external resources (eg a database). Also deprecated abort(), renaming it to rollback(). A sketch of this flow follows this list.
    (Mike McCandless)
  8. LUCENE-1003: Stop RussianAnalyzer from removing numbers.
    (TUSUR OpenTeam, Dmitry Lihachev via Otis Gospodnetic)
  9. LUCENE-1152: SpellChecker fix around clearIndex and indexDictionary methods, plus removal of IndexReader reference.
    (Naveen Belkale via Otis Gospodnetic)
  10. LUCENE-1046: Removed dead code in SpellChecker
    (Daniel Naber via Otis Gospodnetic)
  11. LUCENE-1189: Fixed the QueryParser to handle escaped characters within quoted terms correctly.
    (Tomer Gabel via Michael Busch)
  12. LUCENE-1299: Fixed NPE in SpellChecker when IndexReader is not null and field is null.
    (Grant Ingersoll)
  13. LUCENE-1303: Fixed BoostingTermQuery's explanation to be marked as a Match depending only upon the non-payload score part, regardless of the effect of the payload on the score. Prior to this, score of a query containing a BTQ differed from its explanation.
    (Doron Cohen)
  14. LUCENE-1310: Fixed SloppyPhraseScorer to work also for terms repeating more than twice in the query.
    (Doron Cohen)
  15. LUCENE-1351: ISOLatin1AccentFilter now cleans additional ligatures
    (Cedrik Lime via Grant Ingersoll)
  16. LUCENE-1383: Workaround a nasty "leak" in Java's builtin ThreadLocal, to prevent Lucene from causing unexpected OutOfMemoryError in certain situations (notably J2EE applications).
    (Chris Lu via Mike McCandless)
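
A minimal sketch of the two-phase commit flow from item 7 (LUCENE-1274); the external resource is reduced to a placeholder comment, and the in-memory directory, field names and values are only for illustration:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class TwoPhaseCommitSketch {
        public static void main(String[] args) throws Exception {
            Directory dir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true,
                    IndexWriter.MaxFieldLength.UNLIMITED);
            Document doc = new Document();
            doc.add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
            try {
                writer.prepareCommit();   // phase 1: write, but do not yet publish, the new segments
                // ... commit the external resource (e.g. a database transaction) here ...
                writer.commit();          // phase 2: make the new segments visible to readers
            } catch (Exception e) {
                writer.rollback();        // discards pending changes and closes the writer
                throw e;
            }
            writer.close();
        }
    }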
New features   (20)
  1. LUCENE-1137: Added Token.set/getFlags() accessors for passing more information about a Token through the analysis process. The flag is not indexed/stored and is thus only used by analysis.
  2. LUCENE-1147: Add -segment option to CheckIndex tool so you can check only a specific segment or segments in your index.
    (Mike McCandless)
  3. LUCENE-1045: Reopened this issue to add support for short and bytes.
  4. LUCENE-584: Added new data structures to o.a.l.util, such as OpenBitSet and SortedVIntList. These extend DocIdSet and can directly be used for Filters with the new Filter API. Also changed the core Filters to use OpenBitSet instead of java.util.BitSet.
    (Paul Elschot, Michael Busch)
  5. LUCENE-494: Added QueryAutoStopWordAnalyzer to allow for the automatic removal, from a query, of frequently occurring terms. This Analyzer is not intended for use during indexing.
    (Mark Harwood via Grant Ingersoll)
  6. LUCENE-1044: Change Lucene to properly "sync" files after committing, to ensure on a machine or OS crash or power cut, even with cached writes, the index remains consistent. Also added explicit commit() method to IndexWriter to force a commit without having to close.
    (Mike McCandless)
  7. LUCENE-997: Add search timeout (partial) support. A TimeLimitedCollector was added to allow limiting search time. It is a partial solution since timeout is checked only when collecting a hit, and therefore a search for rare words in a huge index might not stop within the specified time.
    (Sean Timm via Doron Cohen)
  8. LUCENE-1184: Allow SnapshotDeletionPolicy to be re-used across close/re-open of IndexWriter while still protecting an open snapshot
    (Tim Brennan via Mike McCandless)
  9. LUCENE-1194: Added IndexWriter.deleteDocuments(Query) to delete documents matching the specified query. Also added static unlock and isLocked methods (deprecating the ones in IndexReader). A sketch of deleting by query follows this list.
    (Mike McCandless)
  10. LUCENE-1201: Add IndexReader.getIndexCommit() method.
    (Tim Brennan via Mike McCandless)
  11. LUCENE-550: Added InstantiatedIndex implementation. Experimental Index store similar to MemoryIndex but allows for multiple documents in memory.
    (Karl Wettin via Grant Ingersoll)
  12. LUCENE-400: Added word based n-gram filter (in contrib/analyzers) called ShingleFilter and an Analyzer wrapper that wraps another Analyzer's token stream with a ShingleFilter
    (Sebastian Kirsch, Steve Rowe via Grant Ingersoll)
  13. LUCENE-1166: Decomposition tokenfilter for languages like German and Swedish
    (Thomas Peuss via Grant Ingersoll)
  14. LUCENE-1187: ChainedFilter and BooleanFilter now work with new Filter API and DocIdSetIterator-based filters. Backwards-compatibility with old BitSet-based filters is ensured.
    (Paul Elschot via Michael Busch)
  15. LUCENE-1295: Added new method to MoreLikeThis for retrieving interesting terms and made retrieveTerms(int) public.
    (Grant Ingersoll)
  16. LUCENE-1298: MoreLikeThis can now accept a custom Similarity
    (Grant Ingersoll)
  17. LUCENE-1297: Allow other string distance measures for the SpellChecker
    (Thomas Morton via Otis Gospodnetic)
  18. LUCENE-1001: Provide access to Payloads via Spans. All existing Span Query implementations in Lucene implement this.
    (Mark Miller, Grant Ingersoll)
  19. LUCENE-1354: Provide programmatic access to CheckIndex
    (Grant Ingersoll, Mike McCandless)
  20. LUCENE-1279: Add support for Collators to RangeFilter/Query and Query Parser.
    (Steve Rowe via Grant Ingersoll)
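
To illustrate item 9 (LUCENE-1194), a minimal sketch of deleting by query, assuming an existing index at the placeholder path; the field name and value are also placeholders:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class DeleteByQuerySketch {
        public static void main(String[] args) throws Exception {
            // "/path/to/index" is a placeholder for an existing index.
            Directory dir = FSDirectory.getDirectory("/path/to/index");
            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false,
                    IndexWriter.MaxFieldLength.UNLIMITED);

            // New in 2.4: delete every document matching an arbitrary Query.
            writer.deleteDocuments(new TermQuery(new Term("category", "obsolete")));

            writer.commit();   // make the deletions visible to newly opened readers
            writer.close();
        }
    }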
Optimizations   (6)
  1. LUCENE-705: When building a compound file, use RandomAccessFile.setLength() to tell the OS/filesystem to pre-allocate space for the file. This may improve fragmentation in how the CFS file is stored, and allows us to detect an upcoming disk full situation before actually filling up the disk.
    (Mike McCandless)
  2. LUCENE-1120: Speed up merging of term vectors by bulk-copying the raw bytes for each contiguous range of non-deleted documents.
    (Mike McCandless)
  3. LUCENE-1185: Avoid checking if the TermBuffer 'scratch' in SegmentTermEnum is null for every call of scanTo().
    (Christian Kohlschuetter via Michael Busch)
  4. LUCENE-1217: Internal to Field.java, use isBinary instead of runtime type checking for possible speedup of binaryValue().
    (Eks Dev via Mike McCandless)
  5. LUCENE-1183: Optimized TRStringDistance class (in contrib/spell) that uses less memory than the previous version.
    (Cédrik LIME via Otis Gospodnetic)
  6. LUCENE-1195: Improve term lookup performance by adding a LRU cache to the TermInfosReader. In performance experiments the speedup was about 25% on average on mid-size indexes with ~500,000 documents for queries with 3 terms and about 7% on larger indexes with ~4.3M documents.
    (Michael Busch)