[Repost] Lucene goodness

Lucene 2.3 performance improvements

Tags:

#lucene #performance #Access #UP

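Lucene 2.3 brings major improvements to indexing memory management and performance: in a single-threaded benchmark, indexing speed rose from about 400 records per second on release 2.2 to over 2,100 on the 2.3 development trunk. The new version offers a more intuitive way to configure indexing memory, and work is under way to speed up reopening IndexReaders. It also includes smaller improvements such as a faster StandardTokenizer and faster term vector access.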

Lucene goodness

Lots of good things have been happening in Lucene land lately, all of which should benefit users with faster indexing and searching. Most notably, Lucene 2.3 (hopefully released this quarter) has some major changes in indexing memory management and performance. I have personally clocked indexing at about 400 records/s with release 2.2 and at over 2,100 records/s on 2.3-dev (the latest trunk), more than a fivefold speedup (single threaded, Mac Pro dual CPU/dual core, using the contrib/benchmark indexing.alg). It also makes the indexing process easier to control: you specify how much memory to give it, instead of tuning the confusing maxBufferedDocs factor.
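
The memory-based control described above swaps a flush trigger based on document count for one based on buffered size. The Java sketch below is a hypothetical model, not Lucene's code: `BufferFlushDemo`, `FlushPolicy`, `byDocCount`, `byRam` and `simulate` are illustrative names, and it assumes a document's raw size is a fair estimate of the RAM it occupies in the buffer. (In the released 2.3 API the memory budget is set with `IndexWriter.setRAMBufferSizeMB`.)

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of an indexing buffer that is flushed to a new segment
 * either after a fixed number of documents (maxBufferedDocs-style) or once its
 * estimated size reaches a RAM budget. Not Lucene's implementation.
 */
public class BufferFlushDemo {

    /** Decides when the buffered documents should be flushed. */
    interface FlushPolicy {
        boolean shouldFlush(int bufferedDocs, long bufferedBytes);
    }

    /** Flush after a fixed document count, however large the documents are. */
    static FlushPolicy byDocCount(int maxBufferedDocs) {
        return (docs, bytes) -> docs >= maxBufferedDocs;
    }

    /** Flush once buffered bytes reach the budget (assumes raw size ~ RAM used). */
    static FlushPolicy byRam(double ramBufferMB) {
        long limitBytes = (long) (ramBufferMB * 1024 * 1024);
        return (docs, bytes) -> bytes >= limitBytes;
    }

    /** Feeds documents of the given sizes (bytes) through the buffer; returns docs per flushed segment. */
    static List<Integer> simulate(int[] docSizes, FlushPolicy policy) {
        List<Integer> segments = new ArrayList<>();
        int docs = 0;
        long bytes = 0;
        for (int size : docSizes) {
            docs++;
            bytes += size;
            if (policy.shouldFlush(docs, bytes)) {
                segments.add(docs);
                docs = 0;
                bytes = 0;
            }
        }
        if (docs > 0) {
            segments.add(docs); // whatever is left is flushed when the writer closes
        }
        return segments;
    }

    public static void main(String[] args) {
        // Workload: 2,000 documents of 1 KB followed by 8 documents of 512 KB.
        int[] sizes = new int[2008];
        for (int i = 0; i < 2000; i++) sizes[i] = 1024;
        for (int i = 2000; i < sizes.length; i++) sizes[i] = 512 * 1024;

        System.out.println("by doc count: " + simulate(sizes, byDocCount(1000)));
        System.out.println("by RAM (1 MB): " + simulate(sizes, byRam(1.0)));
    }
}
```

Running `main` prints `by doc count: [1000, 1000, 8]` and `by RAM (1 MB): [1024, 977, 2, 2, 2, 1]`. With a fixed document count the last buffer grows to 8 × 512 KB = 4 MB although the first two held about 1 MB each, whereas the RAM budget keeps every buffer within about one document of 1 MB regardless of document size.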

Other work under way should speed up reopening IndexReaders. There are also a number of smaller changes, including a faster StandardTokenizer (the tokenizer most people use) and faster term vector access.
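
One way to make reopening cheaper, sketched here under stated assumptions: an index is a list of write-once segments, so a reader for a newer commit can reuse the per-segment readers it already holds and open only the segments it has not seen. `ReopenDemo`, `SegmentHandle` and `SnapshotReader` are hypothetical names; the sketch ignores deletions (which can change an existing segment) and does not reproduce Lucene's actual implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of incremental reader reopening. Not Lucene's implementation. */
public class ReopenDemo {

    /** Hypothetical per-segment reader; creating one stands in for costly disk I/O. */
    static final class SegmentHandle {
        final String name;
        SegmentHandle(String name) { this.name = name; }
    }

    /** Hypothetical reader over one commit (an ordered list of segment names). */
    static final class SnapshotReader {
        final Map<String, SegmentHandle> segments;
        final int newlyOpened; // segments opened from scratch by this open/reopen

        private SnapshotReader(Map<String, SegmentHandle> segments, int newlyOpened) {
            this.segments = segments;
            this.newlyOpened = newlyOpened;
        }

        /** Full open: equivalent to reopening from an empty reader. */
        static SnapshotReader open(List<String> segmentNames) {
            return new SnapshotReader(new LinkedHashMap<>(), 0).reopen(segmentNames);
        }

        /**
         * Reopen against a newer commit. A segment name present in both commits is
         * assumed to hold identical data, so its handle is shared; segments missing
         * from the new commit (e.g. merged away) are simply dropped.
         */
        SnapshotReader reopen(List<String> newSegmentNames) {
            Map<String, SegmentHandle> next = new LinkedHashMap<>();
            int opened = 0;
            for (String name : newSegmentNames) {
                SegmentHandle existing = segments.get(name);
                if (existing != null) {
                    next.put(name, existing);                 // unchanged: reuse
                } else {
                    next.put(name, new SegmentHandle(name));  // new: open it
                    opened++;
                }
            }
            return new SnapshotReader(next, opened);
        }
    }

    public static void main(String[] args) {
        SnapshotReader r1 = SnapshotReader.open(Arrays.asList("_0", "_1", "_2"));
        SnapshotReader r2 = r1.reopen(Arrays.asList("_0", "_1", "_2", "_3")); // _3 newly flushed
        SnapshotReader r3 = r2.reopen(Arrays.asList("_0", "_4"));             // _1.._3 merged into _4
        System.out.println("open:     opened " + r1.newlyOpened); // 3
        System.out.println("reopen 1: opened " + r2.newlyOpened); // 1
        System.out.println("reopen 2: opened " + r3.newlyOpened); // 1
        System.out.println("_0 shared: " + (r1.segments.get("_0") == r3.segments.get("_0"))); // true
    }
}
```

Here the first reopen opens only the newly flushed segment `_3` and the second opens only the merged segment `_4`, while the handle for `_0` is shared by all three readers.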

Of course, with that comes more testing and a greater need to make sure the next release is rock solid and backwards compatible. So, if you are a Lucene user, I would encourage you to give trunk a try on some of your non-production indexes, etc. and help us test it out.

Original post: http://lucene.grantingersoll.com/2007/11/02/lucene-goodness/