SolrCloud/ZooKeeper Optimization

Latest recommended article published 2024-10-19 10:52:28

youchangrui

Licensed under CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/youchangrui/article/details/84684996

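This article presents an optimization plan for SolrCloud, covering CPU clock speed selection, ZooKeeper configuration tuning, and Solr parameter adjustment. For ZooKeeper, it stresses avoiding inconsistent server lists, placing the transaction log correctly, and setting the Java heap size sensibly. For Solr, it recommends tuning the maxBufferedDocs and mergeFactor parameters to optimize indexing, and describes how to use the Optimize function sensibly.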

SolrCloud optimization:

1: CPU clock speed

2: ZooKeeper optimization items. Reference: http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html

Things to Avoid

Here are some common problems you can avoid by configuring ZooKeeper correctly:

inconsistent lists of servers

The list of ZooKeeper servers used by the clients must match the list of ZooKeeper servers that each ZooKeeper server has. Things work okay if the client list is a subset of the real list, but things will really act strange if clients have a list of ZooKeeper servers that are in different ZooKeeper clusters. Also, the server lists in each Zookeeper server configuration file should be consistent with one another.
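
The subset rule above can be checked mechanically. The following is a minimal Python sketch, not part of ZooKeeper or Solr: the helper names, the host names (zk1, zk2, zk3) and the sample configuration text are assumptions for illustration. It compares host names only, because the server.N lines in zoo.cfg carry the quorum and election ports rather than the client port.

```python
def ensemble_hosts(zoo_cfg_text):
    """Collect host names from 'server.N=host:port:port' lines of a zoo.cfg."""
    hosts = set()
    for line in zoo_cfg_text.splitlines():
        line = line.strip()
        if line.startswith("server.") and "=" in line:
            value = line.split("=", 1)[1].strip()
            hosts.add(value.split(":")[0])
    return hosts


def client_hosts(connect_string):
    """Collect host names from a client string such as 'zk1:2181,zk2:2181/solr'."""
    servers = connect_string.split("/", 1)[0]  # drop an optional chroot suffix
    return {e.strip().split(":")[0] for e in servers.split(",") if e.strip()}


def check_client_list(connect_string, zoo_cfg_text):
    """Return client hosts that are NOT in the ensemble (an empty set means OK)."""
    return client_hosts(connect_string) - ensemble_hosts(zoo_cfg_text)


# Hypothetical three-node ensemble, used only to exercise the helpers.
SAMPLE_ZOO_CFG = """
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
"""

print(check_client_list("zk1:2181,zk2:2181/solr", SAMPLE_ZOO_CFG))  # client list is a subset
print(check_client_list("zk1:2181,other:2181", SAMPLE_ZOO_CFG))     # contains a stray host
```

Running ensemble_hosts against the zoo.cfg of every server and comparing the resulting sets would cover the last sentence of the quoted advice in the same way.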

incorrect placement of transaction log

The most performance-critical part of ZooKeeper is the transaction log. ZooKeeper syncs transactions to media before it returns a response. A dedicated transaction log device is key to consistently good performance. Putting the log on a busy device will adversely affect performance. If you only have one storage device, put trace files on NFS and increase the snapshotCount; it doesn't eliminate the problem, but it should mitigate it.
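
As a rough illustration, the sketch below reads a zoo.cfg and flags the two simplest ways the transaction log ends up sharing storage with the snapshots. dataDir and dataLogDir are the ZooKeeper configuration keys for the snapshot and transaction log directories; the function names and the paths in the sample are assumptions. The check only compares configured paths, so it cannot tell whether two different paths actually sit on separate physical devices.

```python
def parse_zoo_cfg(text):
    """Parse 'key=value' lines of a zoo.cfg, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def transaction_log_warnings(props):
    """Return human-readable warnings about transaction log placement."""
    warnings = []
    data_dir = props.get("dataDir")
    log_dir = props.get("dataLogDir")
    if log_dir is None:
        # Without dataLogDir, ZooKeeper writes the transaction log under dataDir.
        warnings.append("dataLogDir is not set; the log shares dataDir with snapshots")
    elif log_dir == data_dir:
        warnings.append("dataLogDir equals dataDir; the log is not on its own path")
    return warnings


# Hypothetical configuration; the paths are placeholders.
SAMPLE_CFG = """
# snapshots
dataDir=/var/lib/zookeeper
# transaction log on its own mount
dataLogDir=/zk-txlog
"""

print(transaction_log_warnings(parse_zoo_cfg(SAMPLE_CFG)))
print(transaction_log_warnings(parse_zoo_cfg("dataDir=/var/lib/zookeeper")))
```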

incorrect Java heap size

You should take special care to set your Java max heap size correctly. In particular, you should not create a situation in which ZooKeeper swaps to disk. The disk is death to ZooKeeper. Everything is ordered, so if processing one request swaps the disk, all other queued requests will probably do the same. DON'T SWAP.

Be conservative in your estimates: if you have 4G of RAM, do not set the Java max heap size to 6G or even 4G. For example, it is more likely you would use a 3G heap for a 4G machine, as the operating system and the cache also need memory. The best and only recommended practice for estimating the heap size your system needs is to run load tests, and then make sure you are well below the usage limit that would cause the system to swap.
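
The 3G-on-4G example can be turned into a simple sanity check. In the sketch below, the 25% reserve for the operating system and cache is an assumption derived only from that one example, and the function name is hypothetical. It is a first filter before load testing, not a replacement for it.

```python
def heap_looks_conservative(ram_gb, heap_gb, reserve_fraction=0.25):
    """True if at least reserve_fraction of RAM stays outside the Java heap.

    reserve_fraction=0.25 is an assumed default taken from the 3G/4G example;
    load tests should decide the real value.
    """
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    return heap_gb <= ram_gb * (1.0 - reserve_fraction)


for ram_gb, heap_gb in [(4, 3), (4, 4), (4, 6)]:
    verdict = "ok" if heap_looks_conservative(ram_gb, heap_gb) else "too large"
    print(f"{heap_gb}G heap on {ram_gb}G RAM: {verdict}")
```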

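3: Every maxBufferedDocs documents buffered in memory are flushed as one segment, and every mergeFactor segments are merged into a single index file. Tune maxBufferedDocs and mergeFactor appropriately to optimize indexing.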
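
To get a feel for how the two parameters interact, the following Python sketch simulates a deliberately simplified model: every maxBufferedDocs documents become one segment, and whenever mergeFactor segments exist at the same level they are merged into one larger segment at the next level. This is an assumption-laden toy, not Lucene's actual merge policy (which also considers segment sizes and deletions), and the function names are hypothetical.

```python
def _add_segment(levels, size, merge_factor):
    """Add a flushed segment at level 0 and cascade merges upward."""
    level = 0
    while True:
        if level == len(levels):
            levels.append([])
        levels[level].append(size)
        if len(levels[level]) < merge_factor:
            return
        # mergeFactor segments at this level: merge them into one bigger segment.
        size = sum(levels[level])
        levels[level] = []
        level += 1


def simulate_segments(total_docs, max_buffered_docs, merge_factor):
    """Return the sizes (in documents) of the segments left after indexing."""
    if max_buffered_docs < 1 or merge_factor < 2:
        raise ValueError("need max_buffered_docs >= 1 and merge_factor >= 2")
    levels = []  # levels[i] holds segment sizes at merge level i
    flushed = 0
    while flushed < total_docs:
        size = min(max_buffered_docs, total_docs - flushed)
        flushed += size
        _add_segment(levels, size, merge_factor)
    return [size for level in levels for size in level]


print(simulate_segments(5000, 1000, 10))   # five small segments, no merge yet
print(simulate_segments(10000, 1000, 10))  # the tenth flush triggers one merge
```

In this model a larger maxBufferedDocs means fewer, larger flushes, and a larger mergeFactor means fewer merges during indexing at the cost of more segments to search.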

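4: Clicking the Optimize button in the Solr admin UI merges the single index files into one index file. Optimize is a highly I/O-intensive task, and frequent updates to the Solr data mean that an optimized index does not stay optimized for long before it has to be optimized again.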
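
Because Optimize is expensive, it is usually better to run it from a scheduled job during quiet hours than to click the button by hand. The sketch below builds the request for Solr's update handler using the optimize, maxSegments and waitSearcher request parameters. The base URL, the collection name "collection1" and the helper names are assumptions, and the sketch assumes the update handler is exposed at /update without authentication.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def optimize_url(base_url, collection, max_segments=1, wait_searcher=True):
    """Build the update-handler URL that asks Solr to optimize a collection."""
    params = urlencode({
        "optimize": "true",
        "maxSegments": max_segments,
        "waitSearcher": "true" if wait_searcher else "false",
    })
    return f"{base_url.rstrip('/')}/{quote(collection)}/update?{params}"


def run_optimize(base_url, collection, timeout_s=3600):
    """Send the optimize request and return the HTTP status code.

    Not called here: it needs a running Solr and can take a long time.
    """
    with urlopen(optimize_url(base_url, collection), timeout=timeout_s) as resp:
        return resp.status


# Hypothetical local Solr instance and collection name.
print(optimize_url("http://localhost:8983/solr", "collection1"))
```

Raising maxSegments above 1 asks for a partial merge, which costs less I/O than merging everything into a single segment.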

5: Reference: http://www.solr.cc/blog/?p=788

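1. Data update frequency: how large is the daily data increment, and are updates applied continuously or on a schedule?
2. Total data volume: how long must the data be retained?
3. Consistency requirements: how soon should updated data become visible, and what is the maximum acceptable delay?
4. Data characteristics: what are the data sources, and what is the average size of a single record?
5. Business characteristics: what sorting requirements and search conditions are there?
6. Resource reuse: what is the existing hardware configuration, and is there an upgrade plan?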
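
Several of these answers (daily increment, retention period, average record size) combine into a first capacity estimate. The sketch below is plain arithmetic under stated assumptions: the function name and the sample numbers are hypothetical, and the result is the size of the raw source data, not of the Solr index, whose size depends on the schema and on which fields are stored and indexed.

```python
def estimate_raw_data(docs_per_day, retention_days, avg_doc_bytes):
    """Return (total_docs, raw_size_gib) for the retained data set."""
    total_docs = docs_per_day * retention_days
    raw_size_gib = total_docs * avg_doc_bytes / 1024 ** 3
    return total_docs, raw_size_gib


# Hypothetical workload: 1,000,000 records per day, kept 30 days, 1 KiB each.
total_docs, raw_size_gib = estimate_raw_data(1_000_000, 30, 1024)
print(f"{total_docs} documents, about {raw_size_gib:.1f} GiB of raw data")
```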