【Solr OOM - Fingerprint校验使得大量FieldCacheImpl积压】

问题现象

Solr进程OOM,且Cache页面出现大量FieldCacheImpl积压,每个平均大小在50MB左右。
FieldCacheImpl
The heap dump file has been generated, you can analyze it with heap analyzer tools like Eclipse Memory Analyzer (https://www.eclipse.org/mat/)

十万个为什么

什么场景下类org.apache.solr.uninverting.FieldCacheImpl的对象会大量激增?FieldCacheImpl和IndexFingerprint有啥关系吗?

  1. 跟踪日志和代码,在SolrCore.java中发现大量FieldCacheImpl会被加到perSegmentFingerprintCache中
  2. SolrIndexSearcher.java中getIndexFingerprint方法的注释:gets a cached version of the IndexFingerprint for this searcher
  3. IndexFingerprint.java中getFingerprint方法的注释:Opens a new realtime searcher and returns it’s (possibly cached) fingerprint
  4. Fingerprint官方说明:The IndexFingerprint is cached per index searcher. it is quite useless during high throughput indexing.
  5. 好了,useless,还能说啥。

这里的uninverting是什么?invert又是什么?为什么需要un invert?

Apache Solr uses Lucene to store fields in an inverted index. (倒排索引是为了加快全文检索)This works great for full text search and finding documents quickly. (正排索引是为了加快排序分面)For sorting and faceting, Solr must instead deal with the uninverted value instead of the inverted value. There are two ways that Solr does this:

  • If the field is indexed, Solr can uninvert the field on heap and store it in the fieldCache. This takes query time to generate the first time and heap to store the result.
  • If the field has docValues enabled, Solr stores the uninverted value at index time. This takes index time to save but when used in a query there is no heap usage since the values can be read from disk/OS memory.

什么是DocValues?

Solr官方对Doc Values的介绍:DocValues are a way of recording field values internally that is more efficient for some purposes, such as sorting and faceting, than traditional indexing.
ES官方对Doc Values的介绍:Doc Values就是一种列式存储结构,默认情况下每个字段的 Doc Values 都是激活的,Doc Values 是在索引时创建的,当字段索引时,Elasticsearch 为了能够快速检索,会把字段的值加入倒排索引中,同时它也会存储该字段的Doc Values

日志分析

2019-04-16 21:33:05,708 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Caching fingerprint for searcher:Searcher@2dc0b56c[hot_tb_20190216_shard28_replica2]
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_1(5.3.1):C6453961) Uninverting(_0(5.3.1):C6454208) Uninverting(_2(5.3.1):C6450256) Uninverting(_3(5.3.1):C6450486) Uninverting(_5(5.3.1):C6457048) Uninverting(_4(5.3.1):C6456367) Uninverting(_6(5.3.1):C6449312) Uninverting(_7(5.3.1):C6451240) Uninverting(_8(5.3.1):C6451535) Uninverting(_9(5.3.1):C6453108) Uninverting(_a(5.3.1):C6452285) Uninverting(_b(5.3.1):c6449094) Uninverting(_c(5.3.1):c6450295) Uninverting(_d(5.3.1):c6451743) Uninverting(_e(5.3.1):c6457054) Uninverting(_f(5.3.1):c6455839) Uninverting(_h(5.3.1):c6454142) Uninverting(_g(5.3.1):c6453632) Uninverting(_i(5.3.1):c6453033) Uninverting(_j(5.3.1):c6452370)))} leafReaderContext:LeafReaderContext(FilterLeafReader(Uninverting(_j(5.3.1):c6452370)) docBase=122604638 ord=19) mavVersion:9223372036854775807 | org.apache.solr.core.SolrCore.getIndexFingerprint(SolrCore.java:1625)
2019-04-16 21:33:05,708 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | IndexFingerprint millis:30544.0 result:{maxVersionSpecified=9223372036854775807, maxVersionEncountered=1629434765087080448, maxInHash=1629434765087080448, versionsHash=3122101942921649893, numVersions=129057008, numDocs=129057008, maxDoc=6457054} | org.apache.solr.update.IndexFingerprint.getFingerprint(IndexFingerprint.java:97)
2019-04-16 21:33:05,709 | WARN | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | no frame of reference to tell if we’ve missed updates | org.apache.solr.update.PeerSync.sync(PeerSync.java:243)
2019-04-16 21:33:05,709 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | PeerSync Recovery was not successful - trying replication. | org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:402)
2019-04-16 21:33:05,709 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Starting Replication Recovery. | org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:410)
2019-04-16 21:33:05,709 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Attempting to replicate from [https://ip:port/solr/hot_tb_20190216_shard28_replica1/]. | org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:139)
2019-04-16 21:33:05,710 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Setting up SPNego auth with config: /opt/huawei/Bigdata/FusionInsight_HD_V100R002C70SPC200/1_9_SolrServer5/etc/jaas.conf | org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer.configure(Krb5HttpClientConfigurer.java:77)
2019-04-16 21:33:05,751 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Setting up SPNego auth with config: /opt/huawei/Bigdata/FusionInsight_HD_V100R002C70SPC200/1_9_SolrServer5/etc/jaas.conf | org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer.configure(Krb5HttpClientConfigurer.java:77)
2019-04-16 21:33:05,768 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Master’s generation: 11 | org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:297)
2019-04-16 21:33:05,768 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Master’s version: 1554149442243 | org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:298)
2019-04-16 21:33:05,768 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Slave’s generation: 11 | org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:317)
2019-04-16 21:33:05,768 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Slave in sync with master. | org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:341)
2019-04-16 21:33:05,768 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | No replay needed. | org.apache.solr.cloud.RecoveryStrategy.replay(RecoveryStrategy.java:538)
2019-04-16 21:33:05,768 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Replication Recovery was successful. | org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:429)
2019-04-16 21:33:05,768 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Registering as Active after recovery. | org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:458)
2019-04-16 21:33:05,768 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | publishing state=active | org.apache.solr.cloud.ZkController.publish(ZkController.java:1259)
2019-04-16 21:33:05,770 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Updating version bucket highest from index after successful recovery. | org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:526)
2019-04-16 21:33:05,770 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Looking up max value of version field to seed version buckets | org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1609)
2019-04-16 21:33:05,770 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Refreshing highest value of version for 65536 version buckets from index | org.apache.solr.update.VersionInfo.getMaxVersionFromIndex(VersionInfo.java:226)
2019-04-16 21:33:05,773 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Found MAX value 1629434765087080448 from Terms for version in index | org.apache.solr.update.VersionInfo.getMaxVersionFromIndex(VersionInfo.java:236)
2019-04-16 21:33:05,775 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Took 4.0ms to seed version buckets with highest version 1629434765087080448 | org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1602)
2019-04-16 21:33:05,775 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Finished recovery process, successful=[true] | org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:530)

分析日志,确认是走了alreadyInSync函数。
确认是走alreadyInSync流程
确认是走alreadyInSync流程

fingerprint是什么?

IndexFingerprint

PeerSync中sync(),关闭alreadyInSync()有什么影响或者风险?

SOLR-8690:对alreadyInSync()增加了配置开关doFingerprint。
doFingerprint开关
SOLR-8586:提到可以关闭fingerprint校验以避免发生OOM。
关闭fingerprint校验以避免OOM
同时经过对代码的分析,目前的情况是会走到alreadyInSync(),Fingerprint校验耗时超过30多秒,但是最终PeerSync Recovery失败,并尝试进行Replication Recovery,参考这里

2019-04-16 21:33:05,708 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | IndexFingerprint millis:30544.0 result:{maxVersionSpecified=9223372036854775807, maxVersionEncountered=1629434765087080448, maxInHash=1629434765087080448, versionsHash=3122101942921649893, numVersions=129057008, numDocs=129057008, maxDoc=6457054} | org.apache.solr.update.IndexFingerprint.getFingerprint(IndexFingerprint.java:97)
2019-04-16 21:33:05,709 | WARN | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | no frame of reference to tell if we’ve missed updates | org.apache.solr.update.PeerSync.sync(PeerSync.java:243)
2019-04-16 21:33:05,709 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | PeerSync Recovery was not successful - trying replication. | org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:402)
2019-04-16 21:33:05,709 | INFO | recoveryExecutor-3-thread-17-processing-n:10.110.0.125:21116_solr x:hot_tb_20190216_shard28_replica2 s:shard28 c:hot_tb_20190216 r:core_node77 | Starting Replication Recovery. | org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:410)

规避方案

因此,建议利用doFingerprint开关直接跳过alreadyInSync(),直接进行Replication Recovery,这样做的好处是不针对每个Searcher校验Fingerprint,减少内存占用降低发生OOM的概率,还可能会降低整体Recovery耗时。潜在风险是Recovery均采用Replication Recovery,可能增加Replication Recovery次数。

Reference

https://blog.youkuaiyun.com/weixin_42257250/article/details/83934082
https://issues.apache.org/jira/browse/SOLR-8586
https://issues.apache.org/jira/browse/SOLR-8690
https://issues.apache.org/jira/browse/SOLR-9310
https://issues.apache.org/jira/browse/SOLR-9506
https://issues.apache.org/jira/browse/SOLR-12969
https://risdenk.github.io/2017/12/18/ambari-infra-solr-ranger.html
https://www.elastic.co/guide/cn/elasticsearch/guide/current/docvalues-intro.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值