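Symptoms:
We set up a new HBase cluster on four machines, created more than 100 tables, and imported over 30 million records. Since then, at minute 36 of every hour, one regionserver node goes down, and the other two regionserver nodes go down right after it. The master node is unaffected.
Log excerpt from the regionserver that went down first: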
2018-06-06 10:35:50,125 WARN [RpcServer.FifoWFPBQ.default.handler=26,queue=2,port=60040] hfile.LruBlockCache: Trying to cache too large a block cd7871c92e844b01aef319748e9f6c58 @ 537265509 is 33562968 which is larger than 16777216
2018-06-06 10:35:56,641 WARN [RpcServer.FifoWFPBQ.default.handler=26,queue=2,port=60040] hfile.LruBlockCache: Trying to cache too large a block 4a78a84ff6a44b06a3aaee095e8f04c4 @ 705079166 is 33562952 which is larger than 16777216
2018-06-06 10:36:06,228 INFO [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2018-06-06 10:36:06,229 INFO [main] zookeeper.ZooKeeper: Client environment:host.name=hbase2-159
2018-06-06 10:36:06,229 INFO [main] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_151
2018-06-06 10:36:06,249 INFO [main-SendThread(hbase1:2181)] zookeeper.ClientCnxn: Opening socket connection to server hbase1/172.21.0.17:2181. Will not attempt to authenticate using SASL (unknown error)
2018-06-06 10:36:06,254 INFO [main-SendThread(hbase1:2181)] zookeeper.ClientCnxn: Socket connection established to hbase1/172.21.0.17:2181, initiating session
2018-06-06 10:36:06,260 INFO [main-SendThread(hbase1:2181)] zookeeper.ClientCnxn: Session establishment complete on server hbase1/172
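The two WARN lines just before 10:36 say that LruBlockCache refused to cache blocks of 33562968 and 33562952 bytes (roughly 32 MiB each) because they exceed a per-block limit of 16777216 bytes (16 MiB). The Java sketch below parses such a line and converts both sizes to MiB. The regex is written against the log format shown above and is not an HBase API. That the 16 MiB limit corresponds to the `hbase.lru.max.block.size` setting in HBase 1.x is an assumption to check against your version's documentation. The excerpt alone does not show whether these warnings are related to the crash.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: extract the block size and the cache limit from an
 * LruBlockCache "Trying to cache too large a block" WARN line.
 * The pattern is an assumption based on the log lines in this post.
 */
public class TooLargeBlockCheck {

    // Matches the tail of the WARN line: "is <blockSize> which is larger than <limit>"
    private static final Pattern TOO_LARGE =
            Pattern.compile("is (\\d+) which is larger than (\\d+)");

    /** Returns {blockSize, limit} in bytes, or null if the line does not match. */
    static long[] parse(String line) {
        Matcher m = TOO_LARGE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
    }

    /** Converts bytes to mebibytes (1 MiB = 1024 * 1024 bytes). */
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        String line = "2018-06-06 10:35:50,125 WARN [...] hfile.LruBlockCache: "
                + "Trying to cache too large a block cd7871c92e844b01aef319748e9f6c58 "
                + "@ 537265509 is 33562968 which is larger than 16777216";
        long[] sizes = parse(line);
        if (sizes == null) {
            System.out.println("line does not match the expected format");
            return;
        }
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale
        System.out.printf(Locale.ROOT, "block = %.2f MiB, limit = %.2f MiB, ratio = %.2f%n",
                toMiB(sizes[0]), toMiB(sizes[1]), (double) sizes[0] / sizes[1]);
    }
}
```

Run against the first WARN line, this reports a block of about 32.01 MiB against a 16.00 MiB limit, so the rejected block is about twice the allowed size.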