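Background
After HBase had been in production for a while, we found that the HMaster frequently aborted itself. The log output is shown below.
Error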
2019-06-14 11:24:07,242 WARN [master/ms-fibo-test-dataserver4/172.16.201.239:16000-EventThread] client.ConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:634)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:566)
at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2019-06-14 11:24:07,243 INFO [master/ms-fibo-test-dataserver4/172.16.201.239:16000-EventThread] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x26a9d2b8a210323
2019-06-14 11:24:07,243 INFO [master/ms-fibo-test-dataserver4/172.16.201.239:16000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2019-06-14 11:24:07,466 INFO [main-SendThread(ms-fibo-test-dataserver5:2181)] zookeeper.ClientCnxn: Opening socket connection to server ms-fibo-test-dataserver5/172.16.201.240:2181. Will not attempt to authenticate using SASL (unknown error)
2019-06-14 11:24:07,468 INFO [main-SendThread(ms-fibo-test-dataserver5:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /172.16.201.239:55808, server: ms-fibo-test-dataserver5/172.16.201.240:2181
2019-06-14 11:24:07,479 INFO [main-SendThread(ms-fibo-test-dataserver5:2181)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x36a9d2b8896031f has expired, closing socket connection
2019-06-14 11:24:07,479 FATAL [main-EventThread] master.HMaster: master:16000-0x36a9d2b8896031f, quorum=ms-fibo-test-dataserver4.fibodata.com:2181,ms-fibo-test-dataserver5.fibodata.com:2181,ms-fibo-test-dataserver6.fibodata.com:2181, baseZNode=/hbase-unsecure master:16000-0x36a9d2b8896031f received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:634)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:566)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
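Solution
Network jitter or a full GC on the ZooKeeper side can make the connection time out. When HBase cannot reach ZooKeeper before its session expires, the HMaster receives a session-expired event and aborts.
1. Increase the HBase session timeout in hbase-site.xml: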
<property>
<name>zookeeper.session.timeout</name>
<value>240000</value>
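<!-- ZooKeeper session timeout, in milliseconds. Default per the original note: 180000 (the default differs between HBase versions, so check hbase-default.xml for yours). -->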
</property>
2. Increase the ZooKeeper session timeout limits in zoo.cfg:
# Default: 3000 milliseconds
tickTime=5000
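The ZooKeeper server limits the session timeout a client may request: if the requested value falls outside the allowed range, it is forced to the minimum or the maximum. By default the range is 2 * tickTime to 20 * tickTime; the bounds can be set explicitly with minSessionTimeout and maxSessionTimeout (new in ZooKeeper 3.3.0). With tickTime=5000 the default range is 10000 ms to 100000 ms, so the 240000 ms requested in hbase-site.xml is still capped at 100000 ms unless maxSessionTimeout is also raised in zoo.cfg.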
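
The clamping rule can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not ZooKeeper code: the function name negotiated_timeout and its parameters are hypothetical, and it only assumes the 2 * tickTime to 20 * tickTime default range stated above.

```python
def negotiated_timeout(requested_ms, tick_time_ms, min_ms=None, max_ms=None):
    """Return the session timeout the server would grant (illustrative only).

    min_ms / max_ms stand in for minSessionTimeout / maxSessionTimeout;
    when they are not given, the defaults 2 * tickTime and 20 * tickTime apply.
    """
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    # Clamp the client's request into [lower, upper].
    return max(lower, min(requested_ms, upper))


# HBase asks for 240000 ms, tickTime=5000, no explicit bounds
print(negotiated_timeout(240000, 5000))
# Same request with the upper bound raised to 240000 ms
print(negotiated_timeout(240000, 5000, max_ms=240000))
```

Under these assumptions the first call is limited by 20 * tickTime, while the second grants the full requested value, which is why the upper bound matters as much as the HBase-side setting.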