HBase Pitfalls: RegionServers Exiting Abnormally (Part 2)

The RegionServers crashed again, which is a real headache.

1. Logs:

2019-11-20 03:47:34,174 INFO [sync.3] wal.FSHLog: Slow sync cost: 464 ms, current pipeline: [DatanodeInfoWithStorage[125.94.213.41:50010,DS-cfd2851f-a298-4976-b0e9-f0546a472cb0,DISK], DatanodeInfoWithStorage[125.94.213.5:50010,DS-795d6f28-78f1-4e11-b0d6-7e87654a3306,DISK], DatanodeInfoWithStorage[125.94.213.48:50010,DS-5796c1b4-95a9-4fac-b588-d3166c44fe0d,DISK]]
2019-11-20 03:47:35,428 INFO [sync.0] wal.FSHLog: Slow sync cost: 210 ms, current pipeline: [DatanodeInfoWithStorage[125.94.213.41:50010,DS-cfd2851f-a298-4976-b0e9-f0546a472cb0,DISK], DatanodeInfoWithStorage[125.94.213.5:50010,DS-795d6f28-78f1-4e11-b0d6-7e87654a3306,DISK], DatanodeInfoWithStorage[125.94.213.48:50010,DS-5796c1b4-95a9-4fac-b588-d3166c44fe0d,DISK]]
2019-11-20 03:49:39,055 INFO [hdpv-014,16020,1574074564013_ChoreService_4] regionserver.HRegionServer$CompactionChecker: Chore: CompactionChecker missed its start time
2019-11-20 03:49:39,055 INFO [hdpv-014,16020,1574074564013_ChoreService_4] regionserver.HRegionServer$PeriodicMemstoreFlusher: Chore: hdpv-014,16020,1574074564013-MemstoreFlusherChore missed its start time
2019-11-20 03:49:39,055 INFO [hdpv-014,16020,1574074564013_ChoreService_4] regionserver.HeapMemoryManager$HeapMemoryTunerChore: Chore: hdpv-014,16020,1574074564013-HeapMemoryTunerChore missed its start time
2019-11-20 03:49:39,268 WARN [regionserver/hdpv-014/125.94.213.41:16020] util.Sleeper: We slept 124370ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2019-11-20 03:49:39,269 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 121995ms
GC pool 'ParNew' had collection(s): count=1 time=120630ms
2019-11-20 03:49:39,423 INFO [RS_OPEN_REGION-hdpv-014:16020-0-SendThread(hdpv-007:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 157651ms for sessionid 0x16e7ce845dc02b8, closing socket connection and attempting reconnect
2019-11-20 03:49:39,423 INFO [hdpv-014,16020,1574074564013_ChoreService_4] regionserver.HRegionServer$MovedRegionsCleaner: Chore: MovedRegionsCleaner for region hdpv-014,16020,1574074564013 missed its start time
2019-11-20 03:49:39,423 INFO [regionserver/hdpv-014/125.94.213.41:16020-SendThread(hdpv-001:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 157651ms for sessionid 0x26e7ce845ed0284, closing socket connection and attempting reconnect
2019-11-20 03:49:39,990 WARN [DataStreamer for file /apps/hbase/data/data/default/MIRROR_YY_ACCOUNT_GAME/c9bf3530466b67c29162e1484a18ba7d/.tmp/49c75f29a55740769ede127a4f3c986f block BP-1202337336-125.94.213.13-1419656350533:blk_1526126839_452537137] hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.hdfs.DFSPacket.writeTo(DFSPacket.java:176)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:611)
2019-11-20 03:49:39,990 WARN [DataStreamer for file /apps/hbase/data/WALs/hdpv-014,16020,1574074564013/hdpv-014%2C16020%2C1574074564013.default.1574192168896 block BP-1202337336-125.94.213.13-1419656350533:blk_1526124074_452534361] hdfs.DFSClient: DataStreamer Exception

…………

2019-11-20 03:49:49,167 ERROR [sync.2] wal.FSHLog: Error syncing, request close of WAL
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /apps/hbase/data/oldWALs/hdpv-014%2C16020%2C1574074564013.default.1574192168896 (inode 595750975): File is not open for writing. Holder DFSClient_NONMAPREDUCE_-2054740082_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3674)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:3574)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:883)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:526)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy16.getAdditionalDatanode(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolTranslatorPB.java:484)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
at com.sun.proxy.$Proxy17.getAdditionalDatanode(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:283)
at com.sun.proxy.$Proxy18.getAdditionalDatanode(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1102)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1268)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:993)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:500)

…………

2019-11-20 03:50:21,676 ERROR [RS_CLOSE_REGION-hdpv-014:16020-1] regionserver.HRegion: Memstore size is 76160064
2019-11-20 03:50:21,745 INFO [StoreCloserThread-YD_ONLINE_GUID,\x19,1574079589337.1645967ddd012b9a875e863266751f58.-1] regionserver.HStore: Closed USER
2019-11-20 03:50:21,785 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,785 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,847 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] write.ParallelWriterIndexCommitter: Shutting down ParallelWriterIndexCommitter because Indexer is being stopped
2019-11-20 03:50:21,847 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,847 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] write.ParallelWriterIndexCommitter: Shutting down ParallelWriterIndexCommitter because Indexer is being stopped
2019-11-20 03:50:21,847 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,866 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed BELONG
2019-11-20 03:50:21,905 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed BIND_PHONE
2019-11-20 03:50:21,906 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed GAME_LABEL
2019-11-20 03:50:21,907 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed LOGIN
2019-11-20 03:50:21,907 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed LOST
2019-11-20 03:50:21,908 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed PAYMENT
2019-11-20 03:50:21,973 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] recovery.TrackingParallelWriterIndexCommitter: Shutting down TrackingParallelWriterIndexCommitter
2019-11-20 03:50:21,973 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] recovery.TrackingParallelWriterIndexCommitter: Shutting down TrackingParallelWriterIndexCommitter
2019-11-20 03:50:21,973 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,973 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,976 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed PERIOD_GAME_PAYMENT
2019-11-20 03:50:21,977 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed PERIOD_LOGIN
2019-11-20 03:50:21,978 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed PERIOD_PAYMENT_AVERAGE
2019-11-20 03:50:21,979 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed PERIOD_PAYMENT_TOTAL
2019-11-20 03:50:21,980 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed PLATFORM
2019-11-20 03:50:21,980 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed REFERER
2019-11-20 03:50:21,981 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed REGISTER
2019-11-20 03:50:21,981 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed ROLE_LABEL
2019-11-20 03:50:21,982 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed ROW_UPDATE_TIME
2019-11-20 03:50:21,982 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed SOUND_LABEL
2019-11-20 03:50:21,989 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed STYLE_LABEL
2019-11-20 03:50:21,991 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] regionserver.HStore: Closed SUBJECT_LABEL
2019-11-20 03:50:22,118 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] regionserver.HRegion: Closed YD_ONLINE_GUID,\x19,1574079589337.1645967ddd012b9a875e863266751f58.
2019-11-20 03:50:22,128 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] regionserver.HRegion: Closed MIRROR_SQW_ACCOUNT,\x09,1573025684248.4707732dccb430927c79c82eae116dd1.
2019-11-20 03:50:22,128 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed BELONG
2019-11-20 03:50:22,148 INFO [StoreCloserThread-YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.-1] regionserver.HStore: Closed DEVICE
2019-11-20 03:50:22,165 INFO [StoreCloserThread-YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.-1] regionserver.HStore: Closed GAME
2019-11-20 03:50:22,210 INFO [StoreCloserThread-YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.-1] regionserver.HStore: Closed TIME
2019-11-20 03:50:22,374 INFO [StoreCloserThread-YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.-1] regionserver.HStore: Closed USER
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] write.ParallelWriterIndexCommitter: Shutting down ParallelWriterIndexCommitter because Indexer is being stopped
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] recovery.TrackingParallelWriterIndexCommitter: Shutting down TrackingParallelWriterIndexCommitter
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] parallel.BaseTaskRunner: Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] regionserver.HRegion: Closed YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.
2019-11-20 03:50:22,376 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] regionserver.HStore: Closed BELONG
2019-11-20 03:50:22,421 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed BIND_PHONE
2019-11-20 03:50:22,421 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] regionserver.HStore: Closed BIND_PHONE
2019-11-20 03:50:22,422 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed GAME_LABEL
2019-11-20 03:50:22,422 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] regionserver.HStore: Closed GAME_LABEL
2019-11-20 03:50:22,423 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed LOGIN
2019-11-20 03:50:22,423 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] regionserver.HStore: Closed LOGIN
2019-11-20 03:50:22,423 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed LOST
2019-11-20 03:50:22,423 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] regionserver.HStore: Closed LOST
2019-11-20 03:50:22,425 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed PAYMENT
2019-11-20 03:50:22,425 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] regionserver.HStore: Closed PAYMENT
2019-11-20 03:50:22,585 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed PERIOD_GAME_PAYMENT
2019-11-20 03:50:22,586 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed PERIOD_LOGIN
2019-11-20 03:50:22,587 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed PERIOD_PAYMENT_AVERAGE
2019-11-20 03:50:22,588 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed PERIOD_PAYMENT_TOTAL
2019-11-20 03:50:22,588 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed PLATFORM
2019-11-20 03:50:22,588 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed REFERER
2019-11-20 03:50:22,589 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed REGISTER
2019-11-20 03:50:22,589 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed ROLE_LABEL
2019-11-20 03:50:22,590 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed ROW_UPDATE_TIME
2019-11-20 03:50:22,590 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed SOUND_LABEL
2019-11-20 03:50:22,591 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed STYLE_LABEL
2019-11-20 03:50:22,591 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed SUBJECT_LABEL
2019-11-20 03:50:22,592 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] regionserver.HStore: Closed VIP

…………

2019-11-20 03:50:37,468 WARN [regionserver/hdpv-014/125.94.213.41:16020] zookeeper.ZKUtil: regionserver:16020-0x26e7ce845ed0283, quorum=hdpv-001:2181,hdpv-003:2181,hdpv-005:2181,hdpv-007:2181,hdpv-009:2181, baseZNode=/hbase-unsecure Unable to list children of znode /hbase-unsecure/replication/rs/hdpv-014,16020,1574074564013
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/replication/rs/hdpv-014,16020,1574074564013
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:292)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:455)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:483)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenBFSAndWatchThem(ZKUtil.java:1462)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursivelyMultiOrSequential(ZKUtil.java:1384)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursively(ZKUtil.java:1266)
at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeAllQueues(ReplicationQueuesZKImpl.java:196)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.join(ReplicationSourceManager.java:302)
at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:202)
at org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:194)
at org.apache.hadoop.hbase.regionserver.HRegionServer.stopServiceThreads(HRegionServer.java:2269)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1118)
at java.lang.Thread.run(Thread.java:745)
2019-11-20 03:50:37,739 ERROR [regionserver/hdpv-014/125.94.213.41:16020] zookeeper.ZooKeeperWatcher: regionserver:16020-0x26e7ce845ed0283, quorum=hdpv-001:2181,hdpv-003:2181,hdpv-005:2181,hdpv-007:2181,hdpv-009:2181, baseZNode=/hbase-unsecure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/replication/rs/hdpv-014,16020,1574074564013
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:292)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:455)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:483)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenBFSAndWatchThem(ZKUtil.java:1462)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursivelyMultiOrSequential(ZKUtil.java:1384)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursively(ZKUtil.java:1266)
at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeAllQueues(ReplicationQueuesZKImpl.java:196)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.join(ReplicationSourceManager.java:302)
at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:202)
at org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:194)
at org.apache.hadoop.hbase.regionserver.HRegionServer.stopServiceThreads(HRegionServer.java:2269)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1118)
at java.lang.Thread.run(Thread.java:745)
2019-11-20 03:50:37,840 INFO [regionserver/hdpv-014/125.94.213.41:16020] ipc.RpcServer: Stopping server on 16020
2019-11-20 03:50:37,840 INFO [RpcServer.listener,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: stopping
2019-11-20 03:50:37,984 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2019-11-20 03:50:37,984 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2019-11-20 03:50:38,901 WARN [regionserver/hdpv-014/125.94.213.41:16020] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/hdpv-014,16020,1574074564013
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1222)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1211)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1528)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1126)
at java.lang.Thread.run(Thread.java:745)
2019-11-20 03:50:39,198 INFO [regionserver/hdpv-014/125.94.213.41:16020] regionserver.HRegionServer: stopping server hdpv-014,16020,1574074564013; zookeeper connection closed.
2019-11-20 03:50:39,198 INFO [regionserver/hdpv-014/125.94.213.41:16020] regionserver.HRegionServer: regionserver/hdpv-014/125.94.213.41:16020 exiting
2019-11-20 03:50:40,717 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:68)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2801)
2019-11-20 03:50:42,062 INFO [pool-4-thread-1] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@344344fa
2019-11-20 03:50:42,062 INFO [pool-4-thread-1] regionserver.ShutdownHook: Starting fs shutdown hook thread.
2019-11-20 03:50:42,063 ERROR [Thread-9022] hdfs.DFSClient: Failed to close inode 595756106
org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-1202337336-125.94.213.13-1419656350533:blk_1526126839_452537137 does not exist or is not under Constructionnull
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6683)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6751)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:930)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:966)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:948)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
at com.sun.proxy.$Proxy17.updateBlockForPipeline(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:283)
at com.sun.proxy.$Proxy18.updateBlockForPipeline(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1281)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:993)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:500)
2019-11-20 03:50:42,534 INFO [pool-4-thread-1] regionserver.ShutdownHook: Shutdown hook finished.
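
To pick out the pause evidence from a RegionServer log like the one above, a small script can help. The following is a minimal Python sketch, not part of the original troubleshooting: the names `PATTERNS` and `pause_events` are made up for illustration, and the regular expressions are written against the exact message wording shown in this log, which may differ in other HBase versions.

```python
import re

# Patterns for the three pause-related messages seen in the log above.
# Each captures the reported duration in milliseconds as group 1.
PATTERNS = {
    "sleeper": re.compile(r"util\.Sleeper: We slept (\d+)ms instead of (\d+)ms"),
    "jvm_pause": re.compile(r"util\.JvmPauseMonitor: .*pause of approximately (\d+)ms"),
    "zk_timeout": re.compile(
        r"zookeeper\.ClientCnxn: Client session timed out, "
        r"have not heard from server in (\d+)ms"
    ),
}


def pause_events(lines):
    """Return (timestamp, kind, millis) for every pause-related log line."""
    events = []
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                # The first 23 characters are the "yyyy-MM-dd HH:mm:ss,SSS" timestamp.
                events.append((line[:23], kind, int(match.group(1))))
    return events


# Four lines copied from the log above (the first one is not pause-related).
SAMPLE = [
    "2019-11-20 03:49:39,055 INFO [hdpv-014,16020,1574074564013_ChoreService_4] "
    "regionserver.HRegionServer$CompactionChecker: Chore: CompactionChecker "
    "missed its start time",
    "2019-11-20 03:49:39,268 WARN [regionserver/hdpv-014/125.94.213.41:16020] "
    "util.Sleeper: We slept 124370ms instead of 3000ms, this is likely due to a "
    "long garbage collecting pause and it's usually bad, see "
    "http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired",
    "2019-11-20 03:49:39,269 WARN [JvmPauseMonitor] util.JvmPauseMonitor: "
    "Detected pause in JVM or host machine (eg GC): pause of approximately 121995ms",
    "2019-11-20 03:49:39,423 INFO [RS_OPEN_REGION-hdpv-014:16020-0-SendThread(hdpv-007:2181)] "
    "zookeeper.ClientCnxn: Client session timed out, have not heard from server "
    "in 157651ms for sessionid 0x16e7ce845dc02b8, closing socket connection and "
    "attempting reconnect",
]

if __name__ == "__main__":
    for timestamp, kind, millis in pause_events(SAMPLE):
        print(f"{timestamp}  {kind:<10}  {millis / 1000:.1f} s")
```

Run against the full log file, this kind of scan narrows thousands of lines down to the few that show how long the process was frozen and when ZooKeeper gave up on it.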

2. GC log:

2019-11-20T03:32:24.081+0800: 117381.659: [GC (Allocation Failure) 2019-11-20T03:32:24.082+0800: 117381.659: [ParNew: 1445776K->120613K(1504064K), 21.4794441 secs] 3237704K->1942467K(8221504K), 21.4797147 secs] [Times: user=59.65 sys=0.34, real=21.48 secs]
2019-11-20T03:33:13.014+0800: 117430.592: [GC (Allocation Failure) 2019-11-20T03:33:13.037+0800: 117430.614: [ParNew: 1457573K->117558K(1504064K), 3.8057408 secs] 3279427K->1954686K(8221504K), 3.8282921 secs] [Times: user=13.91 sys=0.05, real=3.83 secs]
2019-11-20T03:33:21.314+0800: 117438.892: [GC (Allocation Failure) 2019-11-20T03:33:21.314+0800: 117438.892: [ParNew: 1454518K->94887K(1504064K), 7.5804580 secs] 3291646K->1948109K(8221504K), 7.5807123 secs] [Times: user=7.80 sys=0.08, real=7.58 secs]
2019-11-20T03:42:44.684+0800: 118002.262: [GC (Allocation Failure) 2019-11-20T03:42:44.790+0800: 118002.368: [ParNew: 1431847K->167104K(1504064K), 16.3961827 secs] 3285069K->2024750K(8221504K), 16.5019800 secs] [Times: user=16.41 sys=0.18, real=16.50 secs]
2019-11-20T03:47:36.965+0800: 118294.543: [GC (Allocation Failure) 2019-11-20T03:47:38.419+0800: 118295.997: [ParNew: 1504064K->106763K(1504064K), 120.6296245 secs] 3361710K->2097638K(8221504K), 122.0840424 secs] [Times: user=171.54 sys=2.40, real=122.06 secs]
2019-11-20T03:50:38.743+0800: 118476.321: [GC (Allocation Failure) 2019-11-20T03:50:38.743+0800: 118476.321: [ParNew: 1443723K->33648K(1504064K), 0.1142207 secs] 3434598K->2024523K(8221504K), 0.1143976 secs] [Times: user=0.39 sys=0.00, real=0.11 secs]
Heap
par new generation total 1504064K, used 1150286K [0x00000005c0000000, 0x0000000626000000, 0x0000000626000000)
eden space 1336960K, 83% used [0x00000005c0000000, 0x0000000604277878, 0x00000006119a0000)
from space 167104K, 20% used [0x00000006119a0000, 0x0000000613a7c1b8, 0x000000061bcd0000)
to space 167104K, 0% used [0x000000061bcd0000, 0x000000061bcd0000, 0x0000000626000000)
concurrent mark-sweep generation total 6717440K, used 1990874K [0x0000000626000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 166258K, capacity 181568K, committed 181636K, reserved 1206272K
class space used 20691K, capacity 24674K, committed 24740K, reserved 1048576K

3. Root cause analysis:

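At 03:47 the JVM ran a GC that took unusually long: 122 seconds. The process is paused for the whole collection, which is known as "stop the world". The ZK session timeout is 120 seconds, so by the time the GC finished, the ZK session had already timed out. The master (HMaster) had concluded that this RegionServer was dead, removed it from the cluster, and had other RegionServers take over its work. A RegionServer that takes over reads the WAL to recover, carries on serving requests, and deletes the WAL file once recovery is done. When the original RegionServer came back from the GC pause, it could no longer find its WAL, so it reported "wal.FSHLog: Error syncing, request close of WAL". On learning that it had been removed from the cluster, it shut itself down.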
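
The comparison at the heart of this analysis, GC pause length against the ZK session timeout, can be checked mechanically against the GC log. Below is a minimal Python sketch under stated assumptions: `GC_LINE` and `long_pauses` are illustrative names, the regular expression only targets ParNew lines in the format shown in section 2, and the 120-second threshold is the ZK timeout quoted in this post.

```python
import re

# Captures the leading timestamp and the wall-clock ("real") pause of a ParNew line.
GC_LINE = re.compile(
    r"^(?P<ts>\S+): .*\[ParNew: .*real=(?P<real>\d+(?:\.\d+)?) secs\]"
)


def long_pauses(lines, threshold_secs):
    """Return (timestamp, pause_secs) for each ParNew pause longer than the threshold."""
    hits = []
    for line in lines:
        match = GC_LINE.search(line)
        if match and float(match.group("real")) > threshold_secs:
            hits.append((match.group("ts"), float(match.group("real"))))
    return hits


# Three lines copied from the GC log in section 2.
SAMPLE = [
    "2019-11-20T03:32:24.081+0800: 117381.659: [GC (Allocation Failure) "
    "2019-11-20T03:32:24.082+0800: 117381.659: [ParNew: 1445776K->120613K(1504064K), "
    "21.4794441 secs] 3237704K->1942467K(8221504K), 21.4797147 secs] "
    "[Times: user=59.65 sys=0.34, real=21.48 secs]",
    "2019-11-20T03:47:36.965+0800: 118294.543: [GC (Allocation Failure) "
    "2019-11-20T03:47:38.419+0800: 118295.997: [ParNew: 1504064K->106763K(1504064K), "
    "120.6296245 secs] 3361710K->2097638K(8221504K), 122.0840424 secs] "
    "[Times: user=171.54 sys=2.40, real=122.06 secs]",
    "2019-11-20T03:50:38.743+0800: 118476.321: [GC (Allocation Failure) "
    "2019-11-20T03:50:38.743+0800: 118476.321: [ParNew: 1443723K->33648K(1504064K), "
    "0.1142207 secs] 3434598K->2024523K(8221504K), 0.1143976 secs] "
    "[Times: user=0.39 sys=0.00, real=0.11 secs]",
]

ZK_SESSION_TIMEOUT_SECS = 120.0  # value quoted in this post

if __name__ == "__main__":
    for timestamp, secs in long_pauses(SAMPLE, ZK_SESSION_TIMEOUT_SECS):
        print(f"{timestamp}: pause of {secs} s exceeds the ZK session timeout")
```

Lowering the threshold (for example to 10 seconds) also surfaces the earlier 21-second pause at 03:32, which suggests the young-generation collections were already unhealthy before the fatal one.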

4. Solution:

Switch the RegionServer garbage collector to G1. The ZK timeout of 120 seconds is already long enough, so it is left unchanged.

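### Approaches to dealing with hung processes in HBase 2.1

When dealing with hung processes in HBase 2.1, the problem is usually analyzed and resolved from the following angles:

#### 1. **Check the logs**

Logs are the main tool for diagnosing problems in HBase processes. Look at `logs/hbase-<username>-master-<hostname>.log` and `logs/hbase-<username>-regionserver-<hostname>.log` to pin down the cause[^1]. If a RegionServer or the Master has been unresponsive for a long time, the cause may be insufficient resources or some other anomaly.

#### 2. **JVM thread dump**

When a process is suspected of hanging, the following command captures the current thread state of the Java virtual machine:

```bash
jstack <pid>
```

Save the output to a file for later analysis. The thread stacks show whether there is a deadlock or whether threads are blocked waiting[^2].

#### 3. **Adjust timeout parameters**

If it is confirmed that connections are timing out because of network latency or similar causes, and that this is what stalls the workflow, increasing timeouts such as `zookeeper.session.timeout` and `hbase.rpc.timeout` may help. Add or modify the corresponding properties in `hbase-site.xml` under the conf directory, for example:

```xml
<property>
  <name>hbase.client.operation.timeout</name>
  <value>60000</value><!-- milliseconds -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>30000</value><!-- milliseconds -->
</property>
```

#### 4. **Clean up stale znodes**

ZooKeeper holds the state of the whole cluster. When a server exits unexpectedly without the other members being notified in time, so-called "ghost" entries are left behind. This leftover data can interfere with normal communication and cause stalls. It is therefore suggested to run the `zkCli.sh` script periodically and manually remove paths that are no longer needed[^1].

#### 5. **Restart the service components**

For problems that cannot be fixed quickly, the crudest but still effective approach is to stop and restart the affected services (for example the Master or the RegionServers). Before doing so, back up important data in case something is lost beyond recovery.

---

### Sample script for detecting stuck tasks

The following small shell script can help monitor the health of the system:

```bash
#!/bin/bash
# Set environment variables (adjust the path to your installation)
export HBASE_HOME=/path/to/hbase
source "$HBASE_HOME/bin/hbase-config.sh"

echo "Checking region status..."
# Any hbck output line mentioning offline or split regions triggers a warning
"$HBASE_HOME/bin/hbase" hbck | grep -i 'offline\|split' \
  && echo "[WARN] Found issues with region distribution." || true

# Read the RegionServer info (web UI) port from the HBase configuration
info_port=$("$HBASE_HOME/bin/hbase" org.apache.hadoop.hbase.util.HBaseConfTool hbase.regionserver.info.port)

# Ask the local RegionServer's JMX endpoint how many regions it hosts;
# regionCount is exposed by the sub=Server bean
curl --silent "http://localhost:${info_port}/jmx" \
  | jq '.beans[] | select(.name=="Hadoop:service=HBase,name=RegionServer,sub=Server") | .regionCount'
```

This script checks for offline or split regions and prints the number of regions hosted by the local RegionServer. Run it on each RegionServer host to cover the whole cluster.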