hbase ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting

This post documents an "HRegionServer Aborted" error hit when starting a worker (slave) node of an HBase cluster, and offers two fixes: (1) replace hostnames with IP addresses in the HBase configuration files; (2) verify that ZooKeeper is running and synchronize the server clocks, since unsynchronized time across the cluster causes the service to shut down abnormally.

2018-12-13 17:07:07,513 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
        at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:68)
        at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2826)
2018-12-13 17:07:07,515 INFO  [Thread-5] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@7e809b79
2018-12-13 17:07:07,515 INFO  [Thread-5] regionserver.ShutdownHook: Starting fs shutdown hook thread.
2018-12-13 17:07:07,518 INFO  [Thread-5] regionserver.ShutdownHook: Shutdown hook finished.

The HBase cluster worker node fails with the error shown above.

After starting the cluster, the expected processes are all present on the master node:

[root@master ~]# jps
3968 NameNode
1127 HistoryServer
2153 DataNode
29161 RunJar
4906 JobHistoryServer
25515 NodeManager
26158 HRegionServer
24688 QuorumPeerMain
27699 Master
4278 SecondaryNameNode
30072 RunJar
25401 ResourceManager
5786 ThriftServer
24861 Jps
27838 Worker
26015 HMaster

On the worker node:

[root@slave2 ~]# jps
15892 QuorumPeerMain
27256 Worker
11052 Jps
21773 NodeManager
3006 DataNode
The HRegionServer process is missing, and the HBase web UI likewise shows no region server for this worker node.

Yet the HBase shell still works from the worker node:

[root@slave2 ~]# hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/data/appcom/hbase-1.4.6/lib/phoenix-4.14.1-HBase-1.4-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/data/appcom/hbase-1.4.6/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/data/appcom/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.6, ra55bcbd4fc87ff9cd3caaae25277e0cfdbb344a5, Tue Jul 24 16:25:52 PDT 2018

hbase(main):001:0> list
TABLE                                                                                                                                  
LU.STUDENTS                                                                                                                            
SHUJUBU                                                                                                                                
SYSTEM.CATALOG                                                                                                                         
SYSTEM.FUNCTION                                                                                                                        
SYSTEM.LOG                                                                                                                             
SYSTEM.MUTEX                                                                                                                           
SYSTEM.SEQUENCE                                                                                                                        
SYSTEM.STATS                                                                                                                           
TEST                                                                                                                                   
TEST.PERSON                                                                                                                            
dim_mobile_hui                                                                                                                         
dim_mobile_yun                                                                                                                         
f2                                                                                                                                     
hbase_shujubu                                                                                                                          
luzhen                                                                                                                                 
mobile                                                                                                                                 
mobile_no                                                                                                                              
people                                                                                                                                 
t1                                                                                                                                     
user                                                                                                                                   
user1                                                                                                                                  
user2                                                                                                                                  
22 row(s) in 0.2400 seconds

=> ["LU.STUDENTS", "SHUJUBU", "SYSTEM.CATALOG", "SYSTEM.FUNCTION", "SYSTEM.LOG", "SYSTEM.MUTEX", "SYSTEM.SEQUENCE", "SYSTEM.STATS", "TEST", "TEST.PERSON", "dim_mobile_hui", "dim_mobile_yun", "f2", "hbase_shujubu", "luzhen", "mobile", "mobile_no", "people", "t1", "user", "user1", "user2"]
hbase(main):002:0>

Fix 1: in the HBase configuration files, replace hostnames with IP addresses.
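As a sketch of this fix (the IP addresses below are placeholders, not taken from this cluster), the ZooKeeper quorum in `conf/hbase-site.xml` would switch from hostnames to IPs:

```xml
<!-- conf/hbase-site.xml: reference quorum members by IP instead of hostname -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>192.168.1.101,192.168.1.102,192.168.1.103</value>
</property>
```

`conf/regionservers` would likewise list one IP per line. If name resolution is the real culprit, verifying that `/etc/hosts` on every node maps each hostname to the correct IP achieves the same effect without hard-coding IPs in the configs.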

If the first fix does not clear the error, next check whether ZooKeeper actually started.
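To confirm ZooKeeper is actually serving requests (the hostnames below are this cluster's node names and are assumed resolvable; the commands require a running quorum), each member can be probed with the four-letter `ruok` command, and the local instance's role checked with `zkServer.sh`:

```shell
# A healthy ZooKeeper node answers "imok" to the "ruok" probe.
for zk in master slave1 slave2; do
  echo ruok | nc "$zk" 2181; echo ""
done

# Reports Mode: leader / follower for the local instance
zkServer.sh status
```

Note that `jps` showing `QuorumPeerMain` (as in the output above) only proves the process exists, not that it can serve requests.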

Fix 2: if ZooKeeper is healthy and the error still occurs, the cause is clock skew. When the servers in the cluster are not time-synchronized, the worker's HBase service starts and then shuts down abnormally; synchronizing the server clocks resolves the error.
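A minimal sketch of the time synchronization (assuming `ntpdate` is installed and the public NTP server named below is reachable from the cluster; substitute your own time source):

```shell
# Compare clocks first: run on every node and eyeball the difference
date

# One-shot synchronization (as root, on every node)
ntpdate ntp.aliyun.com

# Keep them in sync: re-run every 10 minutes from cron
echo '*/10 * * * * /usr/sbin/ntpdate ntp.aliyun.com' >> /var/spool/cron/root
```

HBase enforces this directly: a RegionServer whose clock differs from the master's by more than `hbase.master.maxclockskew` (30 seconds by default) is rejected with a ClockOutOfSyncException, which matches the "starts, then immediately exits" symptom above.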

 
