4. HBase Data Versioning and Data Deletion

This article takes a close look at HBase's multi-version storage feature, using an example to show how to set the maximum number of versions and how to view the different versions of a piece of data. It then examines how HBase implements deletes, explaining the concept of tombstone markers and how they work.

Data Versioning

In HBase, the storage unit identified by a rowkey and a column is called a cell. Each cell can hold multiple versions of the same piece of data, and the versions are distinguished by timestamp.
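
Conceptually, each cell is addressed by the triple (rowkey, column family:qualifier, timestamp), and an ordinary read returns the version with the newest timestamp. A small illustration, using the values that appear in the example below:

    (rowkey, column,       timestamp)      -> value
    ('0001', 'info:name',  1573921647260)  -> 'name3'   # newest version, returned by default
    ('0001', 'info:name',  1573921644577)  -> 'name2'
    ('0001', 'info:name',  1573921639958)  -> 'name1'   # oldest version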

The following example demonstrates HBase's multi-version behavior.

  1. Create a table

    # Set the maximum number of versions kept in the info column family to 3 (the default is 1)
    hbase(main):012:0>  create 'stu', {NAME => 'info', VERSIONS => 3}
    0 row(s) in 1.2550 seconds
    
    => Hbase::Table - stu
    
  2. Insert data

    hbase(main):013:0> put 'stu', '0001', 'info:name', "name1"
    0 row(s) in 0.0720 seconds
    
    hbase(main):014:0> put 'stu', '0001', 'info:name', "name2"
    0 row(s) in 0.0090 seconds
    
    hbase(main):015:0> put 'stu', '0001', 'info:name', "name3"
    0 row(s) in 0.0050 seconds
    
  3. View the data

    hbase(main):016:0> scan 'stu'
    ROW                           COLUMN+CELL                                                                       
     0001                         column=info:name, timestamp=1573921647260, value=name3                            
    1 row(s) in 0.0380 seconds
    

    Odd: only the name3 record is visible. Where did name1 and name2 go? It turns out that scan returns only the newest version of each cell by default.

    hbase(main):017:0> scan 'stu', {VERSIONS=>3}
    ROW                           COLUMN+CELL                                                                       
     0001                         column=info:name, timestamp=1573921647260, value=name3                            
     0001                         column=info:name, timestamp=1573921644577, value=name2                            
     0001                         column=info:name, timestamp=1573921639958, value=name1                            
    1 row(s) in 0.0210 seconds
    

    Once the VERSIONS parameter is added, all of the versions become visible. get works the same way. (A short note after this example shows how to check or change a table's VERSIONS setting.)

    hbase(main):023:0> get 'stu', '0001', {COLUMN=>'info:name', VERSIONS=>3}
    COLUMN                        CELL                                                                              
     info:name                    timestamp=1573921647260, value=name3                                              
     info:name                    timestamp=1573921644577, value=name2                                              
     info:name                    timestamp=1573921639958, value=name1                                              
    3 row(s) in 0.0080 seconds
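
    You can check a table's maximum version setting with describe, change it after the fact with alter, and read one specific version back by passing TIMESTAMP to get. A minimal sketch (the shell prompt numbers are illustrative, and the timestamp is the one for name2 in the output above; substitute your own):

    # Show the table schema; the output includes VERSIONS => '3' for the info family
    hbase(main):024:0> describe 'stu'

    # Raise the maximum number of stored versions from 3 to 5
    hbase(main):025:0> alter 'stu', NAME => 'info', VERSIONS => 5

    # Fetch a single version by its exact timestamp; should return name2
    hbase(main):026:0> get 'stu', '0001', {COLUMN => 'info:name', TIMESTAMP => 1573921644577}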
    

Data Deletion

HBase does not store data itself; everything it holds lives in files on HDFS. The HDFS file system does not support random writes to files, so how does HBase implement deletes?
The example below takes a closer look.

  1. Using the same stu table, insert some data.

    # An explicit timestamp can be supplied when putting data into HBase
    hbase(main):011:0> put 'stu', '0001', 'info:address', "address1", 1
    0 row(s) in 0.0120 seconds
    
    hbase(main):012:0> put 'stu', '0001', 'info:address', "address1", 2
    0 row(s) in 0.0070 seconds
    
    hbase(main):013:0> put 'stu', '0001', 'info:address', "address1", 3
    0 row(s) in 0.0100 seconds
    
  2. View all of the versions

    hbase(main):016:0>  scan 'stu', {VERSIONS=>3}
    ROW                           COLUMN+CELL                                                                       
     0001                         column=info:address, timestamp=3, value=address1                                  
     0001                         column=info:address, timestamp=2, value=address1                                  
     0001                         column=info:address, timestamp=1, value=address1                                  
    1 row(s) in 0.0140 seconds
    
  3. Delete the version with timestamp 2, then view all of the versions again

    hbase(main):017:0> delete 'stu', '0001', 'info:address', 2
    0 row(s) in 0.0270 seconds
    
    hbase(main):018:0>  scan 'stu', {VERSIONS=>3}
    ROW                           COLUMN+CELL                                                                       
     0001                         column=info:address, timestamp=3, value=address1
    

    Strange: now only the timestamp=3 cell is visible, and both the timestamp=1 and timestamp=2 data have disappeared. Why?
    Deleting a record in HBase does not physically remove the data, because HDFS does not allow files to be rewritten in place. Instead, HBase writes a tombstone marker. The shell's delete command inserts a DeleteColumn marker at the given timestamp, and that marker masks every version with a timestamp less than or equal to it, which is why the timestamp=1 version vanished along with timestamp=2. Tombstoned records are only physically removed when the HFiles are rewritten during a major compaction.

  4. View the tombstoned data
    As long as the records have not been physically removed, they can still be seen with a raw scan.

    hbase(main):019:0> scan 'stu', {RAW=>true, VERSIONS=>3}
    ROW                           COLUMN+CELL                                                                       
     0001                         column=info:address, timestamp=3, value=address1                                  
     0001                         column=info:address, timestamp=2, type=DeleteColumn                               
     0001                         column=info:address, timestamp=2, value=address1                                  
     0001                         column=info:address, timestamp=1, value=address1                                  
    1 row(s) in 0.0140 seconds
    

    The raw scan output confirms that deleting a record actually inserts a tombstone marker (type=DeleteColumn) carrying the matching timestamp. The sketch below shows how to watch a major compaction remove these cells for good.
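
    To verify that the tombstone really is dropped at compaction time, you can force a major compaction and then repeat the raw scan. A minimal sketch, assuming the column family keeps its default KEEP_DELETED_CELLS => FALSE setting (major_compact runs asynchronously, so allow it a moment to finish before rescanning):

    # Trigger a major compaction of the stu table
    hbase(main):020:0> major_compact 'stu'

    # Once the compaction completes, the DeleteColumn marker and the versions
    # it masked are gone from the HFiles and no longer appear even in a raw scan
    hbase(main):021:0> scan 'stu', {RAW => true, VERSIONS => 3}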
