HBase Pitfalls

HBase Major Compaction Log Analysis
This post records an HBase major compaction run captured at a specific point in time, including the compaction activity for several regions (number of files, total size, and the size of the resulting file), along with key details surfaced in the log: the temporary directory used during compaction, the block cache configuration, and the compression codec.
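For context on when these runs fire: time-triggered major compactions are governed by `hbase.hregion.majorcompaction` (roughly 7 days by default in HBase 1.x). A minimal `hbase-site.xml` sketch for a cluster that prefers to trigger major compactions manually, as this post does below; the property is standard, the value of 0 is the illustrative choice:

```xml
<!-- Setting the interval to 0 disables time-triggered major compactions,
     so they only run when explicitly requested (e.g. major_compact in the shell). -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```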
```
2018-02-11 15:50:27,843 INFO  [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] regionserver.RSRpcServices: Compacting archiveLogData,D,1517906411842.c5edfc6575a591b2b5eb06b9e069bd48.
2018-02-11 15:50:27,853 INFO  [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] regionserver.RSRpcServices: Compacting archiveLogData,E,1517906411842.266f3403ecdf3f9e7e50bb81f3b4e15c.
2018-02-11 15:50:27,853 INFO  [regionserver/master/10.1.69.11:16020-shortCompactions-1517386971937] regionserver.HRegion: Starting compaction on result in region archiveLogData,E,1517906411842.266f3403ecdf3f9e7e50bb81f3b4e15c.
2018-02-11 15:50:27,854 INFO  [regionserver/master/10.1.69.11:16020-shortCompactions-1517386971937] regionserver.HStore: Starting compaction of 2 file(s) in result of archiveLogData,E,1517906411842.266f3403ecdf3f9e7e50bb81f3b4e15c. into tmpdir=hdfs://master:8020/apps/hbase/data/data/default/archiveLogData/266f3403ecdf3f9e7e50bb81f3b4e15c/.tmp, totalSize=2.2 M
2018-02-11 15:50:27,858 INFO  [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] regionserver.RSRpcServices: Compacting archiveLogData,I,1517906411842.dae8cbe26db0f7c9eee4ef538b84114e.
2018-02-11 15:50:27,863 INFO  [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] regionserver.RSRpcServices: Compacting archiveLogData,M,1517906411842.816d2f840c296bb2670179297dd8930b.
2018-02-11 15:50:27,865 INFO  [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] regionserver.RSRpcServices: Compacting archiveLogData,O,1517906411842.44373bc9f52c764a167a4366d13fc008.
2018-02-11 15:50:27,870 INFO  [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] regionserver.RSRpcServices: Compacting archiveLogData,R,1517906411842.1d044faf08659ff7a84077b094d75168.
2018-02-11 15:50:27,871 INFO  [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] regionserver.RSRpcServices: Compacting archiveLogData,U,1517906411842.d6c4ff2720459b962ffc97888c28cb67.
2018-02-11 15:50:27,871 INFO  [regionserver/master/10.1.69.11:16020-shortCompactions-1517386971937] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=4, currentSize=896080, freeSize=841005616, maxSize=841901696, heapSize=896080, minSize=799806592, minFactor=0.95, multiSize=399903296, multiFactor=0.5, singleSize=199951648, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
2018-02-11 15:50:27,878 INFO  [regionserver/master/10.1.69.11:16020-shortCompactions-1517386971937] compress.CodecPool: Got brand-new compressor [.snappy]
2018-02-11 15:50:27,974 INFO  [regionserver/master/10.1.69.11:16020-shortCompactions-1517386971937] compress.CodecPool: Got brand-new decompressor [.snappy]
2018-02-11 15:50:27,998 INFO  [regionserver/master/10.1.69.11:16020-shortCompactions-1517386971937] regionserver.HStore: Completed major compaction of 2 (all) file(s) in result of archiveLogData,E,1517906411842.266f3403ecdf3f9e7e50bb81f3b4e15c. into b03f605b75e44cda98fe8aa58fe05d4f(size=418.6 K), total size for store is 418.6 K. This selection was in queue for 0sec, and took 0sec to execute.
2018-02-11 15:50:27,998 INFO  [regionserver/master/10.1.69.11:16020-shortCompactions-1517386971937] regionserver.CompactSplitThread: Completed compaction: Request = regionName=archiveLogData,E,1517906411842.266f3403ecdf3f9e7e50bb81f3b4e15c., storeName=result, fileCount=2, fileSize=2.2 M, priority=1, time=11832333055519425; duration=0sec
2018-02-11 15:50:27,998 INFO  [regionserver/master/10.1.69.11:16020-shortCompactions-1517386971937] regionserver.HRegion: Starting compaction on result in region archiveLogData,U,1517906411842.d6c4ff2720459b962ffc97888c28cb67.
2018-02-11 15:50:27,999 INFO  [regionserver/master/10.1.69.11:16020-shortCompactions-1517386971937] regionserver.HStore: Starting compaction of 2 file(s) in result of archiveLogData,U,1517906411842.d6c4ff2720459b962ffc97888c28cb67. into tmpdir=hdfs://master:8020/apps/hbase/data/data/default/archiveLogData/d6c4ff2720459b962ffc97888c28cb67/.tmp, totalSize=4.2 M
2018-02-11 15:50:28,005 INFO  [regionserver/master/10.1.69.11:16020-shortCompactions-1517386971937] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=4, currentSize=896080, freeSize=841005616, maxSize=841901696, heapSize=896080, minSize=799806592, minFactor=0.95, multiSize=399903296, multiFactor=0.5, singleSize=199951648, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
2018-02-11 15:50:28,135 INFO  [regionserver/master/10.1.69.11:16020-shortCompactions-1517386971937] regionserver.HStore: Completed major compaction of 2 (all) file(s) in result of archiveLogData,U,1517906411842.d6c4ff2720459b962ffc97888c28cb67. into 018c53a55ee14144a483ef1c34142442(size=943.0 K), total size for store is 943.0 K. This selection was in queue for 0sec, and took 0sec to execute.
2018-02-11 15:50:28,135 INFO  [regionserver/master/10.1.69.11:16020-shortCompactions-1517386971937] regionserver.CompactSplitThread: Completed compaction: Request = regionName=archiveLogData,U,1517906411842.d6c4ff2720459b962ffc97888c28cb67., storeName=result, fileCount=2, fileSize=4.2 M, priority=1, time=11832333073118003; duration=0sec
2018-02-11 15:51:39,264 INFO  [regionserver/master/10.1.69.11:16020.leaseChecker] regionserver.RSRpcServices: Scanner 4157 lease expired on region archiveLogData,E,1517906411842.266f3403ecdf3f9e7e50bb81f3b4e15c.
2018-02-11 15:52:15,352 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=1.17 MB, freeSize=801.73 MB, max=802.90 MB, blockCount=9, accesses=14039, hits=13590, hitRatio=96.80%, , cachingAccesses=13598, cachingHits=13560, cachingHitsRatio=99.72%, evictions=94889, evicted=29, evictedPerRun=3.0562025494873524E-4
2018-02-11 15:52:46,915 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: Stopping HBase metrics system...
2018-02-11 15:52:46,916 INFO  [timeline] impl.MetricsSinkAdapter: timeline thread interrupted.
2018-02-11 15:52:46,918 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system stopped.
2018-02-11 15:52:47,419 INFO  [HBase-Metrics2-1] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties
2018-02-11 15:52:47,539 INFO  [HBase-Metrics2-1] timeline.HadoopTimelineMetricsSink: Initializing Timeline metrics sink.
2018-02-11 15:52:47,539 INFO  [HBase-Metrics2-1] timeline.HadoopTimelineMetricsSink: Identified hostname = master, serviceName = hbase
2018-02-11 15:52:47,579 INFO  [HBase-Metrics2-1] availability.MetricSinkWriteShardHostnameHashingStrategy: Calculated collector shard slave2 based on hostname: master
2018-02-11 15:52:47,579 INFO  [HBase-Metrics2-1] timeline.HadoopTimelineMetricsSink: Collector Uri: http://slave2:6188/ws/v1/timeline/metrics
2018-02-11 15:52:47,579 INFO  [HBase-Metrics2-1] timeline.HadoopTimelineMetricsSink: Container Metrics Uri: http://slave2:6188/ws/v1/timeline/containermetrics
2018-02-11 15:52:47,583 INFO  [HBase-Metrics2-1] impl.MetricsSinkAdapter: Sink timeline started
2018-02-11 15:52:47,619 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2018-02-11 15:52:47,619 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl: HBase metrics system started
```
Manually Running major_compact on a Table
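The compactions in the log above were requested through the RegionServer RPC interface; the same thing can be done on demand from the HBase shell. A minimal example using the `archiveLogData` table from the log (flushing first makes sure freshly written in-memory data is included in the rewrite):

```bash
# Run from a node with the HBase client configured; 'archiveLogData' is the
# table seen in the log above.
hbase shell <<'EOF'
flush 'archiveLogData'
major_compact 'archiveLogData'
EOF
```

`major_compact` also accepts a region name, or a table plus column family, which is useful when only a single store needs rewriting.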
HBase Data Migration

Below is a rundown of the main HBase data migration approaches and the key things to watch out for.

### 1. Migration Methods

#### 1.1 Export/Import

- **Use case**: small-to-medium data volumes; cross-version migration
- **Procedure**:

```bash
# Export on the source cluster
hbase org.apache.hadoop.hbase.mapreduce.Export <table> <HDFS export path>

# Import on the target cluster
hbase org.apache.hadoop.hbase.mapreduce.Import <table> <HDFS export path>
```

- **Characteristics**:
  - Supports incremental migration (by specifying a time range)
  - Runs as a MapReduce job, so it can affect cluster performance

#### 1.2 Snapshot

- **Use case**: live migration with minimal downtime
- **Procedure**:

```bash
# 1. Create the snapshot (pause writes while it is taken)
hbase> snapshot 'test', 'test_snapshot'

# 2. Export the snapshot to the target cluster's HDFS
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot 'test_snapshot' \
  -copy-to hdfs://<target-cluster-ip>:8020/hbase \
  -mappers 16

# 3. Restore the snapshot on the target cluster
hbase> restore_snapshot 'test_snapshot'
```

- **Advantages**:
  - Metadata is migrated together with the data
  - Barely affects reads and writes on the source cluster (writes can resume as soon as the snapshot is created)

#### 1.3 CopyTable

- **Use case**: single-table migration; migrating a subset of columns
- **Command**:

```bash
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --new.name=<target table> \
  --peer.adr=<target ZK quorum>:2181:/hbase \
  <source table>
```

- **Features**:
  - Supports time-range filtering (`--starttime`/`--endtime`)
  - Works online across clusters

#### 1.4 DistCp at the HDFS Layer

- **Use case**: whole-cluster migration; requires compatible HDFS versions
- **Procedure**:

```bash
# 1. Stop the HBase cluster
# 2. Copy the HBase data directory with DistCp
hadoop distcp \
  -Dmapreduce.map.memory.mb=2048 \
  hdfs://<source-cluster-ip>:9000/hbase/test_table \
  hdfs://<target-cluster-ip>:9000/hbase/test_table
# 3. Repair the meta table (run on the target cluster)
hbase hbck -repairHoles
```

- **Caveats**:
  - The Hadoop versions must match
  - HBase must be restarted after the copy so it loads the new metadata

### 2. Key Considerations

1. **Data consistency**
   - Always run `flush '<table>'` before migrating so in-memory data is persisted to disk
   - For live migration, either stop writes or rely on snapshot isolation
2. **Metadata compatibility**
   - For cross-version migration, prefer Export/Import or snapshots
   - Check HFile format compatibility (for example, HBase 1.x → 2.x needs conversion by the migration tooling)
3. **Network and performance**
   - With DistCp/CopyTable, throttle with `-bandwidth` to avoid saturating the network
   - With snapshot export, raise `-mappers` for more parallelism
4. **Post-migration validation**
   - Compare row counts: `hbase org.apache.hadoop.hbase.mapreduce.RowCounter '<table>'`
   - Spot-check key rows for integrity

Comparison of the approaches:

| Method    | Downtime        | Suited data volume | Complexity |
|-----------|-----------------|--------------------|------------|
| Snapshot  | seconds         | large              | ★★☆        |
| Export    | hours           | small to medium    | ★★★        |
| DistCp    | cluster offline | very large         | ★★☆        |
| CopyTable | none            | single table       | ★☆☆        |

### 3. Common Problems and Fixes

- **Inconsistent regions**: run `hbase hbck -repair` to fix the metadata.
- **HFile errors across versions**: run `hbase org.apache.hadoop.hbase.util.HBaseFsck` on the target cluster.
- **Interrupted migration**: DistCp supports `-update` for incremental re-sync; a snapshot can simply be exported again.

> Note: historical (cold) tables can be migrated without stopping writes, but tables receiving live writes must be stopped and flushed to guarantee consistency.
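Whichever method is used, it is worth scripting the row-count comparison mentioned above. A minimal sketch, assuming passwordless SSH to a client node on each cluster, `hbase` on the PATH there, and that RowCounter reports its result as a `ROWS=<n>` job counter; the host names `source-master`/`target-master` are illustrative:

```bash
#!/usr/bin/env bash
# Compare row counts of a table on the source and target clusters.
# Usage: ./verify_migration.sh <table>
set -euo pipefail
TABLE="$1"

count_rows() {
  # RowCounter prints a "ROWS=<n>" counter with the MapReduce job summary
  # (on stderr); extract the last matching number.
  ssh "$1" "hbase org.apache.hadoop.hbase.mapreduce.RowCounter '$TABLE'" 2>&1 \
    | grep -o 'ROWS=[0-9]*' | tail -n1 | cut -d= -f2
}

SRC=$(count_rows source-master)
DST=$(count_rows target-master)
echo "source=$SRC target=$DST"
[ "$SRC" = "$DST" ] && echo "row counts match" || { echo "MISMATCH"; exit 1; }
```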