HBase and Hive Integration, Advanced Usage, and Optimization

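This article explains how to integrate HBase with Hive, including how to resolve the version error that comes up during integration, how to create external tables, and what to watch for when mapping multiple columns. It also covers advanced HBase usage, such as coprocessors and implementing secondary indexes, and closes with HBase optimization strategies, including client-side optimization and configuration tuning.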

Integrating HBase with Hive

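Data storage (HBase); querying and data analysis (Hive)
Goals of the integration:
Data in HBase tables can be queried from Hive
Data in Hive tables can be queried from HBase
Integration steps:
1. Create a table in Hive that HBase can see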

create table if not exists hbase2hive(
uid int,
uname string,
age int
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties(
"hbase.columns.mapping"=":key,f1:name,f1:age"
)
tblproperties("hbase.table.name"="hh1")
;

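The following error appears:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
This is a version error: it typically means the hive-hbase-handler jar was built against a different HBase version than the one installed.

Solution:
Repackage the hive-hbase-handler jar from its source, upload the rebuilt jar and its dependency jars to the $HIVE_HOME/lib directory, then restart Hive.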

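Create a temporary table and import data into it (stu_score in the statement that follows). An HBase-backed Hive table cannot be filled with load data, so the rows are copied in with insert ... select.

Load the data: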

insert into table hbase2hive
select * from stu_score
;

2. The table already exists in HBase and already contains data

create external table if not exists hbase_user_info1(
uid string,
uname string,
uage int
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties(
"hbase.columns.mapping"=":key,base_info:name,base_info:age"
)
tblproperties("hbase.table.name"="ns1:t_user_info")
;

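Mapping multiple columns:

Notes:
1. When mapping multiple HBase columns, :key may be either written or omitted, because by default :key is matched to the first field.
2. If the table already exists in HBase, the corresponding Hive table must be created with the external keyword.
3. If the HBase table is deleted, Hive can no longer return its data.
4. The number and types of columns in HBase and Hive should match. (Fields are matched between Hive and HBase by position, not by field name.)
5. Hive, HBase, MySQL, and similar systems can also exchange data through third-party tools, for example 蓝灯 or shell scripts.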
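
Note 4 is easy to trip over, so here is a minimal sketch of positional matching. It is a plain-Java model, not Hive's actual parser: the class name `ColumnMappingSketch` and the method `pair` are hypothetical, and the "insert :key first when it is missing" rule simply follows note 1 above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of positional column mapping; not Hive's real implementation. */
final class ColumnMappingSketch {

    /**
     * Pairs Hive columns with the entries of an hbase.columns.mapping string by position.
     * If the mapping has no ":key" entry, ":key" is assumed for the first Hive column (note 1).
     */
    static Map<String, String> pair(List<String> hiveColumns, String mapping) {
        List<String> targets = new ArrayList<>();
        for (String entry : mapping.split(",")) {
            targets.add(entry.trim());
        }
        if (!targets.contains(":key")) {
            targets.add(0, ":key");
        }
        if (targets.size() != hiveColumns.size()) {
            throw new IllegalArgumentException(
                "Hive has " + hiveColumns.size() + " columns but the mapping has "
                    + targets.size() + " entries");
        }
        // Names play no role here: the i-th Hive column gets the i-th mapping entry.
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < hiveColumns.size(); i++) {
            result.put(hiveColumns.get(i), targets.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> m = pair(
            Arrays.asList("uid", "uname", "uage"),
            ":key,base_info:name,base_info:age");
        m.forEach((hive, hbase) -> System.out.println(hive + " -> " + hbase));
    }
}
```

Swapping two entries in the mapping string silently swaps which HBase column each Hive field reads, which is why the counts and types on both sides should be kept aligned.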

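Advanced HBase usage

1. Coprocessors
Requirement: an inverted index
2. Secondary indexes

Extend the BaseRegionObserver class
Implement the prePut or postPut method
Create the two tables, t_guanzhu (whom a user follows) and t_fensi (a user's followers):
create 't_guanzhu','cf1','cf2'
create 't_fensi','cf1'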

Upload the jar to HDFS
hdfs dfs -mkdir /hbaseObServer
hdfs dfs -put gp1813Demo-1.0-SNAPSHOT.jar /hbaseObServer

Register the coprocessor on the table
alter 't_guanzhu',METHOD => 'table_att','coprocessor'=>'hdfs://qianfeng/hbaseObServer/gp1813Demo-1.0-SNAPSHOT.jar|com.qfedu.bigdata.hbaseObServer.InverIndexCoprocessor|1001|'
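
The source of InverIndexCoprocessor is not shown, so the following is only a sketch of the logic such an observer might carry out. It models the two tables with in-memory maps instead of the HBase coprocessor API; the class `InvertedIndexSketch`, its methods, and the layout (followee stored as a column qualifier) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory model of the two tables; real code would extend BaseRegionObserver. */
final class InvertedIndexSketch {
    // rowkey -> set of column qualifiers, standing in for t_guanzhu and t_fensi
    final Map<String, Set<String>> guanzhu = new HashMap<>();
    final Map<String, Set<String>> fensi = new HashMap<>();

    /** Client-side write: user `follower` starts following `followee`. */
    void putFollowing(String follower, String followee) {
        guanzhu.computeIfAbsent(follower, k -> new TreeSet<>()).add(followee);
        // In HBase this call would be made by the region server, not by the client.
        postPut(follower, followee);
    }

    /** Models the observer hook: write the reversed pair into the index table. */
    private void postPut(String follower, String followee) {
        fensi.computeIfAbsent(followee, k -> new TreeSet<>()).add(follower);
    }

    public static void main(String[] args) {
        InvertedIndexSketch s = new InvertedIndexSketch();
        s.putFollowing("alice", "carol");
        s.putFollowing("bob", "carol");
        System.out.println("carol is followed by: " + s.fensi.get("carol"));
    }
}
```

The point of pushing this into a coprocessor is that the client writes only to t_guanzhu; the reverse lookup table t_fensi is maintained on the server side as a consequence of that write.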

HBase optimization

Things to watch in HBase:
Property settings:
MemStore flush threshold:
hbase.hregion.memstore.flush.size=134217728 (128 MB)
Region split threshold:
hbase.hregion.max.filesize=10737418240 (10 GB)
Number of RegionServer handler threads:
hbase.regionserver.handler.count=30
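
The byte values above are just 128 MB and 10 GB written out in full. As a small sketch, the helper below recomputes them and prints each setting in the `<property>` form used by hbase-site.xml; the class `HBaseSizeSketch` and its `property` method are hypothetical helpers, not part of HBase.

```java
/** Converts human-readable sizes to the byte counts used in hbase-site.xml. */
final class HBaseSizeSketch {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;
    static final long GB = 1024L * MB;

    /** Renders one property element in hbase-site.xml form. */
    static String property(String name, long value) {
        return "<property>\n"
            + "  <name>" + name + "</name>\n"
            + "  <value>" + value + "</value>\n"
            + "</property>";
    }

    public static void main(String[] args) {
        System.out.println(property("hbase.hregion.memstore.flush.size", 128 * MB));
        System.out.println(property("hbase.hregion.max.filesize", 10 * GB));
        System.out.println(property("hbase.regionserver.handler.count", 30));
    }
}
```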

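Client-side optimization:
1. Turn off auto flush:
htable.setAutoFlush(false)
2. Write in batches whenever possible (put, delete)
3. Disable the HLog (WAL) only with caution, since writes that skip it are lost if the RegionServer crashes before the MemStore is flushed:
ht.setDurability(Durability.SKIP_WAL);
4. Keep data in cache where possible:
hc1.setInMemory(true);
5. Avoid having many column families; use two at most. (When HBase flushes one column family, the neighboring column families are flushed as well.)
6. Keep the rowkey as short as possible; 64 KB is the stated maximum.
7. Close every object that should be closed: admin, table, resultScanner.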
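
To make items 1, 2, and 7 concrete, here is a minimal sketch of why buffering reduces network round trips and why closing matters. It does not call HBase: `WriteBufferSketch` is a hypothetical stand-in for a client-side write buffer, and it counts simulated round trips rather than sending anything.

```java
import java.util.ArrayList;
import java.util.List;

/** Models a client-side write buffer; counts simulated round trips instead of calling HBase. */
final class WriteBufferSketch implements AutoCloseable {
    private final boolean autoFlush;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int roundTrips = 0;

    WriteBufferSketch(boolean autoFlush, int batchSize) {
        this.autoFlush = autoFlush;
        this.batchSize = batchSize;
    }

    /** Queues one write; sends at once when autoFlush is on, otherwise when the batch is full. */
    void put(String rowKey) {
        buffer.add(rowKey);
        if (autoFlush || buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends everything buffered as one simulated round trip. */
    void flush() {
        if (!buffer.isEmpty()) {
            roundTrips++;
            buffer.clear();
        }
    }

    int roundTrips() {
        return roundTrips;
    }

    /** Closing flushes what is left, so no buffered write is lost. */
    @Override
    public void close() {
        flush();
    }

    /** Writes n rows and returns how many round trips were needed. */
    static int simulate(boolean autoFlush, int batchSize, int n) {
        WriteBufferSketch w = new WriteBufferSketch(autoFlush, batchSize);
        try (WriteBufferSketch resource = w) {
            for (int i = 0; i < n; i++) {
                resource.put("row-" + i);
            }
        }
        return w.roundTrips();
    }

    public static void main(String[] args) {
        System.out.println("autoFlush on:  " + simulate(true, 100, 1050));
        System.out.println("autoFlush off: " + simulate(false, 100, 1050));
    }
}
```

With auto flush on, every put costs its own round trip. With it off, the 1050 rows go out in ten full batches plus one final partial batch sent by close(), which is also why forgetting to close a buffered writer can leave the last rows unsent.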