Integrating Hive 0.13.1 with HBase 0.98.6.1

This article walks through installing and configuring Hive and exchanging data with HBase: first the Hive installation steps, then configuring MySQL as the metastore, and finally wiring up the Hive-HBase integration and verifying that data can be read.

A: Install Hadoop and HBase

Reference: http://blog.youkuaiyun.com/wind520/article/details/39856353

B: Install Hive

 1: Download: wget http://mirrors.hust.edu.cn/apache/hive/stable/apache-hive-0.13.1-bin.tar.gz

 2: Extract: [jifeng@feng02 ~]$ tar zxf apache-hive-0.13.1-bin.tar.gz

 3: Rename the directory: [jifeng@feng02 ~]$ mv apache-hive-0.13.1-bin hive

 4: Configure

Edit the files under the conf directory:

[jifeng@feng02 ~]$ cd hive
[jifeng@feng02 hive]$ ls
bin  conf  examples  hcatalog  lib  LICENSE  NOTICE  README.txt  RELEASE_NOTES.txt  scripts
[jifeng@feng02 hive]$ cd conf
[jifeng@feng02 conf]$ ls
hive-default.xml.template  hive-exec-log4j.properties.template
hive-env.sh.template       hive-log4j.properties.template
[jifeng@feng02 conf]$ cp hive-env.sh.template  hive-env.sh
[jifeng@feng02 conf]$ cp hive-default.xml.template  hive-site.xml  
[jifeng@feng02 conf]$ ls
hive-default.xml.template  hive-env.sh.template                 hive-log4j.properties.template
hive-env.sh                hive-exec-log4j.properties.template  hive-site.xml
[jifeng@feng02 conf]$ 
Edit hive-config.sh under the bin directory:

[jifeng@feng02 bin]$ vi hive-config.sh   

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# processes --config option from command line
#

this="$0"
while [ -h "$this" ]; do
  ls=`ls -ld "$this"`
  link=`expr "$ls" : '.*-> \(.*\)$'`
  if expr "$link" : '.*/.*' > /dev/null; then
    this="$link"
  else
    this=`dirname "$this"`/"$link"
  fi
done

# convert relative path to absolute path
bin=`dirname "$this"`
script=`basename "$this"`
bin=`cd "$bin"; pwd`
this="$bin/$script"

# the root of the Hive installation
if [[ -z $HIVE_HOME ]] ; then
  export HIVE_HOME=`dirname "$bin"`
fi

#check to see if the conf dir is given as an optional argument
while [ $# -gt 0 ]; do    # Until you run out of parameters . . .
  case "$1" in
    --config)
        shift
        confdir=$1
        shift
        HIVE_CONF_DIR=$confdir
        ;;
    --auxpath)
        shift
        HIVE_AUX_JARS_PATH=$1
        shift
        ;;
    *)
        break;
        ;;
  esac
done


# Allow alternate conf dir location.
HIVE_CONF_DIR="${HIVE_CONF_DIR:-$HIVE_HOME/conf}"

export HIVE_CONF_DIR=$HIVE_CONF_DIR
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH

# Default to use 256MB
export HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE:-256}
export JAVA_HOME=$HOME/jdk1.7.0_45
export HIVE_HOME=$HOME/hive
export HADOOP_HOME=$HOME/hadoop/hadoop-2.4.1
"hive-config.sh" 73L, 2011C written

The three lines added at the end are:
export JAVA_HOME=$HOME/jdk1.7.0_45  
export HIVE_HOME=$HOME/hive 
export HADOOP_HOME=$HOME/hadoop/hadoop-2.4.1  

Configure MySQL as the metastore by editing $HIVE_HOME/conf/hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://jifengsql:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.PersistenceManagerFactoryClass</name>
  <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
  <description>class implementing the jdo persistence</description>
</property>

<property>
  <name>javax.jdo.option.DetachAllOnCommit</name>
  <value>true</value>
  <description>detaches all objects from session so that they can be used after transaction is committed</description>
</property>

<property>
  <name>javax.jdo.option.NonTransactionalRead</name>
  <value>true</value>
  <description>reads outside of transactions</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>dss</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>jifeng</value>
  <description>password to use against metastore database</description>
</property>
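
Before the first start it also helps to provision the MySQL side: createDatabaseIfNotExist=true in the JDBC URL creates the hive database itself, but the dss account and its privileges must already exist. A minimal sketch mirroring the credentials in the hive-site.xml above (run against your own MySQL host; MySQL 5.x GRANT syntax, adjust as needed):

```shell
# Sketch: create the metastore account and its grants (assumption: you have
# root access to the MySQL host "jifengsql" from hive-site.xml).
mysql -h jifengsql -u root -p <<'SQL'
GRANT ALL PRIVILEGES ON hive.* TO 'dss'@'%' IDENTIFIED BY 'jifeng';
FLUSH PRIVILEGES;
SQL
```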

Download the MySQL JDBC driver: wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.32.tar.gz

Extract it and copy mysql-connector-java-5.1.32-bin.jar to $HIVE_HOME/lib.

5: Start Hive

[jifeng@feng02 hive]$ bin/hive
Logging initialized using configuration in jar:file:/home/jifeng/hive/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show tables;  
OK
Time taken: 0.723 seconds
hive> 

C: Integration configuration

First make sure the HBase jars under <HIVE_HOME>/lib match the HBase version actually deployed; they should be the jars from the <HBASE_HOME>/lib/ directory:

[jifeng@feng02 lib]$ find -name "htr*jar" 
./htrace-core-2.04.jar
[jifeng@feng02 lib]$ find -name "hbase*jar"   
./hbase-server-0.98.6.1-hadoop2.jar
./hbase-client-0.98.6.1-hadoop2.jar
./hbase-it-0.98.6.1-hadoop2-tests.jar
./hbase-common-0.98.6.1-hadoop2.jar
./hbase-it-0.98.6.1-hadoop2.jar
./hbase-common-0.98.6.1-hadoop2-tests.jar
./hbase-protocol-0.98.6.1-hadoop2.jar
Copy these files to the /home/jifeng/hive/lib directory.
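
The copy can be scripted; a minimal sketch, assuming HBASE_HOME and HIVE_HOME point at the installs used in this walkthrough (override the defaults and version patterns for your layout):

```shell
# Sketch: copy the HBase jars listed above (plus htrace-core) into Hive's lib.
# The HBASE_HOME/HIVE_HOME defaults are assumptions; set them to your paths.
HBASE_HOME=${HBASE_HOME:-$HOME/hbase-0.98.6.1-hadoop2}
HIVE_HOME=${HIVE_HOME:-$HOME/hive}
for jar in "$HBASE_HOME"/lib/hbase-*-hadoop2*.jar "$HBASE_HOME"/lib/htrace-core-*.jar; do
  if [ -f "$jar" ]; then
    cp -v "$jar" "$HIVE_HOME/lib/"
  fi
done
```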

D: Test and verify

Before testing, start Hadoop and then HBase, in that order.

Reference: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveHBaseIntegration
Start Hive:

Command: bin/hive --auxpath ./lib/hive-hbase-handler-0.13.1.jar,./lib/hbase-server-0.98.6.1-hadoop2.jar,./lib/zookeeper-3.4.5.jar,./lib/guava-11.0.2.jar --hiveconf hbase.master=feng01:60000
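
Instead of passing --auxpath on every start, the same comma-separated jar list can be exported as an environment variable, since hive-config.sh (shown earlier) exports HIVE_AUX_JARS_PATH. A sketch with the jar versions used in this walkthrough:

```shell
# Sketch: set the aux jar list via the environment instead of --auxpath.
# hive-config.sh re-exports HIVE_AUX_JARS_PATH, so this value is picked up.
HIVE_HOME=${HIVE_HOME:-$HOME/hive}
export HIVE_AUX_JARS_PATH=$HIVE_HOME/lib/hive-hbase-handler-0.13.1.jar,$HIVE_HOME/lib/hbase-server-0.98.6.1-hadoop2.jar,$HIVE_HOME/lib/zookeeper-3.4.5.jar,$HIVE_HOME/lib/guava-11.0.2.jar
# then start as usual: bin/hive --hiveconf hbase.master=feng01:60000
```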

Create a table in Hive mapped to HBase:
CREATE TABLE hbase_table_1(key int, value string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz1");
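
The entries in "hbase.columns.mapping" pair positionally with the Hive columns: ":key" binds the first Hive column (key) to the HBase row key, and "cf1:val" binds value to column val in family cf1. A tiny bash illustration of that positional pairing:

```shell
# Illustration only: line up the Hive columns with the hbase.columns.mapping
# entries position by position (uses bash process substitution).
paste -d ' ' <(echo "key,value" | tr ',' '\n') <(echo ":key,cf1:val" | tr ',' '\n')
# → key :key
# → value cf1:val
```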

The run in the Hive shell:

[jifeng@feng02 hive]$ bin/hive --auxpath ./lib/hive-hbase-handler-0.13.1.jar,./lib/hbase-server-0.98.6.1-hadoop2.jar,./lib/zookeeper-3.4.5.jar,./lib/guava-11.0.2.jar --hiveconf hbase.master=feng01:60000
14/10/08 15:59:20 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
Logging initialized using configuration in jar:file:/home/jifeng/hive/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> CREATE TABLE hbase_table_1(key int, value string) 
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    > TBLPROPERTIES ("hbase.table.name" = "xyz1");
OK
Time taken: 2.606 seconds
hive> desc hbase_table_1;
OK
key                     int                     from deserializer   
value                   string                  from deserializer   
Time taken: 0.269 seconds, Fetched: 2 row(s)
Query from the HBase shell:

hbase(main):004:0> list
TABLE                                                                                                             
xyz                                                                                                               
xyz1                                                                                                              
2 row(s) in 0.0260 seconds

=> ["xyz", "xyz1"]
hbase(main):005:0> desc "xyz1"
DESCRIPTION                                                               ENABLED                                 
 'xyz1', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'R true                                    
 OW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', M                                         
 IN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLO                                         
 CKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                   
1 row(s) in 1.1600 seconds

hbase(main):006:0> scan 'xyz1'
ROW                           COLUMN+CELL                                                                         
0 row(s) in 0.0510 seconds
Insert a row from the HBase side:

hbase(main):007:0> put 'xyz1','99','cf1:val','test.micmiu.com'
0 row(s) in 0.0770 seconds

hbase(main):008:0> scan 'xyz1'
ROW                           COLUMN+CELL                                                                         
 99                           column=cf1:val, timestamp=1412756927628, value=test.micmiu.com                      
1 row(s) in 0.0160 seconds
Query from Hive:

hive> select * from hbase_table_1;                                      
OK
99      test.micmiu.com
Time taken: 0.13 seconds, Fetched: 1 row(s)
hive> 
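
Writes flow through the mapping as well. A hedged sketch following the pattern from the Hive wiki's HBaseIntegration page referenced above (the source table pokes is hypothetical; note that for HBase-backed tables, OVERWRITE does not delete rows already present in HBase):

```shell
# Sketch (wiki pattern): populate the mapped table from another Hive table.
# "pokes" is a hypothetical source table with compatible column types.
bin/hive -e "INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo = 98;"
```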



