See the CDH5 official documentation for setting up the MySQL account that Hive is granted access with.
<!-- metastore config end -->
Configure hive-site.xml to support the lock manager (this requires a ZooKeeper cluster):
<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>true</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
  <value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
<!-- ZooKeeper client port -->
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
</property>
<!-- znode name used inside the ZooKeeper cluster -->
<property>
  <name>hive.zookeeper.namespace</name>
  <value>hive_zookeeper_namespace</value>
</property>
<!-- warehouse directory in HDFS -->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>
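The property values above can be sanity-checked from the shell before restarting Hive; a minimal sketch (the heredoc sample stands in for the real /etc/hive/conf/hive-site.xml, and `get_prop` is a hypothetical helper, not a Hive tool):

```shell
# Write a sample fragment (stand-in for /etc/hive/conf/hive-site.xml)
cat > /tmp/hive-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.zookeeper.quorum</name>
    <value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
  </property>
</configuration>
EOF

# Print the <value> on the line following a given <name>
get_prop() {
  grep -A1 "<name>$1</name>" /tmp/hive-site-sample.xml \
    | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
}

get_prop hive.support.concurrency   # prints: true
get_prop hive.zookeeper.quorum      # prints the quorum host list
```

This is only a quick grep-based check; it assumes `<name>` and `<value>` sit on adjacent lines, as in the fragment above.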
---------------------------------------------------------
Configure Hive to use YARN
# vi /etc/default/hive-server2
Add the following:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
Start the metastore
# service hive-metastore start
Create the HDFS warehouse directory
# sudo -u hdfs hadoop fs -mkdir -p /user/hive/warehouse
# sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
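The effect of mode 1777 can be illustrated on a local directory; the semantics on HDFS are the same (anyone may create entries, but only the owner may delete them). A small sketch:

```shell
# Demonstrate mode 1777 (sticky bit) on a local scratch directory
dir=$(mktemp -d)
chmod 1777 "$dir"
# The leading 1 in the octal mode is the sticky bit (shown as a
# trailing 't' in ls -ld output)
stat -c %a "$dir"   # prints: 1777
```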
The sticky bit (the trailing t) lets every user create and use their own tables while preventing them from deleting tables they do not own.
Start HiveServer2 (the Hive metastore must already be running)
# service hive-server2 start
Test the connection with Beeline
# /usr/lib/hive/bin/beeline
beeline> !connect jdbc:hive2://localhost:10000 org.apache.hive.jdbc.HiveDriver
0: jdbc:hive2://localhost:10000> show tables;
+-----------+
| tab_name |
+-----------+
+-----------+
No rows selected (1.596 seconds)
---------------------------------------------------------------------------------
Shark startup error
# /usr/lib/shark/bin/shark-withinfo -skipRddReload \
    -hiveconf hive.root.logger=INFO,console
Starting the Shark Command Line Client
Exception in thread "main" java.lang.ClassFormatError: org.apache.hadoop.hive.cli.CliDriver (unrecognized class file version)
It turned out the Java version had been switched to 1.5; reinstalling 1.7 fixed the problem.
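A launch script can guard against this class of mismatch up front; a sketch using only shell string handling (`check_java_major` is a hypothetical helper, not part of Shark):

```shell
# Reject JDK majors below 1.7, which cause "unrecognized class file
# version" when loading classes compiled for Java 7.
check_java_major() {
  # $1 is a version string such as "1.7.0_55"
  major=$(echo "$1" | cut -d. -f2)
  if [ "$major" -lt 7 ]; then
    echo "Java $1 is too old; Shark needs 1.7+"
    return 1
  fi
  echo "Java $1 OK"
}

check_java_major "1.5.0_22" || true  # prints: Java 1.5.0_22 is too old; Shark needs 1.7+
check_java_major "1.7.0_55"          # prints: Java 1.7.0_55 OK
```

In a real script the version string would come from `java -version`; the parsing assumes the old `1.x` version scheme used by JDK 5-8.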
----------------------------------------------------------------------------------
Contents of shark-env.sh:
export SPARK_MEM=1g
export SHARK_MASTER_MEM=512m
export HIVE_HOME="/usr/lib/hive"
export HIVE_CONF_DIR="/etc/hive/conf"
export HADOOP_HOME="/usr/lib/hadoop"
export SPARK_HOME="/usr/lib/spark"
export MASTER="spark://saltdb:7077"
export SHARK_EXEC_MODE=yarn
export SPARK_ASSEMBLY_JAR="/usr/lib/spark/assembly/lib/spark-assembly_2.10-0.9.0-cdh5.0.0-hadoop2.3.0-cdh5.0.0.jar"
export SHARK_ASSEMBLY_JAR="/usr/lib/shark/target/scala-2.10/shark_2.10-0.9.1.jar"
SPARK_JAVA_OPTS=" -Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS
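Since shark-env.sh hard-codes both assembly jar paths, it is worth validating them before launch; a minimal sketch (`require_jar` is a hypothetical helper, and the temp file stands in for the real SPARK_ASSEMBLY_JAR):

```shell
# Fail fast if a configured assembly jar path does not exist
require_jar() {
  if [ -f "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Stand-in for SPARK_ASSEMBLY_JAR / SHARK_ASSEMBLY_JAR
jar=$(mktemp /tmp/spark-assembly-XXXXXX.jar)
require_jar "$jar"                     # prints: found: <path>
require_jar /no/such/path.jar || true  # prints: missing: /no/such/path.jar
```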
Startup error
# /usr/lib/shark/bin/shark-withinfo -skipRddReload
14/05/12 13:32:06 INFO ui.SparkUI: Started Spark Web UI at http://saltdb:4040
Exception in thread "main" org.apache.spark.SparkException: YARN mode not available ?
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnClientClusterScheduler
Fix: add the following to /usr/lib/shark/run
# check for shark with spark on yarn params
if [ "x$SHARK_EXEC_MODE" == "xyarn" ] ; then
  if [ "x$SPARK_ASSEMBLY_JAR" == "x" ] ; then
    echo "No SPARK_ASSEMBLY_JAR specified. Please set SPARK_ASSEMBLY_JAR for spark on yarn mode."
    exit 1
  else
    export SPARK_JAR=$SPARK_ASSEMBLY_JAR
    if [ -f "$SPARK_JAR" ] ; then
      SPARK_CLASSPATH+=":$SPARK_JAR"
      echo "SPARK CLASSPATH : "$SPARK_CLASSPATH
    fi
  fi
fi
Start in server mode (as the hdfs user)
$ bin/shark    (starts the Shark command line)
$ bin/shark --service sharkserver2
(server mode: hive-metastore must already be running, with port 9083 open)
bin/shark --service cli    (how does this differ from sharkserver2??)
Connect with Beeline
# /usr/lib/shark/bin/beeline
beeline> !connect jdbc:hive2://localhost:10000/default
Running commands
shark-withinfo and shark-withdebug both invoke Shark and route log output to the console; withdebug prints more detailed information.
./shark -H prints help information
A query hangs during execution
./bin/shark-withinfo
14/05/13 09:23:19 WARN scheduler.TaskSetManager: Loss was due to java.lang.RuntimeException
java.lang.RuntimeException: readObject can't find class org.apache.hadoop.hive.conf.HiveConf
Fix
Try copying hive-common*.jar to the YARN lib directory on each NodeManager (using the Hive 0.11 jar that ships with Shark)
# cp ./edu.berkeley.cs.shark/hive-common/hive-common-0.11.0-shark-0.9.1.jar /usr/lib/hadoop-yarn/lib/
Copy all Hive-related jars to the YARN lib directory
# cp -v /opt/edu.berkeley.cs.shark/hive-*/* /usr/lib/hadoop-yarn/lib/
Copy the Shark jar to the YARN lib directory on each NodeManager
# cp /tmp/shark_2.10-0.9.1.jar /usr/lib/hadoop-yarn/lib/
The query now runs correctly
shark> select count(*) from media_visit_info;
OK
6186276
Time taken: 17.044 seconds
Running through HiveServer2
Executing over sharkserver2 JDBC produced this error
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=ALL, inode="/tmp/hive-hdfs/hive_2014-05-13_17-31-19_028_189004440533846959/_task_tmp.-ext-10001":hdfs:hadoop:drwxr-xr-x
Tried installing Hive on every NodeManager node
yum install hive    (note: just hive, without hive-metastore or hive-server2)
The error then changed to: Loss was due to java.lang.ClassNotFoundException: shark.execution.HadoopTableReader$$anonfun$7
Tried installing the Shark jar on every node
After that, starting sharkserver2 and running a query through Beeline produced the following error
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=ALL, inode="/tmp/hive-hdfs/hive_2014-05-16_14-39-35_906_907709274884560817/_task_tmp.-ext-10001":hdfs:hadoop:drwxr-xr-x
Fix
Specify a username when logging in with Beeline
beeline> !connect jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Enter username for jdbc:hive2://localhost:10000/default: hdfs
Enter password for jdbc:hive2://localhost:10000/default:
Or specify it directly on the connect line
0: jdbc:hive2://localhost:10000/default> !connect jdbc:hive2://localhost:10000/default hdfs ''
While sharkserver2 is running, queries submitted through Beeline can be seen in the web UI at
http://192.168.10.240:4040/
=================================================================
Building Shark from source
First install the JDK and Scala
# wget http://www.scala-lang.org/files/archive/scala-2.10.3.tgz
# tar xvf scala-2.10.3.tgz
Download Shark
# git clone https://github.com/amplab/shark.git -b branch-0.9 shark-0.9
Build command
SHARK_HADOOP_VERSION=2.0.0-cdh4.4.0 sbt/sbt package -Dsbt.override.build.repos=true