First, be clear that when Spark SQL is linked to Hive there is more than one way to interact:
Spark SQL and Thrift.
【Thrift first】
In hive-site.xml under $SPARK_HOME/conf, set the following parameters:
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://mach40:9083</value>
<description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>hive.server2.thrift.min.worker.threads</name>
<value>5</value>
<description>Minimum number of Thrift worker threads</description>
</property>
<property>
<name>hive.server2.thrift.max.worker.threads</name>
<value>500</value>
<description>Maximum number of Thrift worker threads</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>Port number of HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>mach40</value>
<description>Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
</property>
</configuration>
Start it on the Hive side:
[root@slaver3 conf]# hive --service hiveserver2 &
[root@slaver3 conf]# jobs
[1]+ Running    hive --service hiveserver2 &
[root@slaver3 conf]# jps
6064 RunJar
Start beeline on the Spark client side:
[root@slaver3 conf]# beeline -u jdbc:hive2://slaver3:10000/default -n root -p root
The following output appears:
Connecting to jdbc:hive2://slaver3:10000/default
18/04/24 09:35:50 INFO jdbc.Utils: Supplied authorities: slaver3:10000
18/04/24 09:35:50 INFO jdbc.Utils: Resolved authority: slaver3:10000
18/04/24 09:35:50 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://slaver3:10000/default
Connected to: Apache Hive (version 2.0.1)
Driver: Spark Project Core (version 1.6.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.6.0 by Apache Hive
0: jdbc:hive2://slaver3:10000/default>
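beeline is just one JDBC client; any program can connect to HiveServer2 over JDBC the same way. A minimal Scala sketch, assuming the hive-jdbc driver jar is on the classpath (the URL, user, and password match the beeline command above):

import java.sql.DriverManager

// Register the HiveServer2 JDBC driver (shipped in the hive-jdbc jar).
Class.forName("org.apache.hive.jdbc.HiveDriver")

// Same connection string as the beeline session above.
val conn = DriverManager.getConnection(
  "jdbc:hive2://slaver3:10000/default", "root", "root")
try {
  val stmt = conn.createStatement()
  // Any HiveQL works here; SHOW TABLES is a cheap connectivity check.
  val rs = stmt.executeQuery("SHOW TABLES")
  while (rs.next()) println(rs.getString(1))
} finally {
  conn.close()
}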
【Second: Spark SQL】
First, some background:
Spark accesses the Hive metastore, i.e. Hive's metadata, to carry out its operations.
However, Spark currently only supports Hive up to version 1.2 out of the box; once the Hive version is too new, the two cannot be linked.
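For reference, Spark does expose settings (spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars) that control which metastore version it speaks. A minimal Scala sketch for Spark 1.6 with HiveContext; the version string and jars value are placeholders for your own installation, not a tested recipe:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// These properties must be set before the HiveContext is created.
val conf = new SparkConf()
  .setAppName("MetastoreVersionDemo")
  .set("spark.sql.hive.metastore.version", "1.2.1") // metastore version to talk to
  .set("spark.sql.hive.metastore.jars", "builtin")  // or a classpath with matching Hive jars
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)

// Quick connectivity check against the metastore.
hiveContext.sql("SHOW DATABASES").show()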
1. Installation steps for both are omitted
(see other posts on this blog).
2. hive-site.xml
scp hive-site.xml $SPARK_HOME/conf
Contents of hive-site.xml
(if the existing content is too cluttered, just clear it all out):
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://mach40:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/home/hive1.22/iotmp</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive</value>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/hive1.22/iotmp</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/home/hive1.22/iotmp</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<!-- The property below must be added to hive-site.xml under spark/conf on every client that needs access; do not configure it on the server side. -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://mach40:9083</value>
<description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>hive.server2.thrift.min.worker.threads</name>
<value>5</value>
<description>Minimum number of Thrift worker threads</description>
</property>
<property>
<name>hive.server2.thrift.max.worker.threads</name>
<value>500</value>
<description>Maximum number of Thrift worker threads</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>Port number of HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>mach40</value>
<description>Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
</property>
</configuration>
Modify Hadoop's etc/hadoop/core-site.xml:
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
<description>Allow the superuser root to impersonate members of any group</description>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
<description>Allow the superuser root to connect from any host to impersonate a user</description>
</property>
After changing core-site.xml, restart Hadoop (or refresh the proxy-user configuration) so the change takes effect.
3. Driver jar:
scp mysql-connector-java-5.1.41-bin.jar /home/spark2.10/jars/
If you are on another Spark version, put the driver jar above in lib, or add its path to the classpath in spark-env.sh; you may also need the following jars from Hive:
/home/hive2.01/lib/datanucleus-api-jdo-4.2.1.jar:
/home/hive2.01/lib/datanucleus-core-4.1.6.jar:
/home/hive2.01/lib/datanucleus-rdbms-4.1.7.jar
About an exception:
In Spark 2.0+ the jars live under $SPARK_HOME/jars, so starting Hive may report an error:
no such file: spark2.0/lib/spark-assembly......jar
Fix:
vim bin/hive
change spark2.0/lib/spark-assembly......jar to spark2.0/jars/*.jar
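Once the configuration is in place, you can verify the Spark-to-Hive link from a Spark program. A minimal Scala sketch for Spark 2.x (in Spark 1.6 use HiveContext instead; the table name below is hypothetical):

import org.apache.spark.sql.SparkSession

// enableHiveSupport() makes Spark SQL read table metadata from the
// Hive metastore configured in $SPARK_HOME/conf/hive-site.xml.
val spark = SparkSession.builder()
  .appName("SparkHiveCheck")
  .enableHiveSupport()
  .getOrCreate()

// If the metastore link works, Hive databases and tables are visible:
spark.sql("SHOW DATABASES").show()
spark.sql("SELECT * FROM default.my_table LIMIT 10").show() // my_table is hypothetical
spark.stop()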