UbuntuKylin 16.04 LTS environment.
Passwordless SSH login and the IP-to-hostname mapping are assumed to be configured already and are not covered here.
Java, Scala, and Hadoop must also be installed; installation is not covered here. The versions used are java version "1.8.0_45", Scala code runner version 2.10.5, and hadoop-2.6.0.
Spark and Hive have version compatibility constraints; see
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark:+Getting+Started
for the Spark/Hive compatibility matrix.
Add the following to /etc/profile:
export JAVA_HOME=/opt/jdk1.8.0_45
export SCALA_HOME=/opt/scala-2.10.5
export HADOOP_HOME=/home/zhuhaichuan/hadoop-2.6.0
export SPARK_HOME=/home/zhuhaichuan/spark-1.6.0-bin-hadoop2.6
export HIVE_HOME=/home/zhuhaichuan/hive-2.1.1
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$HIVE_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
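After editing, apply the changes in the current shell (a standard step, not specific to this setup):
source /etc/profile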
一、Install Spark
Version: spark-1.6.0-bin-hadoop2.6.tgz
Extract it to the target directory; here that is /home/zhuhaichuan/spark-1.6.0-bin-hadoop2.6.
Two configuration files are needed: slaves and spark-env.sh.
The slaves file lists the worker nodes of the Spark cluster. Since this is a pseudo-distributed setup, the slaves file contains only the local machine's IP or hostname.
The spark-env.sh file needs the paths of JAVA_HOME, SCALA_HOME and HADOOP_HOME added to it, as in the sketch below.
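A minimal sketch of the two files for this pseudo-distributed setup, reusing the paths from /etc/profile above (the slaves entry assumes the zhumaster hostname used later in this guide; substitute your own hostname or IP):

slaves:
zhumaster

spark-env.sh:
export JAVA_HOME=/opt/jdk1.8.0_45
export SCALA_HOME=/opt/scala-2.10.5
export HADOOP_HOME=/home/zhuhaichuan/hadoop-2.6.0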
With that, the pseudo-distributed Spark installation is complete.
Start Spark and the thriftserver:
…/sbin/start-all.sh
…/sbin/start-thriftserver.sh
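To check that everything came up, jps (shipped with the JDK) should list the standalone Master and Worker daemons; the thriftserver typically appears as a SparkSubmit process:
jps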
二、Install MySQL
Install MySQL on Ubuntu with the following commands:
sudo apt-get install mysql-server
sudo apt install mysql-client
sudo apt install libmysqlclient-dev
After installation, check that MySQL is running:
sudo netstat -tap | grep mysql
Log in to the MySQL service with:
mysql -uroot -p<your password>
Now configure MySQL to allow remote access. First edit /etc/mysql/mysql.conf.d/mysqld.cnf:
sudo vi /etc/mysql/mysql.conf.d/mysqld.cnf
Comment out the line bind-address = 127.0.0.1, as shown below:
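# bind-address = 127.0.0.1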
Save and exit, then log back in to the MySQL service.
Allow the hive user to connect remotely; run the following as the root user:
create user 'hive' identified by 'hive'; // create a hive user for connections, with password hive
grant all privileges on *.* to 'hive'@'%' identified by "hive" with grant option; // '%' means any IP address: user hive with password hive may connect remotely from any host
flush privileges; // reload the privileges
set global binlog_format='MIXED'; // set the binlog format; this must be executed, otherwise errors occur later
Then run exit to leave MySQL and restart the service:
exit;
service mysql restart // restart the service
Test the connection:
mysql -uhive -phive // being able to log in means the setup succeeded
create database hive; // create the hive database used by the metastore connection
alter database hive character set latin1;
三、Configure Hive
Hive version: apache-hive-2.1.1-bin.tar.gz
Configure hive-env.sh
Add the HADOOP_HOME, HIVE_CONF_DIR and HIVE_AUX_JARS_PATH entries:
export HADOOP_HOME=/home/zhuhaichuan/hadoop-2.6.0
export HIVE_CONF_DIR=/home/zhuhaichuan/hive-2.1.1/conf
export HIVE_AUX_JARS_PATH=/home/zhuhaichuan/hive-2.1.1/lib
Configure the hive-site.xml file:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/data/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.144:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://zhumaster:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
<description>
Expects one of [mr, tez, spark].
Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
remains the default engine for historical reasons, it is itself a historical engine
and is deprecated in Hive 2 line. It may be removed without further warning.
</description>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>zhumaster</value>
<description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>hdfs://zhumaster:8020/tmp/hive</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/zhuhaichuan/tmp/hive</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic
schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
proper metastore schema migration. (Default)
False: Warn if the versioninformation stored in metastore doesn't match with one from in Hive jars.
</description>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
<description>
Setting this property to true will have HiveServer2 execute
Hive operations as the user making the calls to it.
</description>
</property>
Configuration complete.
Hive stores its metastore data in MySQL, so copy the MySQL driver jar (mysql-connector-java-5.1.40-bin.jar) into hive-2.1.1's lib directory:
cp …/mysql-connector-java-5.1.40-bin.jar .../hive-2.1.1/lib/
Sync the Hive metadata schema to MySQL: …/hive-2.1.1/bin/schematool -dbType mysql -initSchema
hive-default.xml.template must first be copied to a file named hive-site.xml; hive-site.xml has to exist before running schematool -dbType mysql -initSchema, which syncs the metadata schema into MySQL.
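To confirm the schema was initialized, a quick check against the hive database (a verification sketch; the exact table list varies by Hive version):
mysql -uhive -phive -e "use hive; show tables;"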
Run the Hive metastore in the background: hive --service metastore > …/hivemetastore.log 2>&1 &
Run HiveServer2 in the background: hive --service hiveserver2 > …/hiveserver2.log 2>&1 &
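Once HiveServer2 is up, the connection can be tested with beeline (a sketch; 10000 is HiveServer2's default Thrift port, which this guide does not override):
beeline -u jdbc:hive2://zhumaster:10000 -n hive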
Copy spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar into Hive's lib directory (hive-2.1.1/lib/), and copy Hive's hive-site.xml into spark-1.6.0-bin-hadoop2.6/conf/, for example as shown below.
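Assuming both packages live under /home/zhuhaichuan as configured above:
cp /home/zhuhaichuan/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar /home/zhuhaichuan/hive-2.1.1/lib/
cp /home/zhuhaichuan/hive-2.1.1/conf/hive-site.xml /home/zhuhaichuan/spark-1.6.0-bin-hadoop2.6/conf/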
If you write programs in IntelliJ IDEA, hive-site.xml also needs to be placed in the project's src directory.
When developing Spark + Hive applications with IntelliJ IDEA, the thriftserver must be running; Spark's start-all.sh command starts the Spark master and worker nodes. A minimal program sketch follows.
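A minimal Scala sketch of such a program, assuming spark-core and spark-hive 1.6.0 are on the project's classpath; the master URL spark://zhumaster:7077 is an assumption (the default standalone master port on this guide's hostname), so adjust it to your own setup:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveQueryDemo {
  def main(args: Array[String]): Unit = {
    // "spark://zhumaster:7077" is assumed here: default standalone master port on the zhumaster host.
    val conf = new SparkConf().setAppName("HiveQueryDemo").setMaster("spark://zhumaster:7077")
    val sc = new SparkContext(conf)
    // HiveContext reads hive-site.xml from the classpath (the project's src directory),
    // so it talks to the remote metastore at thrift://zhumaster:9083 configured above.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SHOW DATABASES").show()
    sc.stop()
  }
}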