Setting Up a Spark + Hive Environment

This article describes in detail how to install and configure Spark and Hive on UbuntuKylin 16.04 LTS and integrate the two, covering the key steps: installing and configuring Spark, installing and configuring MySQL, and configuring Hive to synchronize its metastore with MySQL.

Environment: UbuntuKylin 16.04 LTS

Passwordless SSH login and the IP/hostname mapping are assumed to be configured already; they are not covered here.

Java, Scala, and Hadoop are also required; their installation is not covered here. The versions used are java version "1.8.0_45", Scala code runner version 2.10.5, and hadoop-2.6.0.

Spark and Hive have version compatibility constraints. The compatibility matrix is documented at:

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark:+Getting+Started

Add the following to /etc/profile:

export JAVA_HOME=/opt/jdk1.8.0_45
export SCALA_HOME=/opt/scala-2.10.5
export HADOOP_HOME=/home/zhuhaichuan/hadoop-2.6.0
export SPARK_HOME=/home/zhuhaichuan/spark-1.6.0-bin-hadoop2.6
export HIVE_HOME=/home/zhuhaichuan/hive-2.1.1
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$HIVE_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
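
To make these variables take effect in the current shell, reload the profile:

    source /etc/profile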

 

1. Installing Spark

Version: spark-1.6.0-bin-hadoop2.6.tgz

Extract the tarball to the target directory, here /home/zhuhaichuan/spark-1.6.0-bin-hadoop2.6; the tar command is shown below.
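
A single tar command does this (assuming the tarball sits in the current directory):

    tar -xzf spark-1.6.0-bin-hadoop2.6.tgz -C /home/zhuhaichuan/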

Two files need to be configured: slaves and spark-env.sh.

The slaves file lists the worker nodes of the Spark cluster. Since this is a pseudo-distributed deployment, the slaves file contains only the local machine's IP or hostname.

spark-env.sh needs the paths of JAVA_HOME, SCALA_HOME, and HADOOP_HOME added; both files are sketched below.
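
A minimal sketch of the two files, reusing the paths from /etc/profile and assuming the hostname zhumaster that appears later in hive-site.xml:

    # conf/slaves: one worker per line; pseudo-distributed, so just this host
    zhumaster

    # conf/spark-env.sh
    export JAVA_HOME=/opt/jdk1.8.0_45
    export SCALA_HOME=/opt/scala-2.10.5
    export HADOOP_HOME=/home/zhuhaichuan/hadoop-2.6.0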

At this point the pseudo-distributed Spark installation is complete.

Start Spark and the Thrift server:

…/sbin/start-all.sh

…/sbin/start-thriftserver.sh
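
A quick way to confirm the daemons are up is jps from the JDK; the PIDs will differ, and the Thrift server typically shows up as a SparkSubmit process:

    jps
    # expected, among others:
    # 2345 Master
    # 2346 Worker
    # 2412 SparkSubmit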

2. Installing MySQL

Install MySQL on Ubuntu with the following commands:

sudo apt-get install mysql-server
sudo apt install mysql-client
sudo apt install libmysqlclient-dev

After installation, check that MySQL is running:

sudo netstat -tap | grep mysql

Log in to the MySQL server with:

mysql -uroot -p<your password>

Now allow remote access to MySQL. First edit /etc/mysql/mysql.conf.d/mysqld.cnf:

sudo vi /etc/mysql/mysql.conf.d/mysqld.cnf

Comment out the line bind-address = 127.0.0.1 so that MySQL listens on all interfaces.
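
In mysqld.cnf this is a single commented-out line:

    # bind-address = 127.0.0.1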

Save and exit, then log back into the MySQL server.

Allow remote connections for the user; run the following as root:

create user 'hive' identified by 'hive';  -- create the hive user for connections, password hive
grant all privileges on *.* to 'hive'@'%' identified by 'hive' with grant option;  -- '%' matches any IP: user hive with password hive may connect remotely from any address
flush privileges;  -- reload the privilege tables
set global binlog_format='MIXED';  -- set the binlog format; must be executed, otherwise errors occur later

Then run exit to leave the MySQL prompt and restart MySQL with the following command:

exit;

service mysql restart   # restart the service

Test the connection:

    mysql -uhive -phive   # if you can log in, the setup succeeded

    create database hive;   -- database used by the Hive metastore
    alter database hive character set latin1;

Switching the database to latin1 avoids the common "Specified key was too long" index-length errors when the metastore schema is created later.

3. Configuring Hive

Hive version: apache-hive-2.1.1-bin.tar.gz
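
As with Spark, extraction is one tar command; the paths in this article use hive-2.1.1, so the extracted apache-hive-2.1.1-bin directory is assumed to be renamed to match the HIVE_HOME set earlier:

    tar -xzf apache-hive-2.1.1-bin.tar.gz -C /home/zhuhaichuan/
    mv /home/zhuhaichuan/apache-hive-2.1.1-bin /home/zhuhaichuan/hive-2.1.1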

Configure hive-env.sh:

Add entries for HADOOP_HOME, HIVE_CONF_DIR, and HIVE_AUX_JARS_PATH:

export HADOOP_HOME=/home/zhuhaichuan/hadoop-2.6.0
export HIVE_CONF_DIR=/home/zhuhaichuan/hive-2.1.1/conf
export HIVE_AUX_JARS_PATH=/home/zhuhaichuan/hive-2.1.1/lib

Configure hive-site.xml:

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/data/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.1.144:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://zhumaster:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
    <description>
      Expects one of [mr, tez, spark].
      Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
      remains the default engine for historical reasons, it is itself a historical engine
      and is deprecated in Hive 2 line. It may be removed without further warning.
    </description>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>zhumaster</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>hdfs://zhumaster:8020/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/home/zhuhaichuan/tmp/hive</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in metastore is compatible with one from Hive jars. Also disable automatic
            schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
            proper metastore schema migration. (Default)
      False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
    <description>
      Setting this property to true will have HiveServer2 execute
      Hive operations as the user making the calls to it.
    </description>
  </property>

Configuration complete.

Hive uses MySQL to store its metastore data, so copy the MySQL JDBC driver jar (mysql-connector-java-5.1.40-bin.jar) into the hive-2.1.1/lib directory:

cp …/mysql-connector-java-5.1.40-bin.jar …/hive-2.1.1/lib/

Synchronize the Hive metastore schema with MySQL: …/hive-2.1.1/bin/schematool -dbType mysql -initSchema

Note that hive-default.xml.template must first be copied and renamed to hive-site.xml; hive-site.xml has to exist before running schematool -dbType mysql -initSchema, which initializes the metastore schema in the MySQL hive database created earlier.
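
To confirm the schema landed in MySQL, one can list the metastore tables (a quick sanity check; the exact table names depend on the Hive version):

    mysql -uhive -phive -e 'use hive; show tables;'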

Run the Hive metastore service in the background: hive --service metastore > …/hivemetastore.log 2>&1 &

Run HiveServer2 in the background: hive --service hiveserver2 > …/hiveserver2.log 2>&1 &
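
Both services can be verified the same way MySQL was checked above; the metastore listens on port 9083 (matching hive.metastore.uris) and HiveServer2 on 10000 by default:

    sudo netstat -tap | grep -E '9083|10000'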

Copy spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar into Hive's lib directory (hive-2.1.1/lib/), and copy Hive's hive-site.xml into spark-1.6.0-bin-hadoop2.6/conf/, as sketched below.
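
Concretely, using the install locations from this article:

    cp /home/zhuhaichuan/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar /home/zhuhaichuan/hive-2.1.1/lib/
    cp /home/zhuhaichuan/hive-2.1.1/conf/hive-site.xml /home/zhuhaichuan/spark-1.6.0-bin-hadoop2.6/conf/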

When writing programs in IntelliJ IDEA, hive-site.xml needs to be placed in the project's src directory.


When developing Spark + Hive applications in IntelliJ IDEA, the Thrift server must be running; Spark's start-all.sh command starts the Spark master and worker nodes. A minimal program sketch follows.
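
The following Scala sketch (the object name HiveOnSparkDemo and the master URL spark://zhumaster:7077, the standalone default port, are assumptions) connects to the cluster and runs a query through Hive using the Spark 1.6 HiveContext API:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveOnSparkDemo {
      def main(args: Array[String]): Unit = {
        // Master URL is an assumption: zhumaster with the standalone default port 7077.
        val conf = new SparkConf()
          .setAppName("HiveOnSparkDemo")
          .setMaster("spark://zhumaster:7077")
        val sc = new SparkContext(conf)

        // HiveContext picks up hive-site.xml from the classpath,
        // which is why the file goes into the project's src directory.
        val hive = new HiveContext(sc)
        hive.sql("show databases").show()

        sc.stop()
      }
    }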
