Hive 2.2 Installation Guide

This article walks through installing and deploying Hive 2.2 on Ubuntu, covering the installation and configuration of Hadoop and MySQL, an overview of Hive's modules, environment variable setup, and configuration file changes.

Introduction

Installing and deploying Hadoop and Hive yourself is a good way to understand how these tools actually work, especially for developers. This article describes in detail how to install and deploy Hive 2.2 on an Ubuntu system; the Hadoop version used here is 2.7.3.

Hive Module Overview

Before walking through the installation itself, let's look at how Hive is put together. Hive runs on top of Hadoop. The metastore is an independent relational database in which Hive stores table schemas and other system metadata; MySQL is the database chosen here. Hadoop and MySQL therefore both need to be installed and deployed before Hive itself.

Installing Hadoop

Hive runs on Hadoop, so we need to install Hadoop first. I recommend running Hive against Hadoop in pseudo-distributed mode; for the setup details, see: Configuring a Hadoop Pseudo-Distributed Runtime Environment on Ubuntu.
Once Hadoop is installed, create the directories on HDFS that Hive needs at runtime; the Hive configuration section below explains how these directories are used:
hadoop fs -mkdir -p /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
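As a quick sanity check, you can list the two directories and confirm they exist with the group write bit set above:
hadoop fs -ls -d /tmp /user/hive/warehouse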
Hive uses the HADOOP_HOME environment variable to locate all of the Hadoop JARs and configuration files. Before continuing, make sure this variable is set; it can be appended to .bashrc:
# Hadoop Home for hive config
export HADOOP_HOME=/usr/local/hadoop
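To apply the change in the current shell and confirm the variable is set, a quick check:
source ~/.bashrc
echo $HADOOP_HOME   # should print /usr/local/hadoop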

Installing MySQL

On Ubuntu, installing MySQL is straightforward; simply run the corresponding apt-get command: sudo apt-get install mysql-server mysql-client. Once it is installed, log in to the database as root, create the hive user and the db_hive database, and grant privileges.
hadoop@bob-virtual-machine:/etc/mysql$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.19-0ubuntu0.16.04.1 (Ubuntu)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database db_hive;
Query OK, 1 row affected (0.00 sec)

mysql> create user 'hive'@'localhost' identified by 'hive';
Query OK, 0 rows affected (0.00 sec)

mysql> grant all on db_hive.* to hive@'%' identified by 'hive';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye
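To confirm the new account works as intended, you can log back in as the hive user (password hive, per the grant above) and check that db_hive is visible:
mysql -u hive -phive -e "show databases;"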

Installing Hive

Download and Install
Downloading and installing Hive is very similar to installing Hadoop: download a Hive release package and unpack it. The tarball does not bundle any particular version of Hadoop; a single Hive binary release works across multiple Hadoop versions, which also means upgrading Hive to a new version is easier and lower-risk than upgrading Hadoop. Hive releases can be downloaded from the official site: https://hive.apache.org/downloads.html
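For example, the 2.2.0 binary release can be fetched from the Apache archive (mirror choice is up to you):
wget https://archive.apache.org/dist/hive/hive-2.2.0/apache-hive-2.2.0-bin.tar.gz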
Unpack the downloaded tar file into /usr/local:
sudo tar -zxvf apache-hive-2.2.0-bin.tar.gz -C /usr/local
sudo ln -s /usr/local/apache-hive-2.2.0-bin /usr/local/hive
sudo chown -R hadoop /usr/local/apache-hive-2.2.0-bin
Set Environment Variables
Append the Hive-related environment variables to .bashrc:
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH
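Reload .bashrc and verify that the hive command is now on the PATH:
source ~/.bashrc
hive --version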
Modify Configuration Files
Hive's configuration files live under $HIVE_HOME/conf; default templates are provided, which we customize as needed.
Create hive-env.sh by copying the provided hive-env.sh.template:
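cd /usr/local/hive/conf
cp hive-env.sh.template hive-env.sh
Then edit hive-env.sh and modify the following: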
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/usr/local/hadoop
hive-default.xml.template contains Hive's default configuration; it is long and serves mainly as a reference. The configuration file Hive actually reads is hive-site.xml. If you have many settings to change, you can copy hive-default.xml.template and edit the copy; here we write hive-site.xml directly.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>/tmp</value>
        <description>temporary directory for hive execution</description>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>hdfs://localhost:9000/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://127.0.0.1:3306/db_hive?useSSL=false</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
        <description>password to use against metastore database</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>Username to use against metastore database</description>
    </property>

    <property>
        <name>javax.jdo.option.Multithreaded</name>
        <value>true</value>
    </property>
</configuration>
The configuration covers the temporary working directory hive.exec.scratchdir, the metastore warehouse directory hive.metastore.warehouse.dir, and the connection settings for the MySQL database that stores Hive's metadata and table schemas.
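Note that Hive does not ship with the MySQL JDBC driver. Before initializing the metastore, download Connector/J from https://dev.mysql.com/downloads/connector/j/ and copy the jar into Hive's lib directory (the jar name below is an example; use the version matching your MySQL server):
cp mysql-connector-java-5.1.44-bin.jar /usr/local/hive/lib/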
Initialize the Metastore Schema
Use the offline schematool utility to initialize it: schematool -dbType mysql -initSchema
hadoop@bob-virtual-machine:~/apache-hive-2.2.0/lib$ schematool -dbType mysql -initSchema

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.2.0/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:mysql://127.0.0.1:3306/db_hive?useSSL=false
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       hive
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed
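You can double-check the result from the MySQL side; a freshly initialized metastore contains several dozen tables (DBS, TBLS, COLUMNS_V2, and so on):
mysql -u hive -phive db_hive -e "show tables;"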
Testing and Verifying Hive
At this point the preparation is complete, and we can verify the configuration through the Hive CLI by creating a simple table named test and displaying its schema.
hadoop@bob-virtual-machine:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.2.0/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/apache-hive-2.2.0/lib/hive-common-2.2.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
hive> create table test (a int);
OK
Time taken: 2.345 seconds
hive> show tables;
OK
test
Time taken: 0.271 seconds, Fetched: 1 row(s)
hive> show create table test;
OK
CREATE TABLE `test`(
  `a` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:9000/user/hive/warehouse/test'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
  'numFiles'='0',
  'numRows'='0',
  'rawDataSize'='0',
  'totalSize'='0',
  'transient_lastDdlTime'='1506997133')
Time taken: 0.423 seconds, Fetched: 17 row(s)
hive>
The shell is the primary way we interact with Hive and issue HiveQL commands. HiveQL, Hive's query language, is a dialect of SQL whose design was heavily influenced by MySQL, so the commands above should look familiar to MySQL users. Hive is now successfully configured and ready to use.
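As a final end-to-end check (a small sketch; the sample file path is arbitrary), you can load a few rows into the test table and query them from the shell:
printf '1\n2\n3\n' > /tmp/nums.txt
hive -e "LOAD DATA LOCAL INPATH '/tmp/nums.txt' INTO TABLE test; SELECT count(*) FROM test;"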