This post briefly documents the steps to install and configure Hive.
tar -xzvf hive-0.11.0-bin.tar.gz
Add the Hive bin directory to the PATH variable configured in your home directory:
export PATH=$PATH:/home/hadoop/hadoop-0.20.2/bin:/home/hadoop/hbase-0.92.1/bin:/home/hadoop/zookeeper-3.4.5/bin:/home/hadoop/hive-0.11.0-bin/bin
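If the export line lives in ~/.bashrc (assuming bash is the login shell), reload it and check that the hive binary resolves:
source ~/.bashrc
which hive    # should print /home/hadoop/hive-0.11.0-bin/bin/hive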
Configure Hive
Enter the hive/conf directory.
Create hive-env.sh from the hive-env.sh.template:
cp hive-env.sh.template hive-env.sh
Edit hive-env.sh.
Specify the path to the Hive configuration files:
export HIVE_CONF_DIR=/home/hadoop/hive-0.11.0-bin/conf
Specify the Hadoop path:
HADOOP_HOME=/home/hadoop/hadoop-0.20.2
With HADOOP_HOME set, Hive stores its warehouse files in HDFS, while the metastore lives in Derby, the embedded database bundled with Hive. Note that the embedded Derby metastore only supports one active Hive session at a time, which is why a standalone MySQL metastore is configured later in this post.
Create the following directories in HDFS:
hadoop fs -mkdir /tmp
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
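You can verify the directories and their permissions before moving on:
hadoop fs -ls /              # /tmp should show group write, e.g. drwxrwxr-x
hadoop fs -ls /user/hive     # same for /user/hive/warehouse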
Test-run Hive:
hive> create table test(key string);
hive> show tables;
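If show tables lists test, the installation works; the scratch table can then be dropped:
hive> drop table test;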
On the local file system, create a text file /home/hadoop/hiveimpt/student.txt with the following content:
C01,N0101,82
C01,N0102,59
C01,N0103,65
C02,N0201,81
C02,N0202,82
C02,N0203,79
C03,N0301,56
C03,N0302,92
C03,N0306,72
Create a Hive table:
create table student(classNo string, stuNo string, score int) row format delimited fields terminated by ',';
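describe prints the schema Hive recorded, which is a quick way to confirm the column types before loading data:
hive> describe student;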
Load the data into the table:
load data local inpath '/home/hadoop/hiveimpt/student.txt' overwrite into table student;
This command loads the contents of student.txt into the table student. The load operation simply copies student.txt into Hive's warehouse directory, which is set by the hive.metastore.warehouse.dir property and defaults to /user/hive/warehouse. The overwrite option makes Hive first delete any existing files under the student directory.
Hive does not transform student.txt in any way, because Hive itself does not impose a storage format on its data.
In this example Hive stores the data in HDFS; Hive can also store data on the local file system.
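To confirm that the load was a plain copy, list the table's directory in HDFS and cat the file back; assuming the default warehouse location, it is byte-for-byte the original text file:
hadoop fs -ls /user/hive/warehouse/student
hadoop fs -cat /user/hive/warehouse/student/student.txt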
We can now run SQL-like queries:
hive> select * from student;
hive> select classNo, count(score) from student where score >= 60 group by classNo;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201403182323_0001, Tracking URL = http://hadoopsrv:50030/jobdetails.jsp?jobid=job_201403182323_0001
Kill Command = /home/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -kill job_201403182323_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-03-19 00:42:27,572 Stage-1 map = 0%, reduce = 0%
2014-03-19 00:42:34,861 Stage-1 map = 100%, reduce = 0%
2014-03-19 00:42:43,972 Stage-1 map = 100%, reduce = 33%
2014-03-19 00:42:47,029 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201403182323_0001
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 116 HDFS Write: 18 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
C01 2
C02 3
C03 2
Time taken: 38.263 seconds, Fetched: 3 row(s)
As this shows, HiveQL is remarkably similar to SQL. We used group by and count; behind the scenes, Hive translates these operations into MapReduce jobs, submits them to Hadoop, and returns the final results.
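To see the plan Hive generates, prepend explain to the query; Hive prints the stage graph with the map and reduce operators it will submit (output omitted here):
hive> explain select classNo, count(score) from student where score >= 60 group by classNo;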
Using MySQL installed on another machine to store the metastore:
Create the database and user in MySQL:
mysql> create database hive;
mysql> CREATE USER 'hive' IDENTIFIED BY 'hive';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
mysql> flush privileges;
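Granting on *.* is broader than Hive needs; a tighter-scoped alternative (assuming the metastore only ever touches the hive database) is:
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
mysql> flush privileges;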
Verify the connection from the Hive machine:
mysql -h 192.168.2.3 -u hive -phive
Configure hive-site.xml on the Hive machine:
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.2.3:3306/hive?characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
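Hive does not ship with the MySQL JDBC driver, so the connector jar has to be copied into Hive's lib directory; the jar name below is a placeholder, use the version you actually downloaded:
cp mysql-connector-java-5.1.x-bin.jar /home/hadoop/hive-0.11.0-bin/lib/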
A common exception:
com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Specified key was too long; max key length is nnn bytes
The fix is to change the database encoding:
alter database hive character set latin1;
Reference: http://www.cnblogs.com/linjiqin/archive/2013/03/04/2943025.html