This post briefly documents the steps to install and configure Hive.
tar -xzvf hive-0.11.0-bin.tar.gz
Add the Hive bin directory to the PATH variable configured in your home directory:
export PATH=$PATH:/home/hadoop/hadoop-0.20.2/bin:/home/hadoop/hbase-0.92.1/bin:/home/hadoop/zookeeper-3.4.5/bin:/home/hadoop/hive-0.11.0-bin/bin
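If the export line lives in ~/.bashrc (assuming bash is the login shell), reload it and check that the hive binary resolves:
source ~/.bashrc
which hive    # should print /home/hadoop/hive-0.11.0-bin/bin/hive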
Configure Hive
Enter the hive/conf directory.
Create hive-env.sh from the hive-env.sh.template:
cp hive-env.sh.template hive-env.sh
Edit hive-env.sh.
Specify the path to the Hive configuration files:
export HIVE_CONF_DIR=/home/hadoop/hive-0.11.0-bin/conf
Specify the Hadoop path:
HADOOP_HOME=/home/hadoop/hadoop-0.20.2
With HADOOP_HOME set, Hive stores its warehouse files in HDFS, while the metastore lives in Derby, the embedded database bundled with Hive. Note that the embedded Derby metastore only supports one active Hive session at a time, which is why a standalone MySQL metastore is configured later in this post.
Create the following directories in HDFS:
hadoop fs -mkdir /tmp
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
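You can verify the directories and their permissions before moving on:
hadoop fs -ls /              # /tmp should show group write, e.g. drwxrwxr-x
hadoop fs -ls /user/hive     # same for /user/hive/warehouse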
Test-run Hive:
hive> create table test(key string);
hive> show tables;
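If show tables lists test, the installation works; the scratch table can then be dropped:
hive> drop table test;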
On the local file system, create a text file /home/hadoop/hiveimpt/student.txt with the following content:
C01,N0101,82
C01,N0102,59
C01,N0103,65
C02,N0201,81
C02,N0202,82
C02,N0203,79
C03,N0301,56
C03,N0302,92
C03,N0306,72
Create a Hive table:
create table student(classNo string, stuNo string, score int) row format delimited fields terminated by ',';
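describe prints the schema Hive recorded, which is a quick way to confirm the column types before loading data:
hive> describe student;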
Load the data into the table:
load data local inpath '/home/hadoop/hiveimpt/student.txt' overwrite into table student;
This command loads the contents of student.txt into the table student. The load operation simply copies student.txt into Hive's warehouse directory, which is set by the hive.metastore.warehouse.dir property and defaults to /user/hive/warehouse. The overwrite option makes Hive first delete any existing files under the student directory.
Hive does not transform student.txt in any way, because Hive itself does not impose a storage format on its data.
In this example Hive stores the data in HDFS; Hive can also store data on the local file system.
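To confirm that the load was a plain copy, list the table's directory in HDFS and cat the file back; assuming the default warehouse location, it is byte-for-byte the original text file:
hadoop fs -ls /user/hive/warehouse/student
hadoop fs -cat /user/hive/warehouse/student/student.txt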
We can now run SQL-like queries:
hive> select * from student;
hive> select classNo, count(score) from student where score >= 60 group by classNo;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201403182323_0001, Tracking URL = http://hadoopsrv:50030/jobdetails.jsp?jobid=job_201403182323_0001
Kill Command = /home/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -kill job_201403182323_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-03-19 00:42:27,572 Stage-1 map = 0%, reduce = 0%
2014-03-19 00:42:34,861 Stage-1 map = 100%, reduce = 0%
2014-03-19 00:42:43,972 Stage-1 map = 100%, reduce = 33%
2014-03-19 00:42:47,029 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201403182323_0001
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 116 HDFS Write: 18 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
C01 2
C02 3
C03 2
Time taken: 38.263 seconds, Fetched: 3 row(s)
As this shows, HiveQL is remarkably similar to SQL. We used group by and count; behind the scenes, Hive translates these operations into MapReduce jobs, submits them to Hadoop, and returns the final results.
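To see the plan Hive generates, prepend explain to the query; Hive prints the stage graph with the map and reduce operators it will submit (output omitted here):
hive> explain select classNo, count(score) from student where score >= 60 group by classNo;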
Using MySQL installed on another machine to store the metastore:
Create the database and user in MySQL:
mysql> create database hive;
mysql> CREATE USER 'hive' IDENTIFIED BY 'hive';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
mysql> flush privileges;
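Granting on *.* is broader than Hive needs; a tighter-scoped alternative (assuming the metastore only ever touches the hive database) is:
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
mysql> flush privileges;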
Verify the connection from the Hive machine:
mysql -h 192.168.2.3 -u hive -phive
Configure hive-site.xml on the Hive machine:
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.2.3:3306/hive?characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
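Hive does not ship with the MySQL JDBC driver, so the connector jar has to be copied into Hive's lib directory; the jar name below is a placeholder, use the version you actually downloaded:
cp mysql-connector-java-5.1.x-bin.jar /home/hadoop/hive-0.11.0-bin/lib/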
A common exception:
com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Specified key was too long; max key length is nnn bytes
The fix is to change the database encoding:
alter database hive character set latin1;
Reference: http://www.cnblogs.com/linjiqin/archive/2013/03/04/2943025.html