Hadoop environment setup:
For learning: hadoop-2.8.1 (Apache Hadoop)
For production, CDH or HDP is recommended (very good compatibility). A big data platform uses a great many frameworks, so you will inevitably hit compatibility problems and jar conflicts that are hard to escape. Mixing many plain Apache releases is therefore not recommended; use CDH or HDP instead.
Unified download path for installation packages: http://archive.cloudera.com/cdh5/cdh/5/
Pick packages that all share the same cdh5.7.0 suffix; be sure to choose matching versions:
hadoop-2.6.0-cdh5.7.0.tar.gz
hive-1.1.0-cdh5.7.0.tar.gz
Recommended machine and file layout:
Machine directory structure (hadoop/hadoop user):
hadoop000 (192.168.199.151)
hadoop001
hadoop002
.........
/home/hadoop/
    software      installation packages
    data          test data
    source        source code
    lib           development-related jars
    app           software installation directory
    tmp           HDFS/Kafka/ZK data directories
    maven_repo    local Maven repository
    shell         course-related scripts
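The layout above can be created in one pass with a short script; a minimal sketch (BASE defaults to a throwaway demo path here, while a real machine would use /home/hadoop as in the layout above):

```shell
#!/bin/sh
# Create the recommended directory layout under BASE.
# BASE defaults to a demo path; on a real machine set BASE=/home/hadoop.
BASE="${BASE:-/tmp/hadoop-home-demo}"
for d in software data source lib app tmp maven_repo shell; do
    mkdir -p "$BASE/$d"
done
ls "$BASE"
```

Running it once as the hadoop user gives every later step (tar -C ~/app, tmp data dirs, the local Maven repo) a predictable home.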
Installation and deployment
Download Hive:
wget http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.7.0.tar.gz
Extract it as the hadoop user:
tar -zxvf hive-1.1.0-cdh5.7.0.tar.gz -C ~/app
Add HIVE_HOME to the system environment variables:
[hadoop@hadoop05 app]$ vi ~/.bash_profile
export HIVE_HOME=/home/hadoop/app/hive-1.1.0-cdh5.7.0
export PATH=$HIVE_HOME/bin:$PATH
[hadoop@hadoop05 app]$ source ~/.bash_profile
[hadoop@hadoop05 app]$ echo $HIVE_HOME
Modify the configuration files
[hadoop@hadoop05 app]$ cd hive-1.1.0-cdh5.7.0/
[hadoop@hadoop05 hive-1.1.0-cdh5.7.0]$ ls
bin conf data docs examples hcatalog lib LICENSE NOTICE README.txt RELEASE_NOTES.txt scripts
[hadoop@hadoop05 hive-1.1.0-cdh5.7.0]$ cd conf
[hadoop@hadoop-01 conf]$ cp hive-env.sh.template hive-env.sh
vi hive-env.sh
HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
Note: with createDatabaseIfNotExist=true in the JDBC URL below, the metastore database is created automatically when it does not yet exist.
hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/gordon?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
</configuration>
Copy the MySQL driver jar to $HIVE_HOME/lib:
copy mysql-connector-java-5.1.27-bin.jar
[hadoop@hadoop-01 ~]$ cd $HIVE_HOME
[hadoop@hadoop-01 hive-1.1.0-cdh5.7.0]$ cd lib
[hadoop@hadoop-01 lib]$ rz mysql-connector-java-5.1.27-bin.jar
[hadoop@hadoop-01 lib]$ ll
-rw-r--r--. 1 hadoop hadoop 872303 Dec 19 17:22 mysql-connector-java-5.1.27-bin.jar
If the MySQL driver jar has not been copied, starting Hive fails with:
The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH.
Please check your CLASSPATH specification,
and the name of the driver.
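A small pre-flight check avoids this error before starting Hive. A minimal sketch (check_mysql_driver is a hypothetical helper, not part of Hive; the demo runs against a throwaway directory, while a real check would pass "$HIVE_HOME/lib"):

```shell
#!/bin/sh
# Report whether a MySQL connector jar is present in a Hive lib directory.
# Hypothetical helper; in a real setup call it as: check_mysql_driver "$HIVE_HOME/lib"
check_mysql_driver() {
    if ls "$1"/mysql-connector-java-*.jar >/dev/null 2>&1; then
        echo "driver found in $1"
    else
        echo "driver missing in $1"
    fi
}

# Demo against a throwaway directory.
demo=$(mktemp -d)
check_mysql_driver "$demo"
touch "$demo/mysql-connector-java-5.1.27-bin.jar"
check_mysql_driver "$demo"
```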
Start Hive
[hadoop@hadoop05 bin]$ ./hive
which: no hbase in (/home/hadoop/app/hive-1.1.0-cdh5.7.0/bin:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin:/home/hadoop/app/jdk1.8.0_45/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin)
Logging initialized using configuration in jar:file:/home/hadoop/app/hive-1.1.0-cdh5.7.0/lib/hive-common-1.1.0-cdh5.7.0.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> show tables;
OK
Time taken: 5.125 seconds
Check in MySQL that the database was created automatically:
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| gordon |
| mysql |
| performance_schema |
| test |
+--------------------+
5 rows in set (0.02 sec)
Create a table in Hive:
hive> create table xx(id int);
OK
Time taken: 5.38 seconds
hive> show tables;
OK
xx
Time taken: 0.49 seconds, Fetched: 1 row(s)
If creating the table fails with:
FAILED: Execution Error,
return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:For direct MetaStore DB
connections, we don't support retries at the client
level.)
Approach: find the logs.
Where are the logs: $HIVE_HOME/conf/hive-log4j.properties.template
hive.log.dir=${java.io.tmpdir}/${user.name}
hive.log.file=hive.log
Question: can the log location be changed, and how?
Error in the log:
ERROR [main]: Datastore.Schema (Log4JLogger.java:error(115)) - An exception was thrown while adding/validating class(es) :
Specified key was too long; max key length is 767 bytes
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
Solution: change the metastore database's character set to latin1 (substitute your own metastore database name; with the hive-site.xml above it is gordon):
alter database ruozedata_basic02 character set latin1;
In Hive, the default log path is under /tmp, and Linux/CentOS systems typically clean /tmp automatically about once a month, so the log location should be moved elsewhere.
First cp the hive-log4j.properties file from its template; it does not exist after a default Hive installation.
Then change the path in it:
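Relocating the log directory can be sketched like this (the target path /home/hadoop/log/hive is an assumption, pick any directory outside /tmp; the demo works on a throwaway copy of the template, while a real setup would run the cp and sed in $HIVE_HOME/conf):

```shell
#!/bin/sh
# Copy the log4j template and point hive.log.dir outside /tmp.
# Demo on a throwaway directory; in a real setup operate in $HIVE_HOME/conf.
conf=$(mktemp -d)
printf 'hive.log.dir=${java.io.tmpdir}/${user.name}\nhive.log.file=hive.log\n' \
    > "$conf/hive-log4j.properties.template"

cp "$conf/hive-log4j.properties.template" "$conf/hive-log4j.properties"
# Replace the tmp-based default with a persistent directory (assumed path).
sed -i 's|^hive.log.dir=.*|hive.log.dir=/home/hadoop/log/hive|' \
    "$conf/hive-log4j.properties"
grep '^hive.log.dir=' "$conf/hive-log4j.properties"
```

Restart Hive afterwards so hive.log is written to the new directory.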
Looking at the logs, you can see that Hive's default underlying execution engine is MapReduce.