1 Install MySQL
Install the server:
sudo apt-get install mysql-server
(Note: you will be prompted to set a password for the root user during installation.)
Install the client and development headers:
sudo apt-get install mysql-client
sudo apt-get install libmysqlclient-dev
Check that the MySQL service is running; output like the following means the installation succeeded:
sudo netstat -tap|grep mysql
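If netstat is not available, an alternative check (not from the original steps) is the service status:
sudo service mysql status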
Check the database character sets:
show variables like 'character%';
We can see that the client is utf8-encoded while the server is latin1 (the system itself is utf8). Since development environments typically use utf8, we change the server encoding here.
To set the database encoding, open the config file /etc/mysql/mysql.conf.d/mysqld.cnf
and add/modify the following:
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
# set the client character encoding
default-character-set=utf8
[mysqld]
# set the server character encoding
character-set-server=utf8
collation-server=utf8_general_ci
# comment out bind-address
# bind-address = 127.0.0.1
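Restart MySQL so the new settings take effect:
sudo service mysql restart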
Check the encodings again:
show variables like 'character%';
2 Create a hive database in MySQL and a user table inside it
Log in to the database as root:
mysql -u root -p
Create the hive database:
create database hive;
Switch to the hive database:
use hive;
Create the user table in the hive database (note: this is just a bookkeeping table; the MySQL account itself is created by the GRANT statement in step 3):
create table user(Host char(20),User char(10),Password char(20));
3 Log in to MySQL and grant privileges (change the password to your own):
mysql -u root -p
insert into user(Host,User,Password) values("localhost","hive","hive");
FLUSH PRIVILEGES;
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
FLUSH PRIVILEGES;
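To confirm the grant took effect, you can inspect the new account with a standard MySQL statement:
SHOW GRANTS FOR 'hive'@'localhost';
Logging in as the new user (mysql -u hive -p) is another quick check.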
4 After unpacking the Hive tarball, configure the environment variables (e.g. append the following to ~/.bashrc; adjust the path to your own installation):
#hive
export HIVE_HOME=/home/sendi/apache-hive-1.1.1-bin
export PATH=$PATH:$HIVE_HOME/bin
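Reload the profile so the variables take effect in the current shell (assuming the lines above were appended to ~/.bashrc):
source ~/.bashrc
echo $HIVE_HOME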
5 Copy the template files under hive/conf:
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
6 Edit hive-env.sh and set HADOOP_HOME:
vim hive-env.sh
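A minimal hive-env.sh sketch, reusing the Hadoop path that appears in step 8 (adjust to your own installation):
# Point Hive at the Hadoop installation
HADOOP_HOME=/home/hjr/develop/cluster/hadoop-2.6.5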
7 Edit hive-site.xml and set the MySQL JDBC driver, database name, username, and password:
vim hive-site.xml
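The relevant properties look roughly like this, using the hive database, user, and password created in steps 2 and 3 (the property names are Hive's standard metastore settings; note that & inside an XML value must be written as &amp;):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>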
8 Edit the hive-config.sh file under hive/bin and set JAVA_HOME, HADOOP_HOME, and HIVE_HOME:
# Hadoop/Java Home --- 2016-12-23 hjr
export JAVA_HOME=/home/hjr/develop/java8/jdk1.8.0_101
export HADOOP_HOME=/home/hjr/develop/cluster/hadoop-2.6.5
export HIVE_HOME=/home/hjr/develop/apache-hive-1.2.1-bin
9 Download mysql-connector-java-5.1.40-bin.jar and put it in the $HIVE_HOME/lib directory.
Verify that the connector jar is in place:
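For example, filtering the lib directory for the connector:
ls $HIVE_HOME/lib | grep mysql-connector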
10 Create /tmp and /user/hive/warehouse in HDFS and set group write permission:
hadoop fs -mkdir -p /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
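Optionally confirm the directories exist:
hadoop fs -ls /user/hive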
11 Start Hive
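With the environment variables from step 4 in place, the CLI starts with:
hive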
12 At startup you may hit the following problem.
Classic error: Found class jline.Terminal
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.TerminalFactory.create(TerminalFactory.java:101)
at jline.TerminalFactory.get(TerminalFactory.java:158)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.console.ConsoleReader.<init>(ConsoleReader.java:230)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
13 Found class jline.Terminal: the cause is that an older jline jar exists under the Hadoop directory.
Solution:
(1) Go to Hive's lib directory and copy the newer jline into the following Hadoop directory:
hjr@hjr:~/develop/apache-hive-1.2.1-bin/lib$ cp jline-2.12.jar /home/hjr/develop/cluster/hadoop-2.6.5/share/hadoop/yarn/lib
(2) Delete Hadoop's old jline jar from that directory.
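In Hadoop 2.6.x the old jar is typically jline-0.9.94.jar (an assumption; check the actual file name in your installation):
rm /home/hjr/develop/cluster/hadoop-2.6.5/share/hadoop/yarn/lib/jline-0.9.94.jar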
14 Starting Hive again now succeeds.
Problem (1):
Fri Dec 23 19:13:35 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
The warning means that newer MySQL servers (5.5.45+, 5.6.26+, 5.7.6+) establish an SSL connection by default when no explicit option is set.
Fix: in Hive's hive-site.xml, append &amp;characterEncoding=UTF-8&amp;useSSL=false to the JDBC URL to disable SSL (this also sets the connection's character encoding; inside XML the & separator must be escaped as &amp;):
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
15 Hive test case
(1) Test data: users.dat from the MovieLens ml-1m dataset; each line is UserID::Gender::Age::Occupation::Zip-code:
1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
6::F::50::9::55117
7::M::35::1::06810
8::M::25::12::11413
9::M::25::17::61614
10::F::35::1::95370
11::F::25::1::04093
12::M::25::12::32793
13::M::45::1::93304
14::M::35::0::60126
15::M::25::7::22903
16::F::35::0::20670
17::M::50::1::95350
18::F::18::3::95825
19::M::1::10::48073
20::M::25::14::55113
21::M::18::16::99353
22::M::18::15::53706
23::M::35::0::90049
24::F::25::7::10023
25::M::18::4::01609
26::M::25::7::23112
27::M::25::11::19130
28::F::25::1::14607
29::M::35::7::33407
30::F::35::7::19143
31::M::56::7::06840
32::F::25::0::19355
33::M::45::3::55421
34::F::18::0::02135
35::M::45::1::02482
36::M::25::3::94123
37::F::25::9::66212
38::F::18::4::02215
39::M::18::4::61820
40::M::45::0::10543
41::F::18::4::15116
42::M::25::8::24502
43::M::25::12::60614
44::M::45::17::98052
45::F::45::16::94110
46::M::18::19::75602
47::M::18::4::94305
48::M::25::4::92107
49::M::18::12::77084
50::F::25::2::98133
51::F::1::10::10562
52::M::18::4::72212
53::M::25::0::96931
54::M::50::1::56723
55::F::35::12::55303
56::M::35::20::60440
57::M::18::19::30350
58::M::25::2::30303
59::F::50::1::55413
60::M::50::1::72118
61::M::25::17::95122
62::F::35::3::98105
63::M::18::4::54902
64::M::18::1::53706
65::M::35::12::55803
66::M::25::18::57706
67::F::50::5::60181
68::M::18::4::53706
69::F::25::1::02143
70::M::18::4::53703
71::M::25::14::95008
72::F::45::0::55122
73::M::18::4::53706
74::M::35::14::94530
75::F::1::10::01748
76::M::35::7::55413
77::M::18::4::15321
78::F::45::1::98029
79::F::45::0::98103
80::M::56::1::49327
81::F::25::0::60640
82::M::25::17::48380
83::F::25::2::94609
84::M::18::4::53140
85::M::18::4::94945
86::F::1::10::54467
87::M::25::14::48360
88::F::45::1::02476
89::F::56::9::85749
90::M::56::13::85749
Create the partitioned users table, load the data, and run a sanity count:
CREATE TABLE users(UserID BigInt,Gender String,Age Int,Occupation String,Zipcode String) PARTITIONED BY (dt String) ROW FORMAT DELIMITED FIELDS TERMINATED BY '::';
load data inpath '/moviesDataSet/ml-1m/users.dat' into table users PARTITION(dt='20161223');
select count(1) from users;
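One caveat worth flagging: Hive's default LazySimpleSerDe honors only a single-character field delimiter, so FIELDS TERMINATED BY '::' may effectively split on ':' alone and misalign columns on some Hive versions. A common workaround (a sketch, assuming users.dat is on the local disk) is to rewrite the delimiter before uploading, then declare the table with FIELDS TERMINATED BY '\t' instead:
# Replace the two-character '::' delimiter with a tab, then upload to HDFS
sed 's/::/\t/g' users.dat > users.tsv
hadoop fs -put users.tsv /moviesDataSet/ml-1m/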
Appendix
Name node is in safe mode.
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/hjr/0cf97b04-c1d7-474b-9a0f-78dba3168259. Name node is in safe mode.
The reported blocks 104 has reached the threshold 0.9990 of total blocks 104. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 12 seconds.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1366)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4258)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4233)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:853)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:975)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/hjr/0cf97b04-c1d7-474b-9a0f-78dba3168259. Name node is in safe mode.
The reported blocks 104 has reached the threshold 0.9990 of total blocks 104. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 12 seconds.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1366)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4258)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4233)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:853)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:975)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
at org.apache.hadoop.ipc.Client.call(Client.java:1470)
at org.apache.hadoop.ipc.Client.call(Client.java:1401)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:539)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy16.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2742)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2713)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859)
at org.apache.hadoop.hive.ql.session.SessionState.createPath(SessionState.java:639)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:574)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 8 more
Solution: take the NameNode out of safe mode manually (or simply wait; the log says it will exit automatically within seconds):
hadoop dfsadmin -safemode leave
(On newer Hadoop versions the equivalent is hdfs dfsadmin -safemode leave.)