2. Hive installation process
1) MySQL installation
To check whether the last command succeeded, inspect its exit status:
echo $?
A result of 0 means the command succeeded; any other value is an error code you can look up.
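The exit-status check can be demonstrated with ordinary commands standing in for the install steps:

```shell
# a successful command sets $? to 0
true
echo $?      # prints 0

# capture the status of a failing command without aborting the shell
rc=0
ls /no/such/dir 2>/dev/null || rc=$?
echo "$rc"   # non-zero: the directory does not exist
```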
Uninstall mariadb first (reference):
http://www.cr173.com/html/50473_1.html
rpm -ivh MySQL-*-5.5.46-1.el6.x86_64.rpm
Troubleshooting reference: http://www.2cto.com/database/201501/371451.html
During startup, try sudo first; MySQL then starts without problems:
[jiangzl@master tmp]$ service mysql start
Starting MySQL. ERROR! The server quit without updating PID file (/var/lib/mysql/master.pid).
[jiangzl@master tmp]$ sudo service mysql start (run as administrator)
[sudo] password for jiangzl:
Starting MySQL.. SUCCESS!
(If sudo also fails to start it, see the SELinux fix below; otherwise skip this part.)
[jiangzl@master package]$ service mysql start
Starting MySQL. ERROR! The server quit without updating PID file (/var/lib/mysql/master.pid).
The root cause here is SELinux: CentOS enables it by default. The fix is to disable it: open /etc/selinux/config, change SELINUX=enforcing to SELINUX=disabled, save, and reboot the machine.
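The change in /etc/selinux/config can be sketched as follows (stock CentOS path; the file edit needs a reboot, while `sudo setenforce 0` disables enforcement only until the next boot):

```
# /etc/selinux/config
# before:
#   SELINUX=enforcing
# after:
SELINUX=disabled
```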
[jiangzl@master ~]$ service mysql start
Starting MySQL SUCCESS!
Check MySQL's running status:
[jiangzl@master ~]$ service mysql status
SUCCESS! MySQL running (1421)
Set the root password and log in:
[jiangzl@master ~]$ mysqladmin -u root password 'root'
[jiangzl@master ~]$ mysql -uroot -p
[jiangzl@master ~]$ sudo service mysql stop
[sudo] password for jiangzl:
Shutting down MySQL. SUCCESS!
[jiangzl@master ~]$
2) Hive installation
// This error is a JDK version mismatch (use JDK 1.7):
[jiangzl@master tmp]$ hive
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/hadoop/hive/cli/CliDriver : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:295)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:249)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
// jline JAR conflict between Hive and Hadoop
[jiangzl@master tmp]$ hive
Logging initialized using configuration in jar:file:/home/jiangzl/work/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.TerminalFactory.create(TerminalFactory.java:101)
at jline.TerminalFactory.get(TerminalFactory.java:158)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Solution:
Copy the newer jline JAR that ships with Hive into Hadoop:
copy hive/lib/jline-2.12.jar into hadoop/share/hadoop/yarn/lib/
then rename Hadoop's old jline-0.9.94.jar to jline-0.9.94.jar.bak
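The steps above can be sketched as shell commands (HIVE_HOME and HADOOP_HOME default to this guide's install paths; adjust them to your layout):

```shell
# Paths assumed from this guide; override HIVE_HOME/HADOOP_HOME for other layouts.
HIVE_HOME=${HIVE_HOME:-/home/jiangzl/work/hive}
HADOOP_HOME=${HADOOP_HOME:-/home/jiangzl/work/hadoop}
YARN_LIB="$HADOOP_HOME/share/hadoop/yarn/lib"

# copy the newer jline from Hive into Hadoop's yarn lib
cp "$HIVE_HOME/lib/jline-2.12.jar" "$YARN_LIB/"
# park the old jline so it no longer wins on the classpath
mv "$YARN_LIB/jline-0.9.94.jar" "$YARN_LIB/jline-0.9.94.jar.bak"
```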
3) Basic Hive operations (Derby metastore)
[jiangzl@master tmp]$ vim /home/jiangzl/student.txt
1 小强
2 小红
3 小明
[jiangzl@master tmp]$ hive
Logging initialized using configuration in jar:file:/home/jiangzl/work/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.965 seconds, Fetched: 1 row(s)
hive> show tables;
OK
Time taken: 0.056 seconds
hive> create table table_1 (id int, name string);
OK
Time taken: 0.449 seconds
hive> load data local inpath '/home/jiangzl/student.txt' into table table_1;
Loading data to table default.table_1
Table default.table_1 stats: [numFiles=1, totalSize=27]
OK
Time taken: 1.151 seconds
hive> select * from table_1;
OK
NULL NULL
NULL NULL
NULL NULL
Time taken: 0.273 seconds, Fetched: 3 row(s)
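The NULLs above appear because Hive's default text format splits fields on the \001 control character (^A), while student.txt is tab-separated, so neither column parses. The difference is easy to see in the shell, using two hypothetical sample files in /tmp:

```shell
# a tab-delimited row, as in student.txt
printf '1\thello\n' > /tmp/tab_row.txt
# a row using Hive's default \001 field delimiter
printf '1\001hello\n' > /tmp/ctrla_row.txt
# cat -A makes the delimiters visible: tab shows as ^I, \001 shows as ^A
cat -A /tmp/tab_row.txt      # 1^Ihello$
cat -A /tmp/ctrla_row.txt    # 1^Ahello$
```

The fix, as shown next, is to declare the delimiter explicitly when creating the table.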
hive> create table table_2 (id int, name string) row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 0.062 seconds
hive> load data local inpath '/home/jiangzl/student.txt' into table table_2;
Loading data to table default.table_2
Table default.table_2 stats: [numFiles=1, totalSize=27]
OK
Time taken: 0.259 seconds
hive> select * from table_2;
OK
1 小强
2 小红
3 小明
Time taken: 0.062 seconds, Fetched: 3 row(s)
4) Connecting Hive to MySQL
1. Create the MySQL metastore database
create database hive;
### GRANT all ON database.* TO linux_account@host IDENTIFIED BY 'password'; a host of '%' means all hosts
GRANT all ON hive.* TO jiangzl@master IDENTIFIED BY 'root';
GRANT all ON hive.* TO root@master IDENTIFIED BY 'root';
flush privileges;
set global binlog_format='MIXED';
### Change the database character set:
alter database hive character set latin1;
2. Edit hive-config.sh under $HIVE_HOME/bin and add the following three lines:
export JAVA_HOME=/home/jiangzl/work/jdk
export HIVE_HOME=/home/jiangzl/work/hive
export HADOOP_HOME=/home/jiangzl/work/hadoop
3. Rename the template files under the conf directory
cd $HIVE_HOME/conf
mv hive-env.sh.template hive-env.sh
mv hive-default.xml.template hive-site.xml
# Hive reaches MySQL through the JDBC driver
Copy mysql-connector-java-5.0.4-bin.jar into the hive/lib directory:
cp mysql-connector-java-5.0.4-bin.jar $HIVE_HOME/lib/
4. Edit hive-site.xml
The warehouse location is set by the hive.metastore.warehouse.dir property (shown below).
vi quick reference:
Search and navigation commands:
:set nu      show line numbers
shift + g    jump to the last line
n + G        jump to line n
shift + ^    start of line
shift + $    end of line
/xxx         search forward
?xxx         search backward
n            next match
shift + n    previous match
:set hls     enable search highlighting
:set nohls   disable search highlighting
u            undo
ctrl + r     redo
// Default location of the Hive data warehouse
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
Configure the MySQL connection URL:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<!-- value>jdbc:derby:;databaseName=metastore_db;create=true</value -->
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
Configure the JDBC driver:
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<!-- value>org.apache.derby.jdbc.EmbeddedDriver</value -->
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
Configure the username:
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<!-- value>APP</value -->
<value>root</value>
<description>Username to use against metastore database</description>
</property>
Configure the password:
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<!-- value>mine</value -->
<value>root</value>
<description>password to use against metastore database</description>
</property>
5) Hive's three interface modes
Ways to start and verify Hive:
1. Hive web interface (port 9999)
Hive 1.2.1 does not ship hive-hwi-*.war; it has to be built separately, and is not a focus here.
hive --service hwi &
Used to access Hive through a browser:
http://master:9999/hwi/
2. Hive remote service (port 10000)
hive --service hiveserver &
Note: to connect to Hive from Eclipse on Windows via JDBC:
bin/hive --service hiveserver    (with no port specified, the default is 10000)
http://www.cnblogs.com/ZJUKasuosuo/archive/2012/07/03/2574433.html
Code: see the 课程内容/hive directory
3. Hive command line
1. Startup errors
# Error 1
# Fix: GRANT all ON hive.* TO root@master IDENTIFIED BY 'root';
... 48 more
Caused by: java.sql.SQLException: Access denied for user 'root'@'master' (using password: YES)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:946)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2870)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:812)
at com.mysql.jdbc.MysqlIO.secureAuth411(MysqlIO.java:3269)
at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1182)
at com.mysql.jdbc.Connection.createNewIO(Connection.java:2670)
at com.mysql.jdbc.Connection.<init>(Connection.java:1531)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:266)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
... 60 more
# Error 2
[jiangzl@master tmp]$ hive
Logging initialized using configuration in jar:file:/home/jiangzl/work/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:563)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 8 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at java.net.URI.checkPath(URI.java:1804)
at java.net.URI.<init>(URI.java:752)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
... 11 more
Solution:
1. In hive-site.xml, find the configuration entries whose values contain "system:java.io.tmpdir".
2. Create the directory /home/jiangzl/work/hive/io_tmp:
mkdir -p /home/jiangzl/work/hive/io_tmp
3. Change the values of those entries to paths under that directory, as shown below:
<property>
<name>hive.exec.local.scratchdir</name>
<!-- value>${system:java.io.tmpdir}/${system:user.name}</value -->
<value>/home/jiangzl/work/hive/io_tmp/jiangzl</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<!-- value>${system:java.io.tmpdir}/${hive.session.id}_resources</value -->
<value>/home/jiangzl/work/hive/io_tmp/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<!-- value>${system:java.io.tmpdir}/${system:user.name}</value -->
<value>/home/jiangzl/work/hive/io_tmp/jiangzl</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<!-- value>${system:java.io.tmpdir}/${system:user.name}/operation_logs</value -->
<value>/home/jiangzl/work/hive/io_tmp/jiangzl/operation_logs</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
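Before restarting, it can help to confirm that no active value still references system:java.io.tmpdir (the path below is this guide's install location; adjust to yours). Matching on <value> skips the commented-out originals; grep prints any remaining offenders and exits with status 1 when the file is clean:

```shell
grep -n '<value>.*system:java.io.tmpdir' /home/jiangzl/work/hive/conf/hive-site.xml
```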
3. Start hive again; it now succeeds.
2. Hive CLI invocation modes
Mode 1: hive -f /root/shell/hive-script.sql (good for multi-statement scripts)
hive-script.sql is just a script file of query statements, e.g.:
[root@cloud4 shell]# vi hive_script3.sql
select * from t1;
select count(*) from t1;
This executes a hive script without entering interactive mode.
Combine it with silent mode (-S) when a third-party program calls Hive and reads the result set from standard output:
$HIVE_HOME/bin/hive -S -f /home/my/hive-script.sql    (does not show the MapReduce progress output)
Mode 2: hive -e 'HQL statement' (good for short statements, or exporting to a file)
Executes the SQL statement directly, e.g.:
[root@cloud4 shell]# hive -e 'select * from t1'
Silent mode:
[root@cloud4 shell]# hive -S -e 'select * from t1'    (same silent mode as in Mode 1: hides the MapReduce progress output)
Another highlight: redirecting the output exports data to a local Linux directory, e.g.:
[root@cloud4 shell]# hive -e 'select * from t1' > test.txt
This is much like exporting analysis results from Pig, and just as convenient.
Mode 3: data export (besides hive -e/-f; exports multiple files)
There are two methods:
1. insert overwrite local directory (the output is automatically split into multiple files)
Writes the result to the specified directory.
The number of files generated equals the number of reducers.
Run inside hive:
set mapred.reduce.tasks=8;    # use 8 reducers, so the output is split into 8 files
insert overwrite local directory '/hive_dat/package_name'
select package_name,count(1) from app_list group by package_name;
2. Query directly into a result file.
Run on the Linux command line (it can also be run in the background for long queries):
hive -S -e "select package_name,count(1) from app_list group by package_name;" > grp_app_id.dat
This produces a single file, with the output columns separated by tabs.
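Since the result file is plain tab-separated text, it can be post-processed with standard tools. A sketch using a fabricated stand-in for grp_app_id.dat (the real file comes from the query above; this data is made up):

```shell
# fabricate a small result file in the same shape (tab-separated columns)
printf 'com.app.one\t120\ncom.app.two\t45\n' > /tmp/grp_app_id.dat
# keep only packages whose count exceeds 100
awk -F'\t' '$2 > 100 { print $1 }' /tmp/grp_app_id.dat   # prints com.app.one
```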
Mode 4: hive (interactive mode)
Some useful commands (SQL-like syntax):
hive> quit;                          exit hive
hive> show databases;                list databases
hive> create database test;          create a database
hive> use default;                   switch to a database
hive> create table t1 (key string);  create a table
When creating a table you can choose which character splits the input file into fields, e.g.:
hive> create table t1 (id string)
    > partitioned by (log_date string)                   partition the table by log_date
    > row format delimited fields terminated by '\t'     read fields split on '\t'
    > stored as textfile                                 storage format: textfile/sequencefile/rcfile
    > location '/wlan';                                  HDFS directory holding the data
Free Hive video resources:
Link: http://pan.baidu.com/s/1jGKZKSe   password: 1zty
From the Hive course: hive视频-HIVE(2)基本操作.avi