Hive Notes (a companion to the previous post, "Hadoop Cluster Setup")

License: CC 4.0 BY-SA

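These notes walk through common data operations in Hive: creating tables, loading data, working with partitioned tables, and importing MySQL data with Sqoop. Each step is shown with a concrete example.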

1. Upload the Hive installation package

2. Extract it

2.1 Create a table

	create table trade_detail (id bigint, account string, income double, expenses double, time string);

2.2 Prepare the data

	Create the file /root/trade_detail.txt (fields are separated by tabs):

	1	Jason@ahome.com	3000000000.0	0.0	2017-10-01

	2	Tony@ahome.com	3000000000.0	0.0	2017-10-01

	3	Scott@ahome.com	3000000000.0	0.0	2017-10-01
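The sample file is easy to get wrong if an editor silently turns tabs into spaces. A minimal sketch of generating it from the shell is shown below; the temporary directory is an assumption made so the snippet runs anywhere (the notes themselves use /root/trade_detail.txt).

```shell
# Hypothetical working directory; the notes use /root/trade_detail.txt.
workdir=$(mktemp -d)
data_file="$workdir/trade_detail.txt"

# printf expands \t to a real tab and \n to a newline.
printf '1\tJason@ahome.com\t3000000000.0\t0.0\t2017-10-01\n'  > "$data_file"
printf '2\tTony@ahome.com\t3000000000.0\t0.0\t2017-10-01\n'  >> "$data_file"
printf '3\tScott@ahome.com\t3000000000.0\t0.0\t2017-10-01\n' >> "$data_file"

# Sanity check: count the rows, and the rows that do not have exactly 5 tab-separated fields.
row_count=$(awk 'END { print NR }' "$data_file")
bad_lines=$(awk -F'\t' 'NF != 5 { n++ } END { print n + 0 }' "$data_file")
echo "rows=$row_count bad_lines=$bad_lines"
```

If bad_lines is not 0, some row was probably typed with spaces instead of tabs.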

2.3 Load the file into the table

	load data local inpath '/root/trade_detail.txt' into table trade_detail;

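	2.4 Specify the field delimiter

		The table in 2.1 declares no delimiter, so Hive falls back to its default field delimiter, Ctrl-A (\001). The tab-separated file loaded in 2.3 is then not split into columns and the columns come back as NULL. Declaring the delimiter when creating the table fixes this: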

		create table teacher(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';

		load data local inpath '/root/trade_detail.txt' into table teacher;
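To see why the delimiter matters, the sketch below splits one sample row twice, once on Ctrl-A and once on a tab. Here awk is only a stand-in for Hive's text parsing (an assumption for illustration, not how Hive is implemented).

```shell
# One row of the sample data, with real tab characters between fields.
line=$(printf '1\tJason@ahome.com\t3000000000.0\t0.0\t2017-10-01')

# Field count when splitting on Ctrl-A (\001), Hive's default field delimiter.
default_fields=$(printf '%s\n' "$line" | awk -F'\001' '{ print NF }')

# Field count when splitting on a tab, as declared by "fields terminated by '\t'".
tab_fields=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')

echo "default delimiter: $default_fields field(s); tab delimiter: $tab_fields field(s)"
```

With the default delimiter the whole row is a single field, which cannot be mapped onto five columns.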

	2.5 Create a table over all files in a directory (using a folder on HDFS)

		2.5.1 First put the files onto HDFS

			hive> dfs -put /root/student.txt /data/a.txt;

			hive> dfs -put /root/student.txt /data/b.txt;

		2.5.2 Create a table that points to the HDFS directory /data

			hive> create table bd_teacher(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/data';

	2.6 Hadoop shell commands can be run from the Hive console:

		hive> dfs -ls /;

		hive> dfs -mkdir /data;

		hive> dfs -put /root/student.txt /data/a.txt;

		hive> dfs -put /root/student.txt /data/b.txt;

		hive> create external table bd_student(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/data';

		hive> select * from bd_student;

OK

		1	Jason@ahome.com	3.0E9	0.0	2017-10-01

		2	Tony@ahome.com	3.0E9	0.0	2017-10-01

		3	Scott@ahome.com	3.0E9	0.0	2017-10-01

		1	Jason@ahome.com	3.0E9	0.0	2017-10-01

		2	Tony@ahome.com	3.0E9	0.0	2017-10-01

		3	Scott@ahome.com	3.0E9	0.0	2017-10-01

		Time taken: 0.331 seconds, Fetched: 6 row(s)

		hive> dfs -put /root/student.txt /data/c.txt;

		hive> select * from bd_student;

OK

		1	Jason@ahome.com	3.0E9	0.0	2017-10-01

		2	Tony@ahome.com	3.0E9	0.0	2017-10-01

		3	Scott@ahome.com	3.0E9	0.0	2017-10-01

		1	Jason@ahome.com	3.0E9	0.0	2017-10-01

		2	Tony@ahome.com	3.0E9	0.0	2017-10-01

		3	Scott@ahome.com	3.0E9	0.0	2017-10-01

		1	Jason@ahome.com	3.0E9	0.0	2017-10-01

		2	Tony@ahome.com	3.0E9	0.0	2017-10-01

		3	Scott@ahome.com	3.0E9	0.0	2017-10-01

		Time taken: 0.331 seconds, Fetched: 9 row(s)

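		In other words, once a file is placed in the directory, Hive reads the table's data from every file under that directory at query time; no separate load step is needed.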
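The behaviour above can be imitated on the local file system. The sketch below is only an analogy: a temporary directory stands in for the HDFS directory /data, and concatenating its files stands in for `select *` (both are assumptions for illustration).

```shell
table_dir=$(mktemp -d)   # stands in for the HDFS directory /data

# Emit the three sample rows used throughout these notes.
write_sample() {
  printf '1\tJason@ahome.com\t3000000000.0\t0.0\t2017-10-01\n'
  printf '2\tTony@ahome.com\t3000000000.0\t0.0\t2017-10-01\n'
  printf '3\tScott@ahome.com\t3000000000.0\t0.0\t2017-10-01\n'
}

# "Query" the table: read every file in the directory and count the rows.
count_rows() { cat "$1"/* | awk 'END { print NR }'; }

write_sample > "$table_dir/a.txt"
write_sample > "$table_dir/b.txt"
rows_before=$(count_rows "$table_dir")

# Dropping a third file into the directory changes the result with no load step.
write_sample > "$table_dir/c.txt"
rows_after=$(count_rows "$table_dir")

echo "before: $rows_before rows, after: $rows_after rows"
# prints: before: 6 rows, after: 9 rows
```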

	2.7 Partitioned tables

		hive> create external table beauties (id bigint, name string, size double) partitioned by (nation string) row format delimited fields terminated by '\t' location '/beauty';

		cat /root/beauty.txt

		1	jingtian	35

		2	bingbing	35

		cat /root/beauty2.txt

		5	bdyjy	32

		6	jzmb	35

		cat /root/beauty3.txt

		8	zl	1

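		// A file uploaded directly with hadoop into the table's root directory is not visible to the partitioned table, because it does not sit in any registered partition directory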

		hadoop fs -put beauty.txt /beauty/b.c

		hive> select * from beauties;

OK

		Time taken: 0.424 seconds

		// Load with Hive's load command instead, naming the target partition

		hive> load data local inpath '/root/beauty.txt' into table beauties partition (nation='China');

		hive> select * from beauties;

OK

		1	jingtian	35.0	China

		2	bingbing	35.0	China

		Time taken: 0.125 seconds, Fetched: 2 row(s)

		// Add another partition

		alter table beauties add partition (nation='Japan');

		// Add data to the new partition's directory

		dfs -put /root/beauty2.txt /beauty/nation=Japan;
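The partition layout can be pictured as one sub-directory per partition value, with the partition column taken from the directory name rather than from the file contents. The sketch below imitates this locally. It assumes every nation=* directory is a registered partition, whereas real Hive only reads partitions recorded in its metastore (added by `load data ... partition`, by `alter table ... add partition`, or, for directories created outside Hive, by `msck repair table`).

```shell
table_dir=$(mktemp -d)   # stands in for the HDFS directory /beauty

# A file dropped in the table root belongs to no partition and is never scanned.
printf '1\tjingtian\t35\n2\tbingbing\t35\n' > "$table_dir/b.c"

# One directory per partition value.
mkdir -p "$table_dir/nation=China" "$table_dir/nation=Japan"
printf '1\tjingtian\t35\n2\tbingbing\t35\n' > "$table_dir/nation=China/beauty.txt"
printf '5\tbdyjy\t32\n6\tjzmb\t35\n'        > "$table_dir/nation=Japan/beauty2.txt"

# Read only the partition directories and append the partition value to each row.
scan_partitions() {
  for part_dir in "$1"/nation=*; do
    nation=${part_dir##*=}
    awk -F'\t' -v OFS='\t' -v nation="$nation" '{ print $0, nation }' "$part_dir"/*
  done
}

partition_rows=$(scan_partitions "$table_dir")
printf '%s\n' "$partition_rows"
```

The rows from b.c never appear, which mirrors the empty result of the first `select * from beauties` above.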

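	2.8 Partitioned tables, part 2

		create table sms (id bigint, connect string, area string) partitioned by (area string) row format delimited fields terminated by '\t';

		This statement fails: a column already declared in the table cannot also be used as a partition column, so the statement itself has to be changed. Removing area from the column list and keeping it only in the partitioned by clause is enough, because a partition column can still be selected and filtered like an ordinary column.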

	2.9 Use Sqoop to import MySQL table data into Hive:

		Create the Hive tables:

			create table trade_detail (id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';

			create table user_info (id bigint, account string, name string, age int) row format delimited fields terminated by '\t';

		Import the data into the Hive database:

			sqoop import --connect jdbc:mysql://hadoop202:3306/hadoop --username root --password root --table trade_detail --hive-import --hive-overwrite --hive-database userdb --hive-table trade_detail --fields-terminated-by '\t'

			sqoop import --connect jdbc:mysql://hadoop202:3306/hadoop --username root --password root --table user_info --hive-import --hive-overwrite --hive-database userdb --hive-table user_info --fields-terminated-by '\t'

		Join query (per-account totals from trade_detail joined with user_info):

			select t.account, u.name, t.income, t.expenses, t.surplus from user_info u join (select account, sum(income) as income, sum(expenses) as expenses, sum(income-expenses) as surplus from trade_detail group by account) t on u.account = t.account;
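To make the shape of the join result concrete, the sketch below reproduces the same logic with awk on two small files. All rows are hypothetical sample data (the notes do not show the contents of user_info), and awk merely imitates the query; it is not what Hive runs.

```shell
workdir=$(mktemp -d)

# Hypothetical trade_detail rows: id, account, income, expenses, time.
printf '1\tJason@ahome.com\t100\t20\t2017-10-01\n'  > "$workdir/trade_detail.txt"
printf '2\tJason@ahome.com\t50\t10\t2017-10-02\n'  >> "$workdir/trade_detail.txt"
printf '3\tTony@ahome.com\t80\t30\t2017-10-01\n'   >> "$workdir/trade_detail.txt"

# Hypothetical user_info rows: id, account, name, age.
printf '1\tJason@ahome.com\tJason\t30\n'  > "$workdir/user_info.txt"
printf '2\tTony@ahome.com\tTony\t25\n'   >> "$workdir/user_info.txt"

# First file: accumulate per-account sums (the "group by account" subquery).
# Second file: emit a row for each user whose account has totals (the inner join).
result=$(awk -F'\t' -v OFS='\t' '
  NR == FNR { income[$2] += $3; expenses[$2] += $4; next }
  $2 in income { print $2, $3, income[$2], expenses[$2], income[$2] - expenses[$2] }
' "$workdir/trade_detail.txt" "$workdir/user_info.txt")

printf '%s\n' "$result"
```

The output columns are account, name, income, expenses and surplus, in the same order as the select list of the query.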

3. Configuration

	3.1 Install MySQL

		Find any previously installed MySQL packages

		rpm -qa | grep mysql

		Force-remove the package, ignoring dependencies

		rpm -e mysql-libs-5.1.66-2.el6_3.i686 --nodeps

		rpm -ivh MySQL-server-5.1.73-1.glibc23.i386.rpm 

		rpm -ivh MySQL-client-5.1.73-1.glibc23.i386.rpm

		Run the following command to configure and secure MySQL

		/usr/bin/mysql_secure_installation

		Add Hive to the environment variables
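The notes do not record the commands for this step. A minimal sketch follows, assuming Hive was extracted to /usr/local/hive (a hypothetical path; substitute the actual directory from step 2).

```shell
# Hypothetical install location; replace with the directory Hive was extracted to.
HIVE_HOME=/usr/local/hive
export HIVE_HOME

# Append Hive's bin directory so the hive command can be found.
PATH="$PATH:$HIVE_HOME/bin"
export PATH
```

To keep the setting across logins, the same lines are usually placed in a profile file such as /etc/profile or ~/.bash_profile.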

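		In the MySQL client, grant root access to the hive database from any host, then reload the privilege tables

		GRANT ALL PRIVILEGES ON hive.* TO 'root'@'%' IDENTIFIED BY '123' WITH GRANT OPTION;

		FLUSH PRIVILEGES;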

		Create two tables in Hive

		create table trade_detail (id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';

		create table user_info (id bigint, account string, name  string, age int) row format delimited fields terminated by '\t';

		Import the data from MySQL directly into Hive

		sqoop import --connect jdbc:mysql://192.168.1.10:3306/itcast --username root --password 123 --table trade_detail --hive-import --hive-overwrite --hive-table trade_detail --fields-terminated-by '\t'

		sqoop import --connect jdbc:mysql://192.168.1.10:3306/itcast --username root --password 123 --table user_info --hive-import --hive-overwrite --hive-table user_info --fields-terminated-by '\t'

		Create a result table that stores the output of the preceding join query

		create table result row format delimited fields terminated by '\t' as select t2.account, t2.name, t1.income, t1.expenses, t1.surplus from user_info t2 join (select account, sum(income) as income, sum(expenses) as expenses, sum(income-expenses) as surplus from trade_detail group by account) t1 on (t1.account = t2.account);

		create table user (id int, name string) row format delimited fields terminated by '\t';

		Load data from the local file system into Hive

		load data local inpath '/root/user.txt' into table user;

		Create an external table

		create external table stubak (id int, name string) row format delimited fields terminated by '\t' location '/stubak';

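		Create a partitioned table

		Regular vs. partitioned tables: use a partitioned table when large volumes of data keep being added, so that each batch lands in its own partition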

		create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t'; 

		Load data into the partitioned table

		load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');
