Hive: Some Notes on Learning DML

Latest recommended article published on 2024-08-31 13:54:45

chen_1122

License: CC 4.0 BY-SA

Category: Big Data. Tags: hive

Article link: https://blog.youkuaiyun.com/chen_1122/article/details/77991282

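This article describes five ways to load or import data in Hive: loading file data directly, using a CTAS statement, creating a table with a specified path, using INSERT statements, and using EXPORT and IMPORT. For each method it explains the steps involved and the points to watch out for.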

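1. Load file data into a Hive table with LOAD (first create a table in Hive whose columns match the data to be imported, then load the data into it):

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename ; (whether loading from Linux or from HDFS, no MapReduce job is run)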

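LOCAL: with LOCAL, the file is loaded from the Linux (local) file system;

without LOCAL, the file is loaded from HDFS.

filepath: a Linux path when LOCAL is used, otherwise an HDFS path.

OVERWRITE: with OVERWRITE, the existing data in the table is replaced;

without OVERWRITE, the new data is appended to the existing table data.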

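Example: LOAD DATA LOCAL INPATH '/home/hadoop/class.txt' OVERWRITE INTO TABLE class ; (from Linux: after a data file in a Linux directory is loaded into the Hive table, the Linux file still exists; a copy of it is placed under the Hive table's path)

LOAD DATA INPATH '/ruoze/class.txt' OVERWRITE INTO TABLE class ; (from HDFS: after the data is loaded into the Hive table, the data file is moved from its original HDFS directory to the path of the Hive table)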
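The LOAD workflow above can be sketched end to end as follows. The original does not show the definition of class, so the column list and the tab delimiter below are assumptions made only for illustration.

```sql
-- Hypothetical schema: the original does not show the columns of class.
CREATE TABLE IF NOT EXISTS class (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Copy the local file into the table, replacing any existing rows.
LOAD DATA LOCAL INPATH '/home/hadoop/class.txt' OVERWRITE INTO TABLE class;

-- Quick check that the rows were parsed as expected.
SELECT * FROM class LIMIT 10;
```

LOAD only places the file under the table's directory and does not check it against the schema, so fields that do not match the declared delimiter or types show up as NULL when queried.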

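2. CTAS: create table tablename as select ...... Uses a query to bring both the data and the table structure into a newly created table. The table does not need to exist beforehand; it is created and populated in one step (runs MapReduce).

Example: create table t1 as select * from student; (copies all columns of student into t1)

create table t2 as select name,age from student; (copies some of the columns of student into t2)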
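Since the new table's schema comes from the query, a CTAS statement can also rename columns and filter rows. A small sketch, where t3, the alias student_name, and the age threshold are illustrative and not from the original:

```sql
-- t3, student_name and the age threshold are hypothetical.
CREATE TABLE t3 AS
SELECT name AS student_name, age
FROM student
WHERE age >= 18;

-- Inspect the schema Hive derived from the query.
DESCRIBE t3;
```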

3. Specify a path when creating the table, then upload the data file directly to that table's HDFS path:

CREATE TABLE [IF NOT EXISTS] [db_name.]table_name [ROW FORMAT row_format] [LOCATION hdfs_path]

Example: create table if not exists student7(name string,age int,class string) row format delimited fields terminated by '\t' location '/chenping/test'; (first create the table with the specified directory as its location)

hadoop fs -put class.txt /chenping/test/ (then copy the data file directly into the /chenping/test directory on HDFS; querying student7 now returns the data)
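To confirm that the uploaded file is picked up, something like the following can be run from the Hive CLI. This is a sketch that assumes the CLI's built-in dfs command is available in your session; the table and path are the ones from the example above.

```sql
-- List the files under the table's directory from within the Hive CLI.
dfs -ls /chenping/test;

-- Files in the directory are read at query time; no LOAD statement is needed.
SELECT * FROM student7 LIMIT 10;

-- Shows the table's Location along with other metadata.
DESCRIBE FORMATTED student7;
```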

4. Using INSERT

1) Insert into a single table: Inserting data into Hive Tables from queries. Inserts the result of querying other tables into another Hive table; the target table, with a structure matching the data, must already exist in Hive (runs MapReduce).

INSERT OVERWRITE TABLE tablename1 select_statement1 FROM from_statement; (OVERWRITE replaces the existing data)

INSERT INTO TABLE tablename1 select_statement1 FROM from_statement; (INTO appends)

Example: insert overwrite table student4 select * from student;

insert into table student4 select * from student;

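Insert into multiple tables: inserts the data of one table into several other tables (runs MapReduce). The statement starts with the FROM clause:

FROM from_statement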

INSERT OVERWRITE/INTO TABLE tablename1 select_statement1

INSERT OVERWRITE/INTO TABLE tablename2 select_statement2

INSERT OVERWRITE/INTO TABLE tablename2 select_statement2 ...;

Example (inserting all columns):

from student

insert into table student5 select *

insert into table student6 select *;

Example (inserting some of the columns):

from student

insert into table student6 select *

insert into table student_name select name

insert into table student_name_id select name,age;
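Each INSERT clause of a multi-table insert can also carry its own WHERE filter, while the source table is scanned only once. A minimal sketch, assuming student has name and age columns with age as an INT; the two target tables are hypothetical:

```sql
-- Hypothetical target tables; the column types are assumed.
CREATE TABLE IF NOT EXISTS student_adult (name STRING, age INT);
CREATE TABLE IF NOT EXISTS student_minor (name STRING, age INT);

-- student is read once; each INSERT clause applies its own filter.
FROM student
INSERT OVERWRITE TABLE student_adult SELECT name, age WHERE age >= 18
INSERT OVERWRITE TABLE student_minor SELECT name, age WHERE age < 18;
```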

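2) Write query results to the file system: Writing data into the filesystem from queries. Exports the result of a SQL query to the file system.

INSERT OVERWRITE [LOCAL] DIRECTORY directory1 [ROW FORMAT row_format] SELECT ... FROM ... (runs MapReduce)

LOCAL: with LOCAL, the result is exported to the Linux file system; without LOCAL, it is exported to HDFS.

Example:

insert overwrite directory '/chenping/testhive' row format delimited fields terminated by '\t' select name,age from student; (exports to HDFS; choose the path carefully, because OVERWRITE replaces whatever is already in that directory)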
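For comparison, the LOCAL variant of the same export writes to the Linux file system instead of HDFS. In this sketch the directory '/tmp/student_export' is a hypothetical path chosen for illustration:

```sql
-- '/tmp/student_export' is a hypothetical local path; OVERWRITE replaces its contents.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/student_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT name, age FROM student;
```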

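5. Using EXPORT and IMPORT (does not run MapReduce)

EXPORT writes the whole table, both its data and its metadata, to files on HDFS: EXPORT TABLE tablename TO 'export_target_path' ;

Example: export table student to '/test';

Check the result: [hadoop@hadoop000 data]$ hadoop fs -lsr /test (the /test directory contains both the metadata and the data)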

-rwxr-xr-x 1 hadoop supergroup 1295 2017-09-15 12:43 /test/_metadata

drwxr-xr-x - hadoop supergroup 0 2017-09-15 12:43 /test/data

-rwxr-xr-x 1 hadoop supergroup 71 2017-09-15 12:43 /test/data/student.txt

IMPORT loads the exported files into a Hive table: IMPORT [[EXTERNAL] TABLE new_or_original_tablename] FROM 'source_path';

Example: import table student1 from '/test'; (the student1 table does not need to be created beforehand)
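IMPORT also accepts an optional LOCATION clause that controls where the imported table's data is stored. A sketch using the export directory from the example above; the table name student_copy and the target path are hypothetical:

```sql
-- student_copy and '/chenping/student_copy' are hypothetical names.
IMPORT TABLE student_copy FROM '/test' LOCATION '/chenping/student_copy';

-- Confirm the schema, the storage location, and the row count of the imported table.
DESCRIBE FORMATTED student_copy;
SELECT COUNT(*) FROM student_copy;
```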