2. Hive: Importing, Exporting, and Deleting Data

(Reposted article.)

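This article describes four common ways to load data into Hive: from the local file system, from HDFS, by querying another table, and by filling a table from a query at the moment it is created. It also covers three ways to export data: to the local file system, to HDFS, and into another Hive table.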

I. Common ways to import data into Hive

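(Hive does not validate the source data while loading it, so any data can be loaded. Problems only show up at query time, when fields that do not match the table schema come back as NULL.)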
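To make this "check on read, not on load" behavior concrete, here is a small Python model (not Hive code) of how a text table declared as `(id int, name string)` with `fields terminated by '\t'` could turn each raw line into a row. It assumes Hive's default text SerDe behaves as follows: a field that cannot be parsed as the declared type reads as NULL (`None` here), missing trailing fields read as NULL, and extra fields are ignored. The function name `parse_row` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], Optional[str]]

# Assumption: an optionally signed run of digits counts as a valid int
_INT_RE = re.compile(r"[+-]?\d+")


def parse_row(line: str, delimiter: str = "\t") -> Row:
    """Model how one text line becomes a row of the table (id int, name string)."""
    fields: List[str] = line.rstrip("\n").split(delimiter)

    # Column 1 (id int): anything that is not an integer is read as NULL
    row_id: Optional[int] = int(fields[0]) if _INT_RE.fullmatch(fields[0]) else None

    # Column 2 (name string): a missing field is read as NULL
    name: Optional[str] = fields[1] if len(fields) > 1 else None

    # Fields beyond the two declared columns are ignored
    return (row_id, name)


if __name__ == "__main__":
    print(parse_row("101\tzs"))  # (101, 'zs')
    print(parse_row("101 zs"))   # (None, None): no tab, so the whole line is field 1
    print(parse_row("abc\tzs"))  # (None, 'zs')
```

The second call is the case that most often surprises people: if a file uses spaces while the table expects tabs, the load still succeeds, but every column comes back NULL.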

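Four are covered here:

(1) loading data from the local file system into a Hive table;

(2) loading data from HDFS into a Hive table;

(3) querying data from another table and inserting it into a Hive table;

(4) creating a table and, in the same statement, filling it with records queried from another table.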

[hadoop@h91 hive-0.9.0-bin]$ bin/hive    (start the Hive CLI)

1. Importing data from the local file system into a Hive table

1.1: Create table ha

hive> create table ha(id int,name string)

> row format delimited -- keyword: the table's data files are delimited text

> fields terminated by '\t' -- keyword: columns are separated by a tab

> stored as textfile; -- keyword: the data is stored as plain text files

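The [ROW FORMAT DELIMITED] clause declares that the table's data is delimited text, so that the column separator set with FIELDS TERMINATED BY is used when the data is read.

The [STORED AS file_format] clause sets the format of the table's data files; the default is TEXTFILE. If the data is plain text, use [STORED AS TEXTFILE]: the files can then be copied directly from the local machine to HDFS and Hive reads them as-is.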

1.2: Create a text file in the operating system (the two columns must be separated by a tab)

[hadoop@h91 ~]$ vim haha.txt

101 zs

102 ls

103 ww
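Because the table expects `fields terminated by '\t'`, the gap between the two columns in the file has to be a real tab character, not spaces, which is easy to get wrong when typing in vim. As an alternative, a minimal Python sketch can write the file; the helper name `write_tsv` is hypothetical and the rows are the sample data above.

```python
from typing import Iterable, Tuple

# Sample rows from the walkthrough: (id, name)
ROWS = [(101, "zs"), (102, "ls"), (103, "ww")]


def write_tsv(path: str, rows: Iterable[Tuple[int, str]]) -> None:
    """Write one record per line, with a real tab between the columns."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for row_id, name in rows:
            f.write(f"{row_id}\t{name}\n")
```

For example, `write_tsv("/home/hadoop/haha.txt", ROWS)` produces the file loaded in step 1.3. On Linux, `cat -A haha.txt` shows each tab as `^I`, which is a quick way to check the separator.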

1.3: Load the data

hive> load data local inpath '/home/hadoop/haha.txt' into table ha;

hive> select * from ha;

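Unlike the relational databases we are used to, Hive is a data warehouse built on Hadoop, and the version used here does not accept literal records in an INSERT statement; that is, it does not support INSERT INTO … VALUES. That syntax was only added in Hive 0.14.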

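2. Importing data from HDFS into a Hive table

2.1: Create a file on the local file system (tab-separated columns) and upload it to the HDFS cluster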

[hadoop@h91 ~]$ hadoop fs -mkdir abc

[hadoop@h91 ~]$ vim hehe.txt

1001 aa

1002 bb

1003 cc

[hadoop@h91 ~]$ hadoop fs -put hehe.txt /user/hadoop/abc    (upload to HDFS)

2.2: Create the table

hive> create table he(id int,name string)

> row format delimited

> fields terminated by '\t'

> stored as textfile;

2.3: Load the data

hive> load data inpath 'hdfs://h101:9000/user/hadoop/abc/hehe.txt' into table he;
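One difference between the two LOAD forms is worth knowing: with `LOCAL`, Hive copies the local file into the table's directory, while without `LOCAL` the HDFS file is moved there, so `/user/hadoop/abc/hehe.txt` is no longer at its original location after the load. The Python sketch below only models that copy-versus-move difference on a single local filesystem; `load_data` and the directory layout are illustrative stand-ins, not how Hive is implemented.

```python
import shutil
from pathlib import Path


def load_data(src: str, table_dir: str, local: bool) -> Path:
    """Model LOAD DATA [LOCAL] INPATH '<src>' INTO TABLE <t> on one filesystem.

    local=True  -> copy: the source file stays where it was
    local=False -> move: the source file disappears from its original location
    """
    dest_dir = Path(table_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # stands in for the table's directory
    dest = dest_dir / Path(src).name
    if local:
        shutil.copy(src, str(dest))
    else:
        shutil.move(src, str(dest))
    return dest
```

In practice this means the source file of a non-LOCAL load should be treated as consumed; keep a separate copy if it is still needed.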

3. Querying data from another table and inserting it into a Hive table

3.1: Query the source table and create a new table

hive> select * from he;

OK

1001 aa

1002 bb

1003 cc

hive> create table heihei(id int,name string)

> row format delimited

> fields terminated by '\t'

> stored as textfile;

3.2: Query the data in the source table and insert it into the new table

hive> insert into table heihei select * from he; -- INSERT INTO (append) is supported from Hive 0.8 onward

or

hive> insert overwrite table heihei select * from ha; -- older versions without INTO can only use OVERWRITE, which replaces the table's existing data

4. Filling a table with records queried from another table when the table is created

hive> create table gaga as select * from he;

==============================

II. Exporting data

(1) to the local file system;

(2) to HDFS;

(3) into another Hive table.

1. Exporting to the local file system:

hive> insert overwrite local directory '/home/hadoop/he1' select * from he;

[hadoop@h91 ~]$ cd he1    (he1 is a directory; it contains the file 000000_0)

[hadoop@h91 he1]$ cat 000000_0

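(The columns appear to run together. They are in fact separated by Hive's default field delimiter, \001 (Ctrl-A), which the terminal does not display.)

A visible separator can be added as follows. The \001 is still written; the extra tab only makes the output readable. From Hive 0.11 on, a ROW FORMAT DELIMITED FIELDS TERMINATED BY clause can also be added to INSERT OVERWRITE LOCAL DIRECTORY instead.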

hive> insert overwrite local directory '/home/hadoop/he1' select id,concat('\t',name) from he;
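Instead of changing the query, an export written with the default delimiter can also be converted afterwards. The Python sketch below (the function name `export_to_tsv` is hypothetical) reads every data file in a local export directory such as `/home/hadoop/he1`, splits each line on `\x01` (Ctrl-A, assumed to be the field delimiter of the export), and returns tab-separated lines. It skips names starting with `.` or `_`, assuming those are checksum or bookkeeping files such as `.000000_0.crc` rather than data.

```python
from pathlib import Path
from typing import List

HIVE_FIELD_DELIM = "\x01"  # assumed default field delimiter (Ctrl-A)


def export_to_tsv(export_dir: str) -> List[str]:
    """Convert the data files of a Hive text export directory to TSV lines."""
    lines: List[str] = []
    for part in sorted(Path(export_dir).iterdir()):
        # Skip sub-directories and hidden/bookkeeping files (.crc, _SUCCESS, ...)
        if not part.is_file() or part.name.startswith((".", "_")):
            continue
        with part.open(encoding="utf-8") as f:
            for raw in f:
                fields = raw.rstrip("\n").split(HIVE_FIELD_DELIM)
                lines.append("\t".join(fields))
    return lines
```

Calling `export_to_tsv("/home/hadoop/he1")` and writing the result to a file gives a plain TSV copy of the table without touching the Hive query.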

Unlike importing into Hive, exporting cannot be done with INSERT INTO; only INSERT OVERWRITE [LOCAL] DIRECTORY works.

2. Exporting to HDFS:

hive> insert overwrite directory '/user/hadoop/abc' select * from he;

(/user/hadoop/abc is a directory in HDFS; because this is an OVERWRITE, any existing contents of the directory are replaced)

[hadoop@h91 hadoop-0.20.2-cdh3u5]$ bin/hadoop fs -ls abc

[hadoop@h91 hadoop-0.20.2-cdh3u5]$ bin/hadoop fs -cat abc/000000_0

3. Exporting to another Hive table

hive> insert into table he12 select * from he; -- he12 must already exist with matching columns

==============================

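III. Deleting data

1. Deleting table data by removing its files from the Hadoop cluster

Since data gets into a table as files (whether loaded from a file or inserted from another table), deleting the corresponding data files in the Hadoop cluster is enough to delete that data: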

[hadoop@h101 ~]$ hadoop fs -lsr /user/hive/warehouse

drwxr-xr-x - hadoop supergroup 0 2017-08-23 18:42 /user/hive/warehouse/two

-rw-r--r-- 2 hadoop supergroup 27 2017-08-23 18:39 /user/hive/warehouse/two/000000_0

-rw-r--r-- 2 hadoop supergroup 27 2017-08-23 18:40 /user/hive/warehouse/two/a.txt

-rw-r--r-- 2 hadoop supergroup 27 2017-08-23 18:42 /user/hive/warehouse/two/b.txt
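The listing shows why this works: the data of table `two` is simply the set of files under `/user/hive/warehouse/two`, and a query reads every data file there. Deleting one file, for example with `hadoop fs -rm /user/hive/warehouse/two/a.txt`, removes exactly the rows that file contributed, while the table definition stays in the metastore. The Python sketch below models that on a local directory; `table_rows` is hypothetical, and skipping names that start with `.` or `_` is an assumption based on how Hadoop's file input formats treat hidden files.

```python
from pathlib import Path
from typing import List


def table_rows(table_dir: str) -> List[str]:
    """Model a text-table scan: every line of every data file in the directory."""
    rows: List[str] = []
    for data_file in sorted(Path(table_dir).iterdir()):
        # Hidden files (names starting with '.' or '_') are not read
        if data_file.is_file() and not data_file.name.startswith((".", "_")):
            rows.extend(data_file.read_text(encoding="utf-8").splitlines())
    return rows
```

Running it before and after deleting a file in the table directory shows the row count drop by exactly that file's lines.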