Hive Data Loading


License: CC 4.0 BY-SA

Tags: #hive #data-import #data


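This article covers data manipulation and loading in Hive: the limits on row-level operations, overwriting versus appending, how loads interact with the local file system and HDFS, delimiter consistency, field type conversion, the rules for inserting data from a query, dynamic partition inserts, and importing data with CTAS.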


I. Points to note:

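1. By default Hive does not support row-level insert, update, or delete. Only transactional (ACID) tables, available since Hive 0.14, allow them.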

2. overwrite replaces the existing data in the table (or partition); into appends to it.

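3. With local, the path refers to the local file system, and the file is copied into the target directory. Without local, the path refers to HDFS, and the file is moved (not copied) into the target directory.

4. Because loading from an HDFS path is a move, the source file is no longer at its original location afterwards.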

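5. The delimiter declared for the table must match the delimiter used in the data file. With the default SerDe each delimiter is a single character (for example '\t' for fields and '\n' for lines).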

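6. load data does not validate the file against the table schema. If a field cannot be converted to the column's type, queries return NULL for it.

7. When inserting from a select query, a value that cannot be converted to the target column's type is inserted as NULL.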

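8. When inserting from a select query, columns are matched by position, so the selected values must be in the same order as the table's columns; the names may differ. (Hive does not check the data when loading, only when querying.)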

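9. When moving data files directly into the storage directory of a partitioned table, the directory layout must match the table's partitions. If the corresponding partition has not been added to the table, the data cannot be queried even though the files exist under the target path.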
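Notes 6, 7, and 9 can be illustrated with a short sketch. It assumes a table emp partitioned by (city, dt), like the one defined in the next section, and assumes the data file was copied by hand into a directory under the default warehouse root; that path is an assumption and will differ per installation.

```sql
-- Assumed layout on HDFS (the warehouse root is an assumption):
--   /user/hive/warehouse/emp/city=shanghai/dt=2016-5-9/emp.txt

-- Note 9: register the partition explicitly so the files become visible ...
alter table emp add if not exists partition (city='shanghai', dt='2016-5-9');

-- ... or let Hive scan the table directory and add any missing partitions.
msck repair table emp;

-- Check which partitions the metastore now knows about.
show partitions emp;

-- Notes 6 and 7: a value that cannot be converted to the target type becomes NULL.
select cast('abc' as int);
```

Either alter table ... add partition or msck repair table is enough on its own; the second is convenient when many partition directories were created outside Hive.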

II. Loading data with the load data statement

Forms of the load data statement (the optional overwrite keyword comes before into table):

1. load data inpath '/user/hadoop/emp.txt' [overwrite] into table table_name;

2. load data local inpath '/user/hadoop/emp.txt' [overwrite] into table table_name;

3. load data local inpath '/user/hadoop/emp.txt' [overwrite] into table table_name partition(part="a");

Example. hive> create table emp(
    > id int,
    > name string,
    > job string,
    > salary int
    > )
    > partitioned by (city string,dt string)
    > row format delimited
    > fields terminated by '\t'
    > lines terminated by '\n'
    > stored as textfile;
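With the emp table above, concrete load statements might look like the following sketch. The partition values are illustrative, and the file is assumed to contain four tab-separated fields per line (id, name, job, salary) to match the table definition.

```sql
-- Copy a file from the local file system and append it to one partition.
load data local inpath '/user/hadoop/emp.txt'
into table emp partition (city='shanghai', dt='2016-5-9');

-- Move a file that is already on HDFS and replace the partition's contents.
load data inpath '/user/hadoop/emp.txt'
overwrite into table emp partition (city='shanghai', dt='2016-5-9');

-- Inspect the result; fields that do not fit the column types show up as NULL.
select * from emp where city='shanghai' and dt='2016-5-9' limit 10;
```

Because emp is partitioned, the partition clause is required here; the partition columns are taken from the clause, not from the file.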

For convenience, the Hive tables emp1 and emp2 used below are both created with create table table_name like source_table;.
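For the emp table defined above, those statements would presumably be:

```sql
-- like copies the schema of emp (columns, partition columns, storage format)
-- but none of its data.
create table emp1 like emp;
create table emp2 like emp;

-- Optional check: the new table should list the same columns and partition keys.
describe emp1;
```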

III. Inserting data into a table from a query

1. insert into/overwrite table table_name select * from source_table;

2. insert into/overwrite table table_name partition(part="a") select id,name from source_table;

Example. hive> insert into table emp1 partition(city='shanghai',dt='2016-5-9')
    > select id,name,job,salary from emp;

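With this form, the number of columns in table_name (not counting partition columns whose values are fixed in the partition clause) must match the number of columns in the select list.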
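A small sketch of this rule, using the emp and emp1 tables from this article. The alias employee_name is hypothetical and only shows that column names do not have to match, because values are mapped by position.

```sql
-- emp1 has 4 regular columns (id, name, job, salary) plus the partition
-- columns (city, dt). With a static partition spec, the select list
-- supplies only the 4 regular columns, in table order.
insert overwrite table emp1 partition (city='shanghai', dt='2016-5-9')
select id, name as employee_name, job, salary from emp;

-- Selecting only 3 columns for a 4-column target is expected to be
-- rejected at compile time with a column number/type mismatch error:
-- insert into table emp1 partition (city='shanghai', dt='2016-5-9')
-- select id, name, job from emp;
```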

For tables without partitions, loading data with CTAS is more flexible (covered below).

3. Hive can also produce multiple disjoint outputs from a single query, which reduces the number of scans of the source table.

    from source_table
    insert into/overwrite table table_name partition(part="a")
    select id,name,age where id>0 and id<20
    insert into/overwrite table table_name partition(part="a")
    select id,name,age where id>100 and id<120
    insert into/overwrite table table_name partition(part="a")
    select id,name,age where id>150 and id<200;

Example.