In short: when loading data into Hive, do not give the file a name that starts with an underscore.
1. First, look at the file to be imported; fields are separated by \t.
cat /tmp/_load.csv
1 aaa ok
2 aaa error
3 aaa ok
4 bbb ok
5 ccc error
6 ccc ok
7 ddd error
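For anyone reproducing this, the test file can be generated as follows (a sketch; the path /tmp/_load.csv matches the post):

```shell
# Recreate the tab-separated test file from the post.
# printf reuses the format string, emitting one row per triple of arguments.
printf '%s\t%s\t%s\n' \
  1 aaa ok \
  2 aaa error \
  3 aaa ok \
  4 bbb ok \
  5 ccc error \
  6 ccc ok \
  7 ddd error > /tmp/_load.csv
wc -l /tmp/_load.csv   # 7 rows
```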
2. Create the table, specifying the field delimiter (the default is \u0001, i.e. Ctrl-A).
hive> create table tmp_wjk_having_test (id int, type string, status string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
OK
Time taken: 2.938 seconds
hive> load data local inpath '/tmp/_load.csv' overwrite into table tmp_wjk_having_test;
Copying data from file:/tmp/_load.csv
Copying file: file:/tmp/_load.csv
Loading data to table default.tmp_wjk_having_test
Moved to trash: hdfs://hd00:9000/user/hive/warehouse/tmp_wjk_having_test
OK
Time taken: 0.5 seconds
3. Now for the frustrating part: run a select.
hive> select * from tmp_wjk_having_test;
OK
Time taken: 0.794 seconds
4. The query returns no rows, yet the data is clearly on HDFS. Checked everything and found nothing wrong; tried a different delimiter and still got no rows.
[work(0)@dm02 12:56:41 ~]$ hadoop fs -cat /user/hive/warehouse/tmp_wjk_having_test/_load.csv
1 aaa ok
2 aaa error
3 aaa ok
4 bbb ok
5 ccc error
6 ccc ok
7 ddd error
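At this point it is worth confirming the delimiter at the byte level. A quick check (a sketch against a local stand-in row; against HDFS you would pipe `hadoop fs -cat ...` into `od` instead):

```shell
# Hypothetical local copy of one row fetched from the warehouse file.
printf '1\taaa\tok\n' > /tmp/row_check.txt
# od -c prints tab bytes as \t, so a wrong delimiter would be visible immediately.
od -c /tmp/row_check.txt
```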
5. Rename the file (mv /tmp/_load.csv /tmp/load.csv), load again, and suddenly everything works.
load data local inpath '/tmp/load.csv' overwrite into table tmp_wjk_having_test;
hive> select * from tmp_wjk_having_test;
OK
1 aaa ok
2 aaa error
3 aaa ok
4 bbb ok
5 ccc error
6 ccc ok
7 ddd error
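The root cause: Hadoop's FileInputFormat treats any input path whose name begins with `_` or `.` as hidden (this is how job side-files such as `_SUCCESS` and `_logs` are kept out of inputs), so MapReduce, and therefore Hive, silently skips `_load.csv`. The rule amounts to the following (a shell sketch of the filter, not Hadoop's actual Java code):

```shell
# Sketch of Hadoop's hidden-file rule: names starting with "_" or "." are skipped.
is_visible() {
  case "$(basename "$1")" in
    _*|.*) return 1 ;;   # hidden to MapReduce/Hive (e.g. _SUCCESS, _logs)
    *)     return 0 ;;
  esac
}
is_visible /user/hive/warehouse/t/load.csv  && echo "load.csv: read"
is_visible /user/hive/warehouse/t/_load.csv || echo "_load.csv: skipped"
```

So this is not a Hive bug but standard input-path filtering; renaming the file is the straightforward fix.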