Importing CSV files into Impala and Kudu
Implementation:
1. Impala tables
Prerequisite:
Impala cannot load local files directly (a difference from Hive; in Impala, LOCAL is a reserved word), so the CSV file must first be uploaded to HDFS and then loaded from there:
hadoop fs -mkdir -p /tmp/csv
hadoop fs -put /home/youjun/impala.csv /tmp/csv
hadoop fs -chmod 777 /tmp/csv
Non-partitioned Impala table
DROP TABLE IF EXISTS test.csv_impala;
CREATE TABLE test.csv_impala
( id   INT
 ,name STRING
 ,age  INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

LOAD DATA INPATH '/tmp/csv' INTO TABLE test.csv_impala;
Partitioned Impala table
Restriction: a partition must already exist before data can be loaded into it; LOAD DATA cannot load into (or create) a partition that has not been defined.
DROP TABLE IF EXISTS test.csv_impala;
CREATE TABLE test.csv_impala
( id   INT
 ,name STRING
)
PARTITIONED BY (age INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

LOAD DATA INPATH '/tmp/csv' INTO TABLE test.csv_impala PARTITION (age = 24);
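Because of the restriction above, if the target partition does not exist yet, create it first with ALTER TABLE before running LOAD DATA. A minimal sketch, using the same example partition value (age = 24):

```sql
-- Create the partition so that a subsequent LOAD DATA into it succeeds
ALTER TABLE test.csv_impala ADD PARTITION (age = 24);
```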
2. Kudu tables
Restrictions:
A row format cannot be specified for a Kudu table, and LOAD DATA only supports HDFS-backed tables, so an intermediate (staging) table is used.
-
Using impala-shell, create a temporary Impala staging table and a Kudu table
DROP TABLE IF EXISTS test.csv_tmp;
CREATE TABLE test.csv_tmp
( id   INT
 ,name STRING
 ,age  INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

DROP TABLE IF EXISTS test.csv_kudu;
CREATE TABLE test.csv_kudu
( id   INT
 ,name STRING
 ,age  INT
 ,PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 16
STORED AS KUDU;
-
Upload the CSV file; because Impala cannot load local files directly (a difference from Hive; in Impala, LOCAL is a reserved word), upload it to HDFS first, then load it
hadoop fs -mkdir -p /tmp/csv
hadoop fs -put /home/youjun/impala.csv /tmp/csv
hadoop fs -chmod 777 /tmp/csv
Load the data
load data inpath '/tmp/csv' into table test.csv_tmp;
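Note that LOAD DATA moves (not copies) the files out of /tmp/csv into the table's HDFS directory, so the upload step must be repeated before loading again. To confirm the staging load succeeded, a quick row count in impala-shell suffices:

```sql
-- Should report the number of data rows in impala.csv
SELECT COUNT(*) FROM test.csv_tmp;
```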
-
Insert data into the Kudu table
1) Incremental insert only
INSERT INTO test.csv_kudu
SELECT id, name, age
FROM test.csv_tmp;
2) Update existing rows
UPDATE test.csv_kudu
SET test.csv_kudu.name = tb.name
   ,test.csv_kudu.age  = tb.age
FROM test.csv_kudu, test.csv_tmp AS tb
WHERE test.csv_kudu.id = tb.id;
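For Kudu tables, Impala also provides UPSERT, which inserts new rows and updates existing ones by primary key in a single statement; steps 1) and 2) can often be combined as follows (a sketch, assuming the staging table holds the full set of new and changed rows):

```sql
-- Insert rows whose id is new; update name/age for rows whose id already exists
UPSERT INTO test.csv_kudu
SELECT id, name, age
FROM test.csv_tmp;
```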