Comparing the same data stored as TextFile, SequenceFile, RCFile, ORC and Parquet.
Original size: 19M
1. TextFile (the default format): the file is 18.1M
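The article does not show how the source table page_views was created. A plausible sketch, assuming the same seven string columns used by the tables below, a tab-delimited raw file, and a hypothetical local path '/path/to/page_views.dat':

```sql
-- Hypothetical DDL for the source table; the column list is copied from the
-- SequenceFile and RCFile tables later in this article.
create table page_views(
    track_time string,
    url string,
    session_id string,
    referer string,
    ip string,
    end_user_id string,
    city_id string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS TEXTFILE;  -- TEXTFILE is also what Hive uses when STORED AS is omitted

-- Hypothetical path: point this at wherever the 19M raw file actually lives.
load data local inpath '/path/to/page_views.dat' overwrite into table page_views;
```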
2. SequenceFile
create table page_views_seq(
    track_time string,
    url string,
    session_id string,
    referer string,
    ip string,
    end_user_id string,
    city_id string
)ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS SEQUENCEFILE;

insert into table page_views_seq select * from page_views;
Stored as SequenceFile, the file is 19.6M.
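The SequenceFile table above is written uncompressed, so every record carries extra framing (a record length and a key length) and the file contains periodic sync markers, which plausibly explains why it ends up slightly larger than the plain text. As a hedged sketch, assuming Hadoop 2 property names and a hypothetical table name page_views_seq_block, Hive can be asked to write block-compressed SequenceFile output instead:

```sql
-- Turn on compression for the files Hive writes in this session.
SET hive.exec.compress.output=true;
-- DefaultCodec (zlib) needs no native libraries; other codecs are possible.
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec;
-- BLOCK compresses batches of records together rather than each record separately.
SET mapreduce.output.fileoutputformat.compress.type=BLOCK;

-- Hypothetical table name, used only for this comparison.
create table page_views_seq_block
STORED AS SEQUENCEFILE
as select * from page_views;
```

No size is claimed for this variant; it would have to be measured on the same data.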
3. RCFile
create table page_views_rcfile(
    track_time string,
    url string,
    session_id string,
    referer string,
    ip string,
    end_user_id string,
    city_id string
)ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS RCFILE;

insert into table page_views_rcfile select * from page_views;
Stored as RCFile, the file is 17.9M.
4. ORCFile
create table page_views_orc
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS ORC
TBLPROPERTIES("orc.compress"="NONE")
as select * from page_views;
Stored as ORC (with compression disabled via "orc.compress"="NONE"), the file is 7.7M.
5. Parquet
create table page_views_parquet
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
STORED AS PARQUET
as select * from page_views;
Stored as Parquet, the file is 13.1M.
Summary: disk space used, from smallest to largest
ORCFile (7.7M) < Parquet (13.1M) < RCFile (17.9M) < TextFile (18.1M) < SequenceFile (19.6M)
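The article does not say how the sizes were measured. One way to re-check them from the Hive CLI is sketched below, assuming the tables live in the default database under the default warehouse directory /user/hive/warehouse (adjust the path if hive.metastore.warehouse.dir differs on your cluster):

```sql
-- Summarize the on-disk size of every table directory matching the glob:
-- page_views, page_views_seq, page_views_rcfile, page_views_orc, page_views_parquet.
dfs -du -s -h /user/hive/warehouse/page_views*;

-- Alternatively, when basic statistics have been gathered, the "Table Parameters"
-- section of this output includes a totalSize entry in bytes.
describe formatted page_views_orc;
```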