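Scenario: common Hive storage formats include TextFile, SequenceFile, ORC, Parquet, and RCFile.
Example: create five tables, one for each of the storage formats above, write 10 million rows into each, and compare the storage space each one occupies.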
create table test1 (
id string
,name string
)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as textfile
;
create table test2 (
id string
,name string
)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as SequenceFile
;
create table test3 (
id string
,name string
)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as ORC
;
create table test4 (
id string
,name string
)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as Parquet
;
create table test5 (
id string
,name string
)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as RCFile
;
The code used to generate the 10 million rows of test data is as follows:
package com.tpiods.myself.work0510

import java.io.{File, PrintWriter}

object GeneData {
  def main(args: Array[String]): Unit = {
    val output = "ods_etl/src/main/resources/work0515_hive/test.txt"
    val sb
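
The listing above breaks off after `val sb` in the source, so the rest of the author's generator is not available. As a rough standalone illustration only, here is a minimal sketch of how a file of `id|name` rows matching the table definitions could be produced. The object name `GeneDataSketch`, the helpers `row` and `write`, the default output file `test.txt`, and the `name_<i>` values are all assumptions made for this example, not the original code.

```scala
import java.io.{File, PrintWriter}

// Hypothetical sketch: names, output path, and row contents are assumptions,
// not the original author's implementation.
object GeneDataSketch {

  // Build one row in the "id|name" layout used by the tables above.
  def row(i: Int): String = s"$i|name_$i"

  // Write `count` rows to `path`, one per line, and return the number written.
  def write(path: String, count: Int): Int = {
    val file = new File(path)
    // Create the parent directory if the path has one.
    Option(file.getParentFile).foreach(_.mkdirs())
    val writer = new PrintWriter(file, "UTF-8")
    try {
      var i = 1
      while (i <= count) {
        writer.print(row(i))
        writer.print('\n') // explicit '\n' to match "lines terminated by '\n'"
        i += 1
      }
    } finally {
      writer.close()
    }
    count
  }

  def main(args: Array[String]): Unit = {
    // Output path is taken from the first argument, else a local default.
    val output = args.headOption.getOrElse("test.txt")
    val written = write(output, 10000000)
    println(s"wrote $written rows to $output")
  }
}
```

Rows are streamed straight to the writer rather than accumulated in memory first, which keeps memory use flat at 10 million rows. Whether the original code took the same approach is unknown.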
