1. How Hive tables are read, by storage format
2. A side-by-side example for each format
Read methods:
1. stored as textfile
1.1 View the file directly on HDFS
1.2 hadoop fs -text
2. stored as sequencefile
2.1 hadoop fs -text
3. stored as rcfile
3.1 hive --service rcfilecat path
4. stored as inputformat 'class'
outputformat 'class'
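The list above can be sketched as a small lookup from storage format to the command used to inspect a file. This is only an illustration: the class name ReadCommands and the example path (including the file name 000000_0) are hypothetical, and the command strings are the ones listed above, with the Hive launcher option written as --service (two dashes).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class ReadCommands {
    // Command templates from the list above; %s stands for the HDFS path.
    private static final Map<String, String> TEMPLATES = new LinkedHashMap<>();
    static {
        TEMPLATES.put("textfile", "hadoop fs -text %s");
        TEMPLATES.put("sequencefile", "hadoop fs -text %s");
        TEMPLATES.put("rcfile", "hive --service rcfilecat %s");
    }

    /** Returns the inspection command for a built-in format, or throws if none is listed. */
    public static String commandFor(String storedAs, String hdfsPath) {
        String template = TEMPLATES.get(storedAs.toLowerCase(Locale.ROOT));
        if (template == null) {
            // Custom inputformat/outputformat classes have no generic viewer in the list.
            throw new IllegalArgumentException("no viewer listed for: " + storedAs);
        }
        return String.format(template, hdfsPath);
    }

    public static void main(String[] args) {
        // Hypothetical file under the store3 warehouse directory.
        System.out.println(commandFor("rcfile", "/user/hive/warehouse/store3/000000_0"));
    }
}
```

Custom formats (case 4) are deliberately left out of the map, since how to read them depends on the class supplied.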
Worked examples:
1. hive> create table store1(username string,age int) stored as textfile;
View the detailed table information (note the InputFormat and OutputFormat lines):
hive> desc formatted store1;
OK
# col_name data_type comment
username string
age int
# Detailed Table Information
Database: default
Owner: root
CreateTime: Fri Nov 20 06:37:32 PST 2015
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://cluster/user/hive/warehouse/store1
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1448030252
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.99 seconds, Fetched: 27 row(s)
2. hive> create table store2(username string,age int) stored as sequencefile;
View the detailed table information (note the InputFormat and OutputFormat lines):
hive> desc formatted store2;
OK
# col_name data_type comment
username string
age int
# Detailed Table Information
Database: default
Owner: root
CreateTime: Fri Nov 20 06:41:24 PST 2015
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://cluster/user/hive/warehouse/store2
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1448030484
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.196 seconds, Fetched: 27 row(s)
3. hive> create table store3(username string,age int) stored as rcfile;
View the detailed table information (note the SerDe Library, InputFormat, and OutputFormat lines):
hive> desc formatted store3;
OK
# col_name data_type comment
username string
age int
# Detailed Table Information
Database: default
Owner: root
CreateTime: Fri Nov 20 06:44:24 PST 2015
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://cluster/user/hive/warehouse/store3
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1448030664
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
InputFormat: org.apache.hadoop.hive.ql.io.RCFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.194 seconds, Fetched: 27 row(s)
4. Create custom input/output classes (they add no real logic and exist only for comparison with the formats above)
HInputFormat:
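package com.test.out;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
// Thin wrapper around SequenceFileInputFormat; it adds no behavior of its own.
public class HInputFormat<K, V> extends SequenceFileInputFormat<K, V> {
    @Override
    public RecordReader<K, V> getRecordReader(InputSplit split, JobConf job,
            Reporter reporter) throws IOException {
        reporter.setStatus(split.toString());
        return new HRecordReader<K, V>(job, (FileSplit) split);
    }
    @Override
    protected FileStatus[] listStatus(JobConf job) throws IOException {
        return super.listStatus(job);
    }
}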
HRecordReader:
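package com.test.out;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.SequenceFileRecordReader;
// Delegates everything to SequenceFileRecordReader.
public class HRecordReader<K, V> extends SequenceFileRecordReader<K, V> {
    public HRecordReader(Configuration conf, FileSplit split)
            throws IOException {
        super(conf, split);
    }
}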
Table definition that uses the custom format:
drop table if exists store4;
create table if not exists store4(
username string ,
age int )
row format delimited fields terminated by '\t' stored as inputformat 'com.test.out.HInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat';
Package the classes above into a jar, copy it to the Linux host, and register it in Hive before creating the table:
hive> add jar /root/Downloads/HiveOUTInput.jar;
hive> create table if not exists store4(
> username string ,
> age int )
> row format delimited fields terminated by '\t' stored as inputformat 'com.test.out.HInputFormat'
> outputformat 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat';
View the detailed table information (note the InputFormat and OutputFormat lines):
hive> desc formatted store4;
OK
# col_name data_type comment
username string
age int
# Detailed Table Information
Database: default
Owner: root
CreateTime: Fri Nov 20 07:29:11 PST 2015
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://cluster/user/hive/warehouse/store4
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1448033351
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: com.test.out.HInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
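Summary: what distinguishes these tables is the InputFormat and OutputFormat class each one uses. In the output above, the rcfile table also differs in its SerDe (LazyBinaryColumnarSerDe instead of LazySimpleSerDe).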
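To make the comparison easier to scan, the class names from the four desc formatted outputs can be collected into one lookup. This is a minimal sketch: the class StorageFormatSummary and its helper simpleName are hypothetical names introduced here, and the values are copied from the Storage Information sections above rather than queried from Hive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StorageFormatSummary {
    /** SerDe, InputFormat and OutputFormat class names for one table. */
    public static final class Format {
        public final String serde;
        public final String inputFormat;
        public final String outputFormat;

        Format(String serde, String inputFormat, String outputFormat) {
            this.serde = serde;
            this.inputFormat = inputFormat;
            this.outputFormat = outputFormat;
        }
    }

    // Values copied from the "Storage Information" sections shown above.
    public static final Map<String, Format> TABLES = new LinkedHashMap<>();
    static {
        String lazySimple = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe";
        String seqOut = "org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat";
        TABLES.put("store1", new Format(lazySimple,
                "org.apache.hadoop.mapred.TextInputFormat",
                "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"));
        TABLES.put("store2", new Format(lazySimple,
                "org.apache.hadoop.mapred.SequenceFileInputFormat", seqOut));
        TABLES.put("store3", new Format(
                "org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe",
                "org.apache.hadoop.hive.ql.io.RCFileInputFormat",
                "org.apache.hadoop.hive.ql.io.RCFileOutputFormat"));
        TABLES.put("store4", new Format(lazySimple,
                "com.test.out.HInputFormat", seqOut));
    }

    /** Strips the package prefix so each table fits on one printed line. */
    public static String simpleName(String className) {
        return className.substring(className.lastIndexOf('.') + 1);
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Format> e : TABLES.entrySet()) {
            Format f = e.getValue();
            System.out.println(e.getKey() + ": " + simpleName(f.inputFormat)
                    + " / " + simpleName(f.outputFormat)
                    + " / " + simpleName(f.serde));
        }
    }
}
```

Reading the rows side by side shows that store2 and store4 share an OutputFormat and differ only in the InputFormat, which is the custom class registered with add jar.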