Let's start with an experiment: build a compact index on two columns of a large table and see what Hive actually materializes.
hive> create table pre_data_triple (id1 int, id2 int, name string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> ;
OK
Time taken: 0.152 seconds
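The staging table parses each line of triple.txt as three tab-separated fields: id1, id2, and name. A line presumably looks something like the following (the name value is invented for illustration; only the id values are visible later in the index output):

    1<TAB>999999<TAB>some_name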
hive> drop table triple;
OK
Time taken: 0.476 seconds
hive> load data local inpath '/home/allen/Desktop/triple.txt' overwrite into table pre_data_triple;
Copying data from file:/home/allen/Desktop/triple.txt
Copying file: file:/home/allen/Desktop/triple.txt
Loading data to table default.pre_data_triple
Deleted hdfs://localhost:9000/user/hive/warehouse/pre_data_triple
OK
Time taken: 57.503 seconds
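LOAD DATA LOCAL copies the file from the local filesystem into the table's warehouse directory, and OVERWRITE first deletes whatever was already there (hence the "Deleted hdfs://..." line above). A quick sanity check, if you want one:

    SELECT COUNT(1) FROM pre_data_triple;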
hive> create table triple as
> select * from pre_data_triple;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201203232214_0011, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203232214_0011
Kill Command = /home/allen/Hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201203232214_0011
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2012-03-25 23:22:53,098 Stage-1 map = 0%, reduce = 0%
2012-03-25 23:23:38,578 Stage-1 map = 48%, reduce = 0%
2012-03-25 23:23:51,085 Stage-1 map = 63%, reduce = 0%
2012-03-25 23:24:03,213 Stage-1 map = 75%, reduce = 0%
2012-03-25 23:24:55,141 Stage-1 map = 88%, reduce = 0%
2012-03-25 23:25:49,772 Stage-1 map = 100%, reduce = 0%
Ended Job = job_201203232214_0011
Ended Job = 266398561, job is filtered out (removed at runtime).
Moving data to: hdfs://localhost:9000/tmp/hive-allen/hive_2012-03-25_23-22-43_338_7650460868136597059/-ext-10001
Moving data to: hdfs://localhost:9000/user/hive/warehouse/triple
Table default.triple stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 363777426, raw_data_size: 0]
999999 Rows loaded to hdfs://localhost:9000/tmp/hive-allen/hive_2012-03-25_23-22-43_338_7650460868136597059/-ext-10000
MapReduce Jobs Launched:
Job 0: Map: 2 HDFS Read: 363797911 HDFS Write: 363777426 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 203.354 seconds
hive> create index dual_index on table triple(id1,id2)
> as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
> with deferred rebuild;
OK
Time taken: 0.516 seconds
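WITH DEFERRED REBUILD means Hive only registers the index metadata here; no index data is computed yet, which is why the statement returns in half a second. The index has to be populated explicitly (and re-populated whenever the base table changes):

    ALTER INDEX dual_index ON triple REBUILD;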
hive> show tables;
OK
default__table02_compact_index__
default__table03_bitmap_index__
default__test_bitmap_index__
default__triple_dual_index__
pre_data_triple
table02
table03
test
triple
Time taken: 0.352 seconds
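Hive materializes each index as an ordinary table named <db>__<table>_<index>__, so our new index shows up as default__triple_dual_index__ (the other default__*_index__ entries are presumably leftovers from earlier experiments). You can also list the indexes on a table directly (Hive 0.7 and later):

    SHOW INDEX ON triple;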
hive> drop table pre_data_triple;
OK
Time taken: 0.41 seconds
hive> alter index dual_index on table triple(id1,id2);
FAILED: Parse Error: line 1:26 mismatched input 'table' expecting Identifier near 'on' in alter index statement
hive> alter index dual_index on table triple rebuild;
FAILED: Parse Error: line 1:26 mismatched input 'table' expecting Identifier near 'on' in alter index statement
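The parse errors above are purely syntactic: unlike CREATE INDEX, the ALTER INDEX grammar takes neither the TABLE keyword nor a column list. The general form is:

    ALTER INDEX index_name ON table_name [PARTITION (partition_spec)] REBUILD;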
hive> alter index dual_index on triple rebuild;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201203232214_0012, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203232214_0012
Kill Command = /home/allen/Hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201203232214_0012
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2012-03-25 23:35:42,999 Stage-1 map = 0%, reduce = 0%
2012-03-25 23:36:31,371 Stage-1 map = 13%, reduce = 0%
2012-03-25 23:36:34,436 Stage-1 map = 48%, reduce = 0%
2012-03-25 23:36:52,912 Stage-1 map = 60%, reduce = 0%
2012-03-25 23:36:56,008 Stage-1 map = 75%, reduce = 0%
2012-03-25 23:37:05,457 Stage-1 map = 88%, reduce = 0%
2012-03-25 23:37:45,085 Stage-1 map = 88%, reduce = 17%
2012-03-25 23:38:10,588 Stage-1 map = 100%, reduce = 17%
2012-03-25 23:38:40,381 Stage-1 map = 100%, reduce = 33%
2012-03-25 23:38:43,703 Stage-1 map = 100%, reduce = 67%
2012-03-25 23:38:53,391 Stage-1 map = 100%, reduce = 68%
2012-03-25 23:38:58,664 Stage-1 map = 100%, reduce = 69%
2012-03-25 23:39:04,970 Stage-1 map = 100%, reduce = 70%
2012-03-25 23:39:08,096 Stage-1 map = 100%, reduce = 71%
2012-03-25 23:39:14,467 Stage-1 map = 100%, reduce = 72%
2012-03-25 23:39:20,722 Stage-1 map = 100%, reduce = 73%
2012-03-25 23:39:23,885 Stage-1 map = 100%, reduce = 74%
2012-03-25 23:39:27,249 Stage-1 map = 100%, reduce = 75%
2012-03-25 23:39:32,433 Stage-1 map = 100%, reduce = 76%
2012-03-25 23:39:38,599 Stage-1 map = 100%, reduce = 78%
2012-03-25 23:39:41,653 Stage-1 map = 100%, reduce = 80%
2012-03-25 23:39:44,701 Stage-1 map = 100%, reduce = 83%
2012-03-25 23:39:47,738 Stage-1 map = 100%, reduce = 85%
2012-03-25 23:39:50,790 Stage-1 map = 100%, reduce = 88%
2012-03-25 23:39:53,826 Stage-1 map = 100%, reduce = 91%
2012-03-25 23:39:56,868 Stage-1 map = 100%, reduce = 93%
2012-03-25 23:39:59,917 Stage-1 map = 100%, reduce = 96%
2012-03-25 23:40:02,972 Stage-1 map = 100%, reduce = 97%
2012-03-25 23:40:06,040 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201203232214_0012
Loading data to table default.default__triple_dual_index__
Deleted hdfs://localhost:9000/user/hive/warehouse/default__triple_dual_index__
Table default.default__triple_dual_index__ stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 81179259, raw_data_size: 0]
MapReduce Jobs Launched:
Job 0: Map: 2 Reduce: 1 HDFS Read: 363794078 HDFS Write: 81179259 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 277.168 seconds
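The rebuild is a full MapReduce pass over triple that groups rows by the indexed columns. As a sketch (not the exact plan Hive generates), CompactIndexHandler's rebuild is roughly equivalent to the following, using Hive's virtual columns (available since 0.8):

    INSERT OVERWRITE TABLE default__triple_dual_index__
    SELECT id1, id2, INPUT__FILE__NAME,
           collect_set(BLOCK__OFFSET__INSIDE__FILE)
    FROM triple
    GROUP BY id1, id2, INPUT__FILE__NAME;

That also explains the sizes above: roughly 364 MB of base data collapses to an 81 MB index table.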
hive> desc default__triple_dual_index__;
OK
id1 int
id2 int
_bucketname string
_offsets array<bigint>
Time taken: 0.327 seconds
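So the index table carries the indexed columns themselves, plus _bucketname (the HDFS file that holds the row) and _offsets (the block offsets within that file where rows with this key start). A point lookup against the index is just a normal query; note the backticks, since the column names begin with an underscore:

    SELECT `_bucketname`, `_offsets`
    FROM default__triple_dual_index__
    WHERE id1 = 5 AND id2 = 999995;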
hive> select * from default__triple_dual_index__ limit 10;
OK
1 999999 hdfs://localhost:9000/user/hive/warehouse/triple/000000_0 [0]
2 999998 hdfs://localhost:9000/user/hive/warehouse/triple/000000_0 [359]
3 999997 hdfs://localhost:9000/user/hive/warehouse/triple/000000_0 [718]
4 999996 hdfs://localhost:9000/user/hive/warehouse/triple/000000_0 [1077]
5 999995 hdfs://localhost:9000/user/hive/warehouse/triple/000000_0 [1436]
6 999994 hdfs://localhost:9000/user/hive/warehouse/triple/000000_0 [1795]
7 999993 hdfs://localhost:9000/user/hive/warehouse/triple/000000_0 [2154]
8 999992 hdfs://localhost:9000/user/hive/warehouse/triple/000000_0 [2513]
9 999991 hdfs://localhost:9000/user/hive/warehouse/triple/000000_0 [2872]
10 999990 hdfs://localhost:9000/user/hive/warehouse/triple/000000_0 [3231]
Time taken: 0.394 seconds
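Note that Hive at this point does not consult the index automatically when you query triple. One option (Hive 0.8+) is to let the optimizer rewrite queries for you with SET hive.optimize.index.filter=true;. The manual route, as a sketch (the output path here is illustrative), is to dump the matching (_bucketname, _offsets) pairs to a file and point the compact index input format at it:

    INSERT OVERWRITE DIRECTORY '/tmp/triple_index_result'
    SELECT `_bucketname`, `_offsets`
    FROM default__triple_dual_index__
    WHERE id1 = 5;

    SET hive.index.compact.file=/tmp/triple_index_result;
    SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

    SELECT * FROM triple WHERE id1 = 5;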