Hive 0.8.1 Index Internals (Bitmap Index)

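This post walks through how the bitmap index works in Hive 0.8.1 and the experiments run against it, covering the index table layout, query optimization, and a script for rebuilding the index. Examples show how a bitmap index is used to filter data and how it behaves under different query conditions. In the experiments, the bitmap index improved efficiency for certain specific queries.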


Let's start with an experiment: build a bitmap index on table03 under Hive 0.8.1.

hive> dfs -ls /user/hive/warehouse/table03;
Found 6 items
-rw-r--r--   1 allen supergroup   67109134 2012-03-12 21:48 /user/hive/warehouse/table03/000000_0
-rw-r--r--   1 allen supergroup   67108860 2012-03-12 21:48 /user/hive/warehouse/table03/000001_0
-rw-r--r--   1 allen supergroup   67108860 2012-03-12 21:48 /user/hive/warehouse/table03/000002_0
-rw-r--r--   1 allen supergroup   67108860 2012-03-12 21:48 /user/hive/warehouse/table03/000003_0
-rw-r--r--   1 allen supergroup   67108860 2012-03-12 21:49 /user/hive/warehouse/table03/000004_0
-rw-r--r--   1 allen supergroup   21344316 2012-03-12 21:49 /user/hive/warehouse/table03/000005_0
hive> create index bitmap_index on table table03(id)                
    > as 'org.apache.hadoop.hive.ql.index.bitmap.BitmapIndexHandler'
    > with deferred rebuild;                                        
OK
Time taken: 0.715 seconds
hive> alter index bitmap_index on table03 rebuild;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201203141051_0004, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203141051_0004
Kill Command = /home/allen/Hadoop/hadoop-0.20.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201203141051_0004
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
.
.
.
2012-03-14 13:49:33,749 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201203141051_0004
Loading data to table default.default__table03_bitmap_index__
Deleted hdfs://localhost:9000/user/hive/warehouse/default__table03_bitmap_index__
Table default.default__table03_bitmap_index__ stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 95701985, raw_data_size: 0]
MapReduce Jobs Launched: 
Job 0: Map: 2  Reduce: 1   HDFS Read: 356889161 HDFS Write: 95701985 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 283.695 seconds
hive> 
Now let's look at what changed on HDFS:

hive> dfs -ls /user/hive/warehouse/;              
Found 5 items
drwxr-xr-x   - allen supergroup          0 2012-03-12 17:26 /user/hive/warehouse/default__table02_compact_index__
drwxr-xr-x   - allen supergroup          0 2012-03-14 13:49 /user/hive/warehouse/default__table03_bitmap_index__
drwxr-xr-x   - allen supergroup          0 2012-03-04 22:22 /user/hive/warehouse/table01
drwxr-xr-x   - allen supergroup          0 2012-03-04 22:33 /user/hive/warehouse/table02
drwxr-xr-x   - allen supergroup          0 2012-03-12 21:49 /user/hive/warehouse/table03
hive> dfs -du /user/hive/warehouse/;
Found 5 items
74701985    hdfs://localhost:9000/user/hive/warehouse/default__table02_compact_index__
95701985    hdfs://localhost:9000/user/hive/warehouse/default__table03_bitmap_index__
356888890   hdfs://localhost:9000/user/hive/warehouse/table01
356888890   hdfs://localhost:9000/user/hive/warehouse/table02
356888890   hdfs://localhost:9000/user/hive/warehouse/table03
hive> dfs -ls /user/hive/warehouse/default__table03_bitmap_index__
    > ;
Found 1 items
-rw-r--r--   1 allen supergroup   95701985 2012-03-14 13:47 /user/hive/warehouse/default__table03_bitmap_index__/000000_0
hive> exit;                                                        
allen@allen-laptop:~/Desktop/hive-0.8.1$ hadoop fs -cat /user/hive/warehouse/default__table03_bitmap_index__/000000_0|head
12/03/14 14:22:45 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
0hdfs://localhost:9000/user/hive/warehouse/table03/000000_00124858993459210
1hdfs://localhost:9000/user/hive/warehouse/table03/000000_0352124858993459210
2hdfs://localhost:9000/user/hive/warehouse/table03/000000_0704124858993459210
3hdfs://localhost:9000/user/hive/warehouse/table03/000000_01056124858993459210
4hdfs://localhost:9000/user/hive/warehouse/table03/000000_01408124858993459210
5hdfs://localhost:9000/user/hive/warehouse/table03/000000_01760124858993459210
6hdfs://localhost:9000/user/hive/warehouse/table03/000000_02112124858993459210
7hdfs://localhost:9000/user/hive/warehouse/table03/000000_02464124858993459210
8hdfs://localhost:9000/user/hive/warehouse/table03/000000_02816124858993459210
9hdfs://localhost:9000/user/hive/warehouse/table03/000000_03168124858993459210
cat: Unable to write to output stream.
allen@allen-laptop:~/Desktop/hive-0.8.1$ hadoop fs -text /user/hive/warehouse/default__table03_bitmap_index__/000000_0|head
12/03/14 14:23:10 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
0hdfs://localhost:9000/user/hive/warehouse/table03/000000_00124858993459210
1hdfs://localhost:9000/user/hive/warehouse/table03/000000_0352124858993459210
2hdfs://localhost:9000/user/hive/warehouse/table03/000000_0704124858993459210
3hdfs://localhost:9000/user/hive/warehouse/table03/000000_01056124858993459210
4hdfs://localhost:9000/user/hive/warehouse/table03/000000_01408124858993459210
5hdfs://localhost:9000/user/hive/warehouse/table03/000000_01760124858993459210
6hdfs://localhost:9000/user/hive/warehouse/table03/000000_02112124858993459210
7hdfs://localhost:9000/user/hive/warehouse/table03/000000_02464124858993459210
8hdfs://localhost:9000/user/hive/warehouse/table03/000000_02816124858993459210
9hdfs://localhost:9000/user/hive/warehouse/table03/000000_03168124858993459210
text: Unable to write to output stream.
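
The rows above look like one long run of characters because the terminal does not display the delimiters. The sketch below shows how such a row might be split back into columns. It assumes Hive's default text delimiters (\x01 between columns, \x02 between array elements) and a column order of id, _bucketname, _offset, _bitmaps; neither is shown in the output above. The function name and the sample row are hypothetical, made up for illustration.

```python
# Sketch: split one raw bitmap-index row into its columns.
# Assumptions (not shown in the post): Hive's default text delimiters,
# \x01 between columns and \x02 between array elements, and the column
# order id, _bucketname, _offset, _bitmaps.

FIELD_SEP = "\x01"
ITEM_SEP = "\x02"


def parse_bitmap_index_row(line):
    """Return (key, bucket_name, offset, bitmap_words) for one index row."""
    key, bucket_name, offset, bitmaps = line.rstrip("\n").split(FIELD_SEP)
    words = [int(w) for w in bitmaps.split(ITEM_SEP)]
    return key, bucket_name, int(offset), words


# Hypothetical row rebuilt from the second line of the output above.
# The position of each delimiter is a guess, since the terminal hides them.
sample = FIELD_SEP.join([
    "1",
    "hdfs://localhost:9000/user/hive/warehouse/table03/000000_0",
    "352",
    ITEM_SEP.join(["1", "2", "4", "8589934592", "1", "0"]),
])

key, bucket_name, offset, words = parse_bitmap_index_row(sample)
print(key, bucket_name, offset, words)
```

If this reading is right, each row maps one id value to a data file, a byte offset inside that file, and an array of bitmap words. To check the real delimiters, pipe the hadoop fs -cat output through cat -v, which prints control characters visibly (for example ^A and ^B).
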
For comparison, here is the content of the compact index:
allen@allen-laptop:~/Desktop/hive-0.8.1$ hadoop fs -text /user/hive/warehouse/default__table02_compact_index__/000000_0|head
12/03/14 14:23:41 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
0hdfs://localhost:9000/user/hive/warehouse/table02/000000_00
1hdfs://