Hive Indexes

Article link: https://blog.youkuaiyun.com/AnneQiQi/article/details/51425342

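This article introduces Hive's index feature, which speeds up queries at the cost of additional storage. It shows examples of creating, showing, and dropping different kinds of indexes, and compares query performance with and without an index to show where an index helps. It also mentions Hive's Fetch Task conversion, which can further speed up simple queries.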


Indexing is a standard database technique. Hive has supported indexes since version 0.7.

The purpose of a Hive index is to speed up queries on specific columns of a Hive table.

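Without an index, a query with a predicate such as 'WHERE tab1.col1 = 10' makes Hive load the entire table or partition and process every row. If an index exists on col1, only part of the files needs to be loaded and processed.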

As in traditional databases, the faster queries come at a price: building the index consumes extra resources, and storing it takes more disk space.

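Hive offers only limited indexing. Unlike a traditional relational database, it has no concept of keys, but users can create indexes on certain columns to speed up certain operations. The index data for a table is stored in a separate table. Hive's index support is still relatively immature and offers few options. However, indexes are designed to be customizable through pluggable Java code, so users can extend the feature to meet their own needs. Not every query benefits from a Hive index. Use EXPLAIN to check whether a HiveQL statement can use an index to improve its performance. As with indexes in an RDBMS, evaluate whether an index is justified: it takes extra disk space, and creating and maintaining it has a cost. Weigh the benefit of an index against that cost.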
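The EXPLAIN check mentioned above might look like the following. This is a minimal sketch, not the author's procedure: the table tab1 and column col1 are taken from the example predicate, an index on col1 is assumed to exist and to have been rebuilt, and the two settings are standard Hive properties that apply only to releases that still ship index support (indexes were removed in Hive 3.0).

```sql
-- Assumption: tab1 has a compact index on col1 that has already been rebuilt.

-- Let the optimizer consider indexes when evaluating filters (off by default).
set hive.optimize.index.filter=true;

-- Lower the minimum input size for automatic compact-index use,
-- so that a small test table also qualifies.
set hive.optimize.index.filter.compact.minsize=0;

-- Print the query plan instead of running the query.
explain
select * from tab1 where tab1.col1 = 10;
```

Running the same EXPLAIN with hive.optimize.index.filter set to false and comparing the two plans is one way to see whether the index is being picked up at all.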

General index usage

Common index operations are shown below:

A. Create/build, show, and drop index

create index table01_index on table table01(column2) as 'COMPACT' with deferred rebuild;

show index on table01;

drop index table01_index on table01;

B. Create then build, show formatted, and drop index

create index table02_index on table table02(column3) as 'compact' with deferred rebuild;

alter index table02_index on table02 rebuild;

show formatted index on table02;

drop index table02_index on table02;

C. Create a bitmap index, then build, show, and drop it

create index table03_index on table table03 (column4) as 'bitmap' with deferred rebuild;

alter index table03_index on table03 rebuild;

show formatted index on table03;

drop index table03_index on table03;

D. Create an index in a new table

create index table04_index on table table04 (column5) as 'compact' with deferred rebuild in table table04_index_table;

E. Create an index stored as RCFile

create index table05_index on table table05 (column6) as 'compact' with deferred rebuild stored as rcfile;

F. Create an index stored as TextFile

create index table06_index on table table06 (column7) as 'compact' with deferred rebuild row format delimited fields terminated by '\t' stored as textfile;

G. Create an index with index properties

create index table07_index on table table07 (column8) as 'compact' with deferred rebuild idxproperties("prop1"="value1", "prop2"="value2");

H. Create an index with table properties

create index table08_index on table table08 (column9) as 'compact' with deferred rebuild tblproperties("prop3"="value3", "prop4"="value4");

I. Drop the index if it exists

drop index if exists table09_index on table09;

J. Rebuild the index on a partition

alter index table10_index on table10 partition (columnx='valueq', columny='valuer') rebuild;
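Because index data lives in an ordinary table, it can be inspected like any other table. The sketch below reuses the names from example D and assumes the usual layout of a compact index table: the indexed column plus the columns `_bucketname` and `_offsets`. That layout is an assumption here, so check the DESCRIBE output on your own Hive version before relying on it.

```sql
-- Populate the index from example D; "with deferred rebuild" leaves it empty until now.
alter index table04_index on table04 rebuild;

-- Show the schema of the index table named in the "in table" clause.
describe table04_index_table;

-- Assumed compact-index layout: each row maps an indexed value to a data file
-- (_bucketname) and the offsets inside that file where the value occurs (_offsets).
select column5, `_bucketname`, `_offsets`
from table04_index_table
limit 10;
```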

Now let's walk through creating an index:

1. Create the table:

 
hive> create table user( id int, name string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;

2. Load the data:

 
hive> load data local inpath '/export1/tmp/wyp/row.txt'
    > overwrite into table user;
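A quick sanity check after the load can confirm that the rows arrived and that the tab delimiter was parsed as intended. This is a minimal sketch and not part of the original walkthrough. The shape of the data (an integer id and a name on each line of row.txt) is assumed from the table definition.

```sql
-- Count the loaded rows; on a table of this size this launches a MapReduce job.
select count(*) from user;

-- Peek at a few rows; NULL values in id or name would suggest that
-- row.txt is not tab-delimited as the table definition expects.
select * from user limit 5;
```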

3. Test before creating the index:

 
hive> select * from user where id = 500000;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 356888890) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)