New Features in Hive 0.11.0

forever_ai

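This article covers several new features in Hive 0.11, including the new EXPLAIN DEPENDENCY syntax, optimization of simple queries, DML support, new functions such as LEAD/LAG/NVL, the improved ORCFile file format, and enhanced GROUP BY syntax.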

Reposted from: http://blog.youkuaiyun.com/wypblog/article/details/14167035

Thanks for sharing!

Tip: to see the detailed usage of a function:

        desc function extended xpath;

1. A new "EXPLAIN DEPENDENCY" syntax prints, in JSON format, the input tables and input partitions that a statement will read, which makes it easy to debug which tables a statement touches.

 
   
        hive> explain dependency select count(1) from p;
        OK
        {"input_partitions":
        [{"partitionName":"default@p@stat_date=20110728/province=bj"},
        {"partitionName":"default@p@stat_date=20110728/province=jx"},
        {"partitionName":"default@p@stat_date=20110728/province=jx123"},
        {"partitionName":"default@p@stat_date=20110728/province=zhejiang"}],
        "input_tables":[{"tablename":"default@p","tabletype":"MANAGED_TABLE"}]}
        Time taken: 1.158 seconds, Fetched: 1 row(s)
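Because the output is plain JSON, it can be post-processed by a script. The following is a minimal Python sketch, assuming the JSON line has already been captured as a string (here it is pasted from the example output; the helper name `summarize_dependencies` is made up for illustration):

```python
import json

# JSON text copied from the EXPLAIN DEPENDENCY example output.
raw = '''
{"input_partitions":
 [{"partitionName":"default@p@stat_date=20110728/province=bj"},
  {"partitionName":"default@p@stat_date=20110728/province=jx"},
  {"partitionName":"default@p@stat_date=20110728/province=jx123"},
  {"partitionName":"default@p@stat_date=20110728/province=zhejiang"}],
 "input_tables":[{"tablename":"default@p","tabletype":"MANAGED_TABLE"}]}
'''


def summarize_dependencies(text):
    """Return (table names, partition names) found in the JSON text."""
    doc = json.loads(text)
    tables = [t["tablename"] for t in doc.get("input_tables", [])]
    partitions = [p["partitionName"] for p in doc.get("input_partitions", [])]
    return tables, partitions


tables, partitions = summarize_dependencies(raw)
print("tables:", tables)
for name in partitions:
    print("partition:", name)
```

This only reads the two keys that appear in the example; other keys, if any, are ignored.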
       
 
 

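2. Simple queries that need no aggregation, such as SELECT col FROM table LIMIT 20, no longer launch a MapReduce job; the data is retrieved directly by a Fetch task.
3. UNION optimization. If the parent of a UNION is a MapReduce job, that job first writes its results to temporary files, the UNION then reads those temporary files and writes them to the final directory, and the upstream statement reads the final directory, so the result files are read twice. The optimization writes the result data directly to the final directory.
4. TRUNCATE is implemented. It deletes the data stored on HDFS for a table while keeping the table and its metadata intact.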

 
        hive> TRUNCATE TABLE p;

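This deletes all data associated with table p.
5. Many keywords are no longer reserved and can be used as identifiers.
The following statement is legal in Hive 0.11: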

 
        hive> create table table(id int);

In versions before Hive 0.11 it is rejected.
6. The LEAD/LAG/FIRST/LAST analytic windowing functions were added to Hive.

 
        hive> select id, rat, lag(id, 2, 100000) from m limit 10;
        OK
        12      3       100000
        13      2       100000
        276     1       12
        716     5       13
        880     3       276
        378     3       716
        913     2       880
        721     3       378
        676     4       913
        806     4       721

The two statements

        hive> select id, rat, lag(id, 1) from m limit 10;

and

        hive> select id, rat, lag(id) from m limit 10;

both output:

        OK
        12      3       NULL
        13      2       12
        276     1       13
        716     5       276
        880     3       716
        378     3       880
        913     2       378
        721     3       913
        676     4       721
        806     4       676

Likewise,

        hive> select id, rat, lead(id, 1) from m limit 10;

and

        hive> select id, rat, lead(id) from m limit 10;

produce the same output:

        OK
        12      3       13
        13      2       276
        276     1       716
        716     5       880
        880     3       378
        378     3       913
        913     2       721
        721     3       676
        676     4       806
        806     4       495
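To make the offset and default arguments concrete, here is a small Python model of the behaviour seen in the outputs. It is only a sketch of the semantics over a single ordered list, not how Hive implements the functions; the function names simply mirror the Hive ones.

```python
def lag(values, offset=1, default=None):
    """For each row, the value `offset` rows earlier, or `default` if none."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def lead(values, offset=1, default=None):
    """For each row, the value `offset` rows later, or `default` if none."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]


# The ten ids displayed in the example output, in the same order.
ids = [12, 13, 276, 716, 880, 378, 913, 721, 676, 806]

print(lag(ids, 2, 100000))  # first two rows fall back to the default 100000
print(lag(ids))             # offset defaults to 1, missing value is None (NULL)
print(lead(ids))            # last row has no successor in this list
```

In the Hive output the last LEAD value is 495 rather than NULL, presumably because table m holds more than the ten rows that LIMIT displays; the list here contains only the displayed rows.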

7. The NVL function was added.

        hive> select NVL(name, 'no name') from m limit 10;
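NVL returns its second argument when the first is NULL and the first argument otherwise. A minimal Python sketch of that rule, using None to stand in for NULL (the sample names are invented):

```python
def nvl(value, default):
    """Return `default` when `value` is None (NULL), else `value` itself."""
    return default if value is None else value


# Hypothetical column values: a name, a NULL, and an empty string.
names = ["tom", None, ""]
print([nvl(n, "no name") for n in names])
```

Note that in this sketch an empty string is a real value and is not replaced; only None is.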

8. The location of Hive's log4j configuration files can be set through configuration properties.

 
        hive \
        -hiveconf hive.log4j.file=/home/carl/hive-log4j.properties \
        -hiveconf hive.log4j.exec.file=/home/carl/hive-exec-log4j.properties

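9. A new DECIMAL type was added, and it can be used in the Regex SerDe.
10. HiveServer2 was added, addressing the earlier security and concurrency problems. A new Beeline CLI (based on SQLLine) was also added for interactive command-line access to HiveServer2.
11. DML is supported.
12. The IGNORE PROTECTION clause is supported when dropping a partition.
13. When writing Hive query results to a file, the user can specify the column delimiter; the default is ^A.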

 
        hive> insert overwrite local directory '/home/wyp/Documents/result'
            > select * from test;

The columns in the result above are separated by '^A'. To choose the column delimiter yourself, use the following command:

 
        hive> insert overwrite local directory '/home/wyp/Documents/result'
            > row format delimited
            > fields terminated by '\t'
            > select * from test;

The two added lines make the columns use the delimiter we specify.
Delimiters for collection items and map keys can be defined as follows:

 
        insert overwrite local directory './test-04'
        row format delimited
        FIELDS TERMINATED BY '\t'
        COLLECTION ITEMS TERMINATED BY ','
        MAP KEYS TERMINATED BY ':'
        select * from src;
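A file written with these delimiters is straightforward to read back. The Python sketch below parses one exported line, assuming a hypothetical three-column layout (a string, an array, and a map); the real columns of src are not given in the article, so the sample line is invented.

```python
def parse_row(line, field_sep="\t", item_sep=",", key_sep=":"):
    """Split one exported line into (name, list of items, dict of map entries).

    Assumes exactly three columns: string, array, map (hypothetical layout).
    """
    name, items, entries = line.rstrip("\n").split(field_sep)
    item_list = items.split(item_sep) if items else []
    # Map entries are separated by the collection delimiter,
    # and each key is separated from its value by the map-key delimiter.
    entry_map = dict(pair.split(key_sep, 1) for pair in entries.split(item_sep)) if entries else {}
    return name, item_list, entry_map


# Invented sample line in the format produced by the delimiters above.
sample = "alice\tred,green\tage:30,city:bj\n"
print(parse_row(sample))
```

For a file written with the default column delimiter, `field_sep="\x01"` (the ^A character) would be passed instead of a tab.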

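14. Operator-level hooks were added.
15. ALTER VIEW AS SELECT is supported.
16. The compressed and uncompressed size of each column in an RCFile can be obtained.
17. A table's bucketing/sorting metadata can be modified through the CLI.
18. A Hive Profiler tool was added; it can be used to track wall times and call counts.
19. Creating and dropping temporary partitions is supported.
20. ORC supports memory management.
21. HCatalog has been merged into Hive instead of being a separate project.
22. The ORCFile format (Optimized Row Columnar) is supported. It is column-oriented and carries an inline index inside the file, so predicate pushdown can be applied at the file level: the metadata of each stripe decides whether the stripe can be skipped, which greatly reduces the input size.
23. GROUP BY syntax is enhanced: besides column aliases, GROUP BY can also take column positions.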

 
      For example:

      select f1(col1), f2(col2), f3(col3), count(1)
      group by f1(col1), f2(col2), f3(col3);

      can be written as

      select f1(col1), f2(col2), f3(col3), count(1) group by 1, 2, 3;
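A position in GROUP BY is a 1-based index into the select list. The Python sketch below shows that mapping on the expressions from the example; it only illustrates the rewrite rule and is not Hive's parser (the helper name `resolve_group_by` is made up).

```python
def resolve_group_by(select_exprs, group_by):
    """Replace 1-based integer positions with the matching select expression."""
    resolved = []
    for item in group_by:
        if isinstance(item, int):
            if not 1 <= item <= len(select_exprs):
                raise ValueError("position %d is outside the select list" % item)
            resolved.append(select_exprs[item - 1])
        else:
            # Already an expression or alias: keep it unchanged.
            resolved.append(item)
    return resolved


select_exprs = ["f1(col1)", "f2(col2)", "f3(col3)", "count(1)"]
print(resolve_group_by(select_exprs, [1, 2, 3]))
```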