[Hive - LanguageManual ] Windowing and Analytics Functions (待)

本文详细介绍了HiveQL中窗口函数与分析函数的增强特性,包括如何使用PARTITION BY、ORDER BY、窗口定义等进行数据分组、排序与分析,以及LEAD、LAG、FIRST_VALUE、LAST_VALUE、RANK、ROW_NUMBER等功能的具体应用案例。
 
Skip to end of metadata
 
Go to start of metadata
 

Windowing and Analytics Functions

 

 

Enhancements to Hive QL

Version

Icon

Introduced in Hive version 0.11.

This section introduces the Hive QL enhancements for windowing and analytics functions. See "Windowing Specifications in HQL" (attached to HIVE-4197) for details. HIVE-896 has more information, including links to earlier documentation in the initial comments.

All of the windowing and analytics functions operate as per the SQL standard.

The current release supports the following functions for windowing and analytics:

  1. Windowing functions
    • LEAD
      • The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row.
      • Returns null when the lead for the current row extends beyond the end of the window.
    • LAG
      • The number of rows to lag can optionally be specified. If the number of rows to lag is not specified, the lag is one row.
      • Returns null when the lag for the current row extends before the beginning of the window.
    • FIRST_VALUE
    • LAST_VALUE
  2. The OVER clause
    • OVER with standard aggregates:
      • COUNT
      • SUM
      • MIN
      • MAX
      • AVG
    • OVER with a PARTITION BY statement with one or more partitioning columns of any primitive datatype.
    • OVER with PARTITION BY and ORDER BY with one or more partitioning and/or ordering columns of any datatype.
      • OVER with a window specification. Windows can be defined separately in a WINDOW clause. Window specifications support these standard options:

        ROWS ((CURRENT ROW) | (UNBOUNDED | [num]) PRECEDING) AND (UNBOUNDED | [num]) FOLLOWING
        
        Icon

        The OVER clause supports the following functions, but it does not support a window with them (see HIVE-4797):

        Ranking functions: Rank, NTile, DenseRank, CumeDist, PercentRank.

        Lead and Lag functions.

  3. Analytics functions
    • RANK
    • ROW_NUMBER
    • DENSE_RANK
    • CUME_DIST
    • PERCENT_RANK
    • NTILE

Examples

This section provides examples of how to use the Hive QL windowing and analytics functions in SELECT statements. See HIVE-896 for additional examples.

PARTITION BY with one partitioning column, no ORDER BY or window specification
SELECT  a,  COUNT (b) OVER (PARTITION  BY  c)
FROM  T;
PARTITION BY with two partitioning columns, no ORDER BY or window specification
SELECT  a,  COUNT (b) OVER (PARTITION  BY  c, d)
FROM  T;
PARTITION BY with one partitioning column, one ORDER BY column, and no window specification
SELECT  a,  SUM (b) OVER (PARTITION  BY  ORDER  BY  d)
FROM  T;
PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification
SELECT  a,  SUM (b) OVER (PARTITION  BY  c, d  ORDER  BY  e, f)
FROM  T;
PARTITION BY with partitioning, ORDER BY, and window specification
SELECT  a,  SUM (b) OVER (PARTITION  BY  ORDER  BY  ROWS  BETWEEN  UNBOUNDED PRECEDING  AND  CURRENT  ROW)
FROM  T;
SELECT  a,  AVG (b) OVER (PARTITION  BY  ORDER  BY  ROWS  BETWEEN  3 PRECEDING  AND  CURRENT  ROW)
FROM  T;
SELECT  a,  AVG (b) OVER (PARTITION  BY  ORDER  BY  ROWS  BETWEEN  3 PRECEDING  AND  3 FOLLOWING)
FROM  T;
SELECT  a,  AVG (b) OVER (PARTITION  BY  ORDER  BY  ROWS  BETWEEN  CURRENT  ROW  AND  UNBOUNDED FOLLOWING)
FROM  T;

 

There can be multiple OVER clauses in a single query. A single OVER clause only applies to the immediately preceding function call. In this example, the first OVER clause applies to COUNT(b) and the second OVER clause applies to SUM(b):

SELECT  
  a,
  COUNT (b) OVER (PARTITION  BY  c),
  SUM (b) OVER (PARTITION  BY  c)
FROM  T;

Aliases can be used as well, with or without the keyword AS:

SELECT  
  a,
  COUNT (b) OVER (PARTITION  BY  c)  AS  b_count,
  SUM (b) OVER (PARTITION  BY  c) b_sum
FROM  T;
WINDOW clause
SELECT  a,  SUM (b) OVER w
FROM  T;
WINDOW w  AS  (PARTITION  BY  ORDER  BY  ROWS  UNBOUNDED PRECEDING)
LEAD using default 1 row lead and not specifying default value
SELECT  a, LEAD(a) OVER (PARTITION  BY  ORDER  BY  ROWS  BETWEEN  CURRENT  ROW  AND  1 FOLLOWING)
FROM  T;
LAG specifying a lag of 3 rows and default value of 0
SELECT  a, LAG(a, 3, 0) OVER (PARTITION  BY  ORDER  BY  ROWS  3 PRECEDING)
FROM  T;
 

转载于:https://www.cnblogs.com/tmeily/p/4249962.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值