Postgresql window function 第一篇
window function,简单来说就是窗口函数。postgresql 在8.4版本开始有了窗口函数的特性。
看多很多中文解释,找不到合适的解释什么是窗口函数,有句英文很好的诠释了什么是窗口函数。 “ The whole idea behind window functions is to allow you to process several values of the result set at a time: you see through the window some peer rows and are able to compute a single output value from them, much like when using an aggregate function. ”先来介绍一下postgresql里面关于窗口函数的语法吧。
function name over ( partition by column order by column [ RANGE | ROWS ] BETWEEN frame_start AND frame_end);
这是简化的版本,官网上对于语句写的更详细,有兴趣的可以自己去看看。 对于第一次看见的人来说,这么长的语法可能一下子无法接受,那我们就慢慢一步一步来的说。
首先创建下面这张表
点击(此处)折叠或打开
- create table empsalary
- (
- depname varchar(20),
- empno varchar(20),
- salary integer
- )
点击(此处)折叠或打开
- postgres=# select * from empsalary;
- depname | empno | salary
- ----------+-------+--------
- develop | 11 | 5200
- develop | 7 | 4200
- develop | 9 | 4500
- personel | 5 | 3500
- personel | 6 | 6500
- personel | 12 | 6500
- personel | 15 | 8900
- (7 行记录)
只使用 function name over(),这是窗口函数使用最简单的方式了吧。
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, sum(salary) over() from empsalary;
- depname | empno | salary | sum
- ----------+-------+--------+-------
- develop | 11 | 5200 | 39300
- develop | 7 | 4200 | 39300
- develop | 9 | 4500 | 39300
- personel | 5 | 3500 | 39300
- personel | 6 | 6500 | 39300
- personel | 12 | 6500 | 39300
- personel | 15 | 8900 | 39300
- (7 行记录)
关于窗口函数就先写到这里吧。
Postgresql window function 第二篇
上一篇很好的解释了什么是窗口函数,并且举了一个最简单的例子,帮助理解什么是窗口函数。接下来我们来更深入的理解postgresql的窗口函数。还是借用上一篇新建的表来讲解。
点击(此处)折叠或打开
- postgres=# \\d empsalary
- 资料表 \"public.empsalary\"
- 栏位 | 型别 | 修饰词
- ---------+-----------------------+--------
- depname | character varying(20) |
- empno | character varying(20) |
- salary | integer |
function name over ( partition by column) 这里的partition by 子句用于对行进行分组的。
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, sum(salary) over(partition by depname) from empsalary;
- depname | empno | salary | sum
- ----------+-------+--------+-------
- develop | 11 | 5200 | 13900
- develop | 7 | 4200 | 13900
- develop | 9 | 4500 | 13900
- personel | 5 | 3500 | 25400
- personel | 6 | 6500 | 25400
- personel | 12 | 6500 | 25400
- personel | 15 | 8900 | 25400
- (7 行记录)
EXAMPLE3:
function name over (order by column)
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, sum(salary) over(order by salary) from
- empsalary;
- depname | empno | salary | sum
- ----------+-------+--------+-------
- personel | 5 | 3500 | 3500
- develop | 7 | 4200 | 7700
- develop | 9 | 4500 | 12200
- develop | 11 | 5200 | 17400
- personel | 6 | 6500 | 30400
- personel | 12 | 6500 | 30400
- personel | 15 | 8900 | 39300
- (7 行记录)
点击(此处)折叠或打开
- postgres=# select depname, empno, salary,array_agg(salary) over(order by salary)
- ,sum(salary) over(order by salary) from empsalary;
- depname | empno | salary | array_agg | sum
- ----------+-------+--------+--------------------------------------+-------
- personel | 5 | 3500 | {3500} | 3500
- develop | 7 | 4200 | {3500,4200} | 7700
- develop | 9 | 4500 | {3500,4200,4500} | 12200
- develop | 11 | 5200 | {3500,4200,4500,5200} | 17400
- personel | 6 | 6500 | {3500,4200,4500,5200,6500,6500} | 30400
- personel | 12 | 6500 | {3500,4200,4500,5200,6500,6500} | 30400
- personel | 15 | 8900 | {3500,4200,4500,5200,6500,6500,8900} | 39300
- (7 行记录)
EXAMPLE 3:
function name over (order by column [rows | range ] between framestart and frameend);
rows between 子句无法独自和over一起使用,这个子句的作用也是决定了那些行可以被每一行看到。
点击(此处)折叠或打开
- postgres=# select depname, empno, salary,sum(salary) over(order by salary rows between unbounded preceding and current row) from empsalary;
- depname | empno | salary | sum
- ----------+-------+--------+-------
- personel | 5 | 3500 | 3500
- develop | 7 | 4200 | 7700
- develop | 9 | 4500 | 12200
- develop | 11 | 5200 | 17400
- personel | 6 | 6500 | 23900
- personel | 12 | 6500 | 30400
- personel | 15 | 8900 | 39300
- (7 行记录)
至此所有的部分都已经讲完了,在复杂的用法就是把他们结合起来使用。相信大家能很好的使用窗口函数。
postgresql windows function 第三篇
前面两篇已经很好的介绍了窗口函数,这篇我们来关注一下function name这个部分。postgresql有一些内置的函数,专门用于窗口函数。如下
Function | Return Type | Description |
---|---|---|
row_number() | bigint | number of the current row within its partition, counting from 1 |
rank() | bigint | rank of the current row with gaps; same as row_number of its first peer |
dense_rank() | bigint | rank of the current row without gaps; this function counts peer groups |
percent_rank() | double precision | relative rank of the current row: (rank - 1) / (total rows - 1) |
cume_dist() | double precision | relative rank of the current row: (number of rows preceding or peer with current row) / (total rows) |
ntile(num_buckets integer) | integer | integer ranging from 1 to the argument value, dividing the partition as equally as possible |
lag(value any [, offsetinteger [, default any ]]) | same type as value | returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default. Bothoffset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null |
lead(value any [, offsetinteger [, default any ]]) | same type as value | returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default. Bothoffset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null |
first_value(value any) | same type as value | returns value evaluated at the row that is the first row of the window frame |
last_value(value any) | same type as value | returns value evaluated at the row that is the last row of the window frame |
nth_value(value any, nthinteger) | same type as value | returns value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row |
row_number()
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, row_number() over() from empsalary;
- depname | empno | salary | row_number
- ----------+-------+--------+------------
- develop | 11 | 5200 | 1
- develop | 7 | 4200 | 2
- develop | 9 | 4500 | 3
- personel | 5 | 3500 | 4
- personel | 6 | 6500 | 5
- personel | 12 | 6500 | 6
- personel | 15 | 8900 | 7
- (7 行记录)
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, row_number() over() from empsalary order by salary;
- depname | empno | salary | row_number
- ----------+-------+--------+------------
- personel | 5 | 3500 | 4
- develop | 7 | 4200 | 2
- develop | 9 | 4500 | 3
- develop | 11 | 5200 | 1
- personel | 6 | 6500 | 5
- personel | 12 | 6500 | 6
- personel | 15 | 8900 | 7
- (7 行记录)
正确的写法是
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, row_number() over(order by salary) from empsalary ;
- depname | empno | salary | row_number
- ----------+-------+--------+------------
- personel | 5 | 3500 | 1
- develop | 7 | 4200 | 2
- develop | 9 | 4500 | 3
- develop | 11 | 5200 | 4
- personel | 6 | 6500 | 5
- personel | 12 | 6500 | 6
- personel | 15 | 8900 | 7
- (7 行记录)
rank()
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, rank() over(order by salary) from empsalary ;
- depname | empno | salary | rank
- ----------+-------+--------+------
- personel | 5 | 3500 | 1
- develop | 7 | 4200 | 2
- develop | 9 | 4500 | 3
- develop | 11 | 5200 | 4
- personel | 6 | 6500 | 5
- personel | 12 | 6500 | 5
- personel | 15 | 8900 | 7
- (7 行记录)
dense_rank()
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, dense_rank() over(order by salary) from empsalary ;
- depname | empno | salary | dense_rank
- ----------+-------+--------+------------
- personel | 5 | 3500 | 1
- develop | 7 | 4200 | 2
- develop | 9 | 4500 | 3
- develop | 11 | 5200 | 4
- personel | 6 | 6500 | 5
- personel | 12 | 6500 | 5
- personel | 15 | 8900 | 6
- (7 行记录)
percent_rank() 和 cume_dist() 这两个函数不知道有什么用,不过计算的公式就在上面,看看就知道怎么算了。
ntile()
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, ntile(3) over(order by salary) from empsalary ;
- depname | empno | salary | ntile
- ----------+-------+--------+-------
- personel | 5 | 3500 | 1
- develop | 7 | 4200 | 1
- develop | 9 | 4500 | 1
- develop | 11 | 5200 | 2
- personel | 6 | 6500 | 2
- personel | 12 | 6500 | 3
- personel | 15 | 8900 | 3
- (7 行记录)
lag( value any [, offsetinteger [, default any ]])
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, lag(salary,1,0) over(order by salary)
- from empsalary ;
- depname | empno | salary | lag
- ----------+-------+--------+------
- personel | 5 | 3500 | 0
- develop | 7 | 4200 | 3500
- develop | 9 | 4500 | 4200
- develop | 11 | 5200 | 4500
- personel | 6 | 6500 | 5200
- personel | 12 | 6500 | 6500
- personel | 15 | 8900 | 6500
- (7 行记录)
first_value()
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, first_value(salary) over () from empsa
- lary;
- depname | empno | salary | first_value
- ----------+-------+--------+-------------
- develop | 11 | 5200 | 5200
- develop | 7 | 4200 | 5200
- develop | 9 | 4500 | 5200
- personel | 5 | 3500 | 5200
- personel | 6 | 6500 | 5200
- personel | 12 | 6500 | 5200
- personel | 15 | 8900 | 5200
- (7 行记录)
last_value() 刚好和first_value()相反,取最后一列。
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, last_value(salary) over () from empsal
- ary;
- depname | empno | salary | last_value
- ----------+-------+--------+------------
- develop | 11 | 5200 | 8900
- develop | 7 | 4200 | 8900
- develop | 9 | 4500 | 8900
- personel | 5 | 3500 | 8900
- personel | 6 | 6500 | 8900
- personel | 12 | 6500 | 8900
- personel | 15 | 8900 | 8900
- (7 行记录)
点击(此处)折叠或打开
- postgres=# select depname, empno, salary, last_value(salary) over (order by sala
- ry) from empsalary;
- depname | empno | salary | last_value
- ----------+-------+--------+------------
- personel | 5 | 3500 | 3500
- develop | 7 | 4200 | 4200
- develop | 9 | 4500 | 4500
- develop | 11 | 5200 | 5200
- personel | 6 | 6500 | 6500
- personel | 12 | 6500 | 6500
- personel | 15 | 8900 | 8900
- (7 行记录)
而nth_value()我想你自己从字面上就能理解了吧。这里就不讲解了。