Window Functions
- Perform calculations on an already generated result set (a window).
- Perform aggregate calculations without having to group your data (no GROUP BY required).
- Similar to subqueries in SELECT.
- Can compute running totals, rankings, moving averages, and so on.
- Processed after every other part of the query except ORDER BY (that is, after everything else, but before the final ORDER BY).
- Use information in the result set rather than in the database.
- Available in PostgreSQL, Oracle, MySQL (8.0 and later), SQL Server, and SQLite (3.25 and later).
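Window functions are a special class of SQL functions. Like aggregate functions, they take multiple rows as input. The difference is that an aggregate function operates on the groups formed by a GROUP BY clause, while a window function operates on a window, where a window is a set of rows defined by an OVER clause.
An aggregate function outputs one result for each group it operates on, whereas a window function outputs one result for every row in the window it operates on.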
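The difference between the two kinds of output can be checked with a small sketch. It runs the SQL through Python's built-in sqlite3 module, assuming the bundled SQLite is 3.25 or newer; the matches table and its rows are invented for illustration.

```python
import sqlite3

# Hypothetical table and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (season TEXT, goals INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?)",
    [("2011", 2), ("2011", 4), ("2012", 1), ("2012", 3), ("2012", 8)],
)

# Aggregate function: one output row per GROUP BY group.
grouped = conn.execute(
    "SELECT season, AVG(goals) FROM matches GROUP BY season ORDER BY season"
).fetchall()

# Window function: one output row per input row; every row carries
# the average of the window it belongs to (here, its season).
windowed = conn.execute(
    """
    SELECT season, goals,
           AVG(goals) OVER (PARTITION BY season) AS season_avg
    FROM matches
    ORDER BY season, goals
    """
).fetchall()

print(len(grouped), "rows from GROUP BY")
print(len(windowed), "rows from the window function")
for row in windowed:
    print(row)
```

The GROUP BY query collapses the five rows into two, while the window query keeps all five and repeats the season average on each of them.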
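Syntax
FUNCTION(value) OVER ([PARTITION BY field] [ORDER BY field])
Note: the parts in [] are optional; include them as the situation requires.
PARTITION BY = range of calculation. Partitions the rows by the specified field(s), one or more, similar to GROUP BY.
ORDER BY = order of rows when running the calculation. Sorts the rows by the specified field.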
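As a rough sketch of how the two optional clauses combine, the query below numbers players within each team (PARTITION BY plus ORDER BY) and also counts all rows with an empty OVER (). It uses Python's sqlite3 module and assumes SQLite 3.25 or newer; the scores table, its columns, and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (team TEXT, player TEXT, goals INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("A", "ann", 5), ("A", "bob", 3),
     ("B", "cat", 7), ("B", "dan", 2), ("B", "eve", 4)],
)

rows = conn.execute(
    """
    SELECT team, player, goals,
           -- PARTITION BY restarts the numbering for each team;
           -- ORDER BY decides the order in which rows are numbered.
           ROW_NUMBER() OVER (PARTITION BY team ORDER BY goals DESC) AS pos,
           -- Empty OVER (): the window is the whole result set.
           COUNT(*) OVER () AS total_rows
    FROM scores
    ORDER BY team, pos
    """
).fetchall()

for row in rows:
    print(row)
```

Each team gets its own numbering starting at 1, while total_rows is 5 on every row because its window is not partitioned.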
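Common functions
- Dedicated window functions
ROW_NUMBER(): starting from 1, returns the sequential number of each row within its partition after sorting (consecutive and unique within the partition).
RANK(): computes the rank. Rows that tie receive the same number, and the positions that follow the tie are skipped.
DENSE_RANK(): also computes the rank, but no positions are skipped after a tie.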
For example:
SELECT goals,
       RANK() OVER(ORDER BY goals DESC) AS goals_rank,
       DENSE_RANK() OVER(ORDER BY goals DESC) AS goals_dense_rank,
       ROW_NUMBER() OVER(ORDER BY goals DESC) AS row_num
FROM grade
ORDER BY goals DESC;
The result is as follows:
goals  goals_rank  goals_dense_rank  row_num
10     1           1                 1
10     1           1                 2
9      3           2                 3
9      3           2                 4
7      5           3                 5
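The result table above can be reproduced with a short sketch using Python's sqlite3 module (assuming SQLite 3.25 or newer). The grade table is rebuilt here with only the five goals values from the example. The alias row_num and the extra sort key in the final ORDER BY are my own choices: ROW_NUMBER is a reserved word in some databases, and the second sort key makes the order of tied rows deterministic.

```python
import sqlite3

# Rebuild the example data: five rows with goals 10, 10, 9, 9, 7.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grade (goals INTEGER)")
conn.executemany("INSERT INTO grade VALUES (?)", [(10,), (9,), (7,), (10,), (9,)])

rows = conn.execute(
    """
    SELECT goals,
           RANK()       OVER (ORDER BY goals DESC) AS goals_rank,
           DENSE_RANK() OVER (ORDER BY goals DESC) AS goals_dense_rank,
           ROW_NUMBER() OVER (ORDER BY goals DESC) AS row_num
    FROM grade
    ORDER BY goals DESC, row_num
    """
).fetchall()

for row in rows:
    print(row)
```

RANK jumps from 1 to 3 and then to 5 because tied rows consume positions, DENSE_RANK counts 1, 2, 3 without gaps, and ROW_NUMBER is unique for every row.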
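LAG(column, n): returns column's value at the row n rows before the current row. When n is omitted it defaults to 1, i.e. the previous row.
LEAD(column, n): returns column's value at the row n rows after the current row.
FIRST_VALUE(column): returns the first value in the table or partition.
LAST_VALUE(column): returns the last value in the table or partition. Note that when OVER contains an ORDER BY, the default frame ends at the current row, so an explicit frame such as ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is needed to reach the true last row.
NTILE(n): splits the data into n approximately equal pages (buckets). (I have not had many uses for it so far; more to be added later.)
- Aggregate functions: SUM, AVG, COUNT, MAX, and MIN can also be used as window functions.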
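The functions listed above can be tried side by side in one query. This is a minimal sketch with Python's sqlite3 module (assuming SQLite 3.25 or newer); the daily table and its four rows are made up for the example. LAST_VALUE is given an explicit frame covering the whole data set, because with only ORDER BY the default frame would stop at the current row.

```python
import sqlite3

# Hypothetical table and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day INTEGER, goals INTEGER)")
conn.executemany("INSERT INTO daily VALUES (?, ?)", [(1, 3), (2, 5), (3, 2), (4, 8)])

rows = conn.execute(
    """
    SELECT day, goals,
           LAG(goals)         OVER (ORDER BY day) AS prev_goals,   -- n defaults to 1
           LEAD(goals, 2)     OVER (ORDER BY day) AS goals_in_2,   -- two rows ahead
           FIRST_VALUE(goals) OVER (ORDER BY day) AS first_goals,
           LAST_VALUE(goals)  OVER (
               ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_goals,
           NTILE(2)           OVER (ORDER BY day) AS half,         -- two buckets
           SUM(goals)         OVER ()             AS total         -- aggregate as window
    FROM daily
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
```

Rows that have no earlier or later neighbour get NULL (None in Python) from LAG and LEAD.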
Partitioning examples


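In Figure 1, AVG(home_goal + away_goal) OVER() AS overall_avg does not use PARTITION BY, so it computes the overall average.
In Figure 2, AVG(home_goal + away_goal) OVER(PARTITION BY season) AS season_avg partitions by season (a column in the table) and then computes the average within each partition.

PARTITION BY can partition on one or more columns. In Figure 3, the rows are partitioned by both m.season and c.name, and the average is computed within each partition. That is why the first and third rows have the same season_ctry_avg value.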
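The three partitioning variants described above can be compared in a single query. The sketch below uses Python's sqlite3 module (assuming SQLite 3.25 or newer). The figures use joined tables aliased m and c; to stay short, this example assumes one hypothetical matches table with a country column, and its rows are invented, so the numbers do not match the figures.

```python
import sqlite3

# Hypothetical single table standing in for the joined tables in the figures.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (season TEXT, country TEXT, home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [
        ("2011/2012", "Belgium", 1, 1),
        ("2011/2012", "Belgium", 3, 1),
        ("2011/2012", "Spain", 2, 1),
        ("2012/2013", "Belgium", 1, 0),
        ("2012/2013", "Spain", 4, 3),
        ("2012/2013", "Spain", 5, 2),
    ],
)

rows = conn.execute(
    """
    SELECT season, country, home_goal + away_goal AS goals,
           -- No PARTITION BY: average over the whole result set.
           AVG(home_goal + away_goal) OVER () AS overall_avg,
           -- One partition column: average per season.
           AVG(home_goal + away_goal) OVER (PARTITION BY season) AS season_avg,
           -- Two partition columns: average per season and country.
           AVG(home_goal + away_goal) OVER (PARTITION BY season, country) AS season_ctry_avg
    FROM matches
    ORDER BY season, country, home_goal
    """
).fetchall()

for row in rows:
    print(row)
```

Rows that share both season and country show the same season_ctry_avg, and every row shows the same overall_avg.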
Sliding Windows
In addition to calculating aggregate and rank information, window functions can also be used to calculate information that changes with each subsequent row in a data set. These types of window functions are called sliding windows.
Sliding windows are functions that perform calculations relative to the current row of a data set.
You can use sliding windows to calculate a wide variety of information that aggregates one row at a time down your data set -- running totals, sums, counts, and averages in any order you need.
A sliding window calculation can also be partitioned by one or more columns, just like a non-sliding window.
Sliding window keywords (added inside the OVER clause)
ROWS BETWEEN <start> AND <finish>
Keywords that can be used for <start> and <finish>:
- PRECEDING : n PRECEDING means n rows before the current row
- FOLLOWING : n FOLLOWING means n rows after the current row
- UNBOUNDED PRECEDING : every row since the beginning of the data set (or partition)
- UNBOUNDED FOLLOWING : every row to the end of the data set (or partition)
- CURRENT ROW : tells SQL that you want to stop your calculation at the current row
Example
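A minimal sketch of three ROWS BETWEEN frames, run through Python's sqlite3 module (assuming SQLite 3.25 or newer). The daily table and its rows are hypothetical. The query computes a running total, a two-row moving average, and the total still remaining from the current row onward.

```python
import sqlite3

# Hypothetical table and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day INTEGER, goals INTEGER)")
conn.executemany("INSERT INTO daily VALUES (?, ?)", [(1, 3), (2, 5), (3, 2), (4, 8)])

rows = conn.execute(
    """
    SELECT day, goals,
           -- Running total: from the first row up to the current row.
           SUM(goals) OVER (
               ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total,
           -- Moving average over the previous row and the current row.
           AVG(goals) OVER (
               ORDER BY day
               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
           ) AS moving_avg,
           -- Remaining total: from the current row to the last row.
           SUM(goals) OVER (
               ORDER BY day
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
           ) AS remaining_total
    FROM daily
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
```

Adding PARTITION BY in front of ORDER BY inside each OVER clause would restart these calculations for every partition.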


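Used flexibly, window functions let you perform more complex calculations and groupings on raw data, look at the data from different angles, and uncover deeper patterns and conclusions.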