窗口函数介绍
窗口函数的引入是为了解决想要既显示聚集前的数据,又要显示聚集后的数据;窗口数对一组值进行操作,不需要使用GROUP BY子句对数据进行分组,能够在同一行中同时返回基础行的列和聚合列。
强调:使用MySQL 8.0版本方可实现
基本语法
函数名(列) over(选项) 选项为partition by 列 order by 列
解释:
- over(partition by xxx) 按xxx分组的所有行进行分组
- over(partition by xxx order by aaa) 按列xxx分组,按列aaa排序
- over(order by aaa) 按aaa列排序
- over括号中的partition by和order by的使用根据具体情况选择
-- 需求:计算每个学生的及格科目数
-- 使用聚合函数,类似数据透视表,原有表结构已发生变化
SELECT student_id,count( sid ) FROM score WHERE num >= 60 GROUP BY student_id;-- 使用窗口函数,不会更改原表结构
SELECT student_id,count( sid ) over ( PARTITION BY student_id ORDER BY student_id ) AS 及格数
FROM score WHERE num >= 60;
聚合窗口函数
语法:聚合函数(列) over(partition by 列 order by 列)
常见的聚合函数:sum() count() avg() max() min()
排序窗口函数
- row_number():仅仅根据行号进行排序,相同结果则排序按照顺序依次排
- rank():排名,与row_number函数不同的是,rank函数考虑到了over子句中排序字段值相同的情况,over子句中排序字段值相同的排序结果是一样的,后面字段值不相同的序号将跳过相同的排名号排下一个。如:11335
- dense_rank():密集排序,用法跟rank类似,唯一不同是当排序结果相同时,排序不跳跃,而是紧跟排下一个。如:11223
- ntile():桶排序,首先,ntile会先根据你的分组依据,然后把每个组的总记录数进行按照你给的ntile(n)里的数字n进行均分,这个数字就是桶数,例如一个组内总共12条记录,若n=6,则等划分成6桶,然后按照num的排序等级划分,12/6=2则每个桶两条记录,也就是112233445566的排序结果,常用于提取前百分之多少的应用场景。
都是排名函数,不同之处在对于名次相同的数据处理方式
-- 对每门课程进行排序
SELECTs.sid,s1.sname,s1.gender,c.cname,s.num,row_number() over ( PARTITION BY c.cname ORDER BY num DESC ) AS row_number排名,rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS rank排名,dense_rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS dense_rank排名,ntile( 6 ) over ( PARTITION BY c.cname ORDER BY num DESC ) AS ntile排名
FROMscore AS sJOIN student AS s1 ON s.student_id = s1.sidLEFT JOIN course AS c ON s.course_id = c.cid-- 计算每门课程前三,考虑排名相同的情况
SELECT * FROM (SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,dense_rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS dense_rank排名 FROMscore AS sJOIN student AS s1 ON s.student_id = s1.sidLEFT JOIN course AS c ON s.course_id = c.cid ) AS a
WHEREdense_rank排名 <=3
位置移动窗口函数
- lag(col,n):col列名,n行数,用于统计窗口内往上第n行值
- lead(col,n):col列名,n行数,用于统计窗口内往下第n行值
这两个函数可以用于同列中相邻行的数据计算
应用场景:
- 计算作弊次数
-- 需求:对于下面的数据,对于同一用户(uid)如果在2分钟之内重新登陆,则判断为作弊,统计哪些用户有作弊行为,并计算作弊次数
-- 数据代码
CREATE TABLE lead_table (id INT PRIMARY KEY,uid INT NOT NULL,login_time datetime NOT NULL );
INSERT INTO lead_table
VALUES( 1, 1, "2020-8-26 12:59:00" ),( 2, 1, "2020-8-26 13:02:23" ),( 3, 1, "2020-8-26 13:03:34" ),( 4, 1, "2020-8-26 13:09:00" ),( 5, 2, "2020-8-26 13:57:00" ),( 6, 2, "2020-8-26 13:59:00" ),( 7, 2, "2020-8-26 13:59:45" );
思路:根据题目要求,如果能把相邻两列的下面一列与上面那一列变成同一行,不久能实现相减了么,因此我们可以多生成一列,例如:把uid都为1的第二行记录生成到第一行,以此类推,这就可以用到lead往下移动的操作了
-- 第一步
SELECT id,uid,login_time,
LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time
FROM lead_table;--第二步
SELECT id,uid,login_time,
LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time,
TIMESTAMPDIFF(SECOND,login_time,LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time )) AS 相差秒数
FROM lead_table;-- 最终代码
SELECT uid,COUNT( 1 ) AS 作弊次数
FROM(SELECT id,uid,login_time,LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time,TIMESTAMPDIFF(SECOND,login_time,lead( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time )) AS 相差秒数
FROM lead_table ) AS e
WHERE 相差秒数 <= 120
GROUP BY uid;
- 计算次日留存率
其他窗口函数
- first_value(column):取分组排序后第一个值
SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
FIRST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC) AS first_value用法
FROM score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid
- last_value(column):取分组排序后最后一个值
SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
LAST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC) AS last_value用法
FROM score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid
为什么和想要的结果不一样呢?
实际上,窗口函数默认统计范围是rows between unbounded preceding and current row,也就是取当前行数据与当前行之前的数据的比较。
修改SQL,在order by条件的后面加上语句:rows between unbounded preceding and unbounded following,可以理解为:当前分组数据中的所有数据进行比较,取最后一条记录。
SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
LAST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC rows between unbounded preceding and unbounded following) AS last_value用法
FROM score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid
详细介绍:
- rows between XXX and XXX
- unbounded 无限制的
- preceding 分区的当前记录的向前偏移量
- current 当前
- following 分区的当前记录的向后偏移量
示例:累计计算每个月的销售额
-- 示例数据
CREATE TABLE sale (id INT PRIMARY KEY auto_increment,年份 INT,月份 INT,
money FLOAT ( 10, 2 ));
INSERT INTO sale ( 年份, 月份, money )
VALUES( 2020, 1, 5840 ),( 2020, 2, 5780 ),( 2020, 3, 4300 ),( 2020, 4, 4760 ),( 2020, 5, 3630 ),( 2020, 6, 4130 ),( 2020, 7, 4350 );-- 语句
SELECT *,sum( money ) over ( ORDER BY 月份rows between unbounded preceding and current row) AS 累计销售额
FROM sale;
本章示例数据
CREATE DATABASE school;
USE school;
CREATE TABLE class (cid INT ( 11 ) NOT NULL auto_increment,caption VARCHAR ( 32 ) NOT NULL,PRIMARY KEY ( cid )
) ENGINE = INNODB charset = utf8;INSERT INTO class
VALUES( 1, '三年二班' ),( 2, '三年三班' ),( 3, '二年二班' ),( 4, '一年二班' ),( 5, '二年五班' );CREATE TABLE teacher (tid INT ( 11 ) NOT NULL auto_increment,tname VARCHAR ( 32 ) NOT NULL,PRIMARY KEY ( tid )
) ENGINE = INNODB DEFAULT charset = utf8;INSERT INTO teacher
VALUES( 1, '张磊老师' ),( 2, '李平老师' ),( 3, '刘兰老师' ),( 4, '朱朱老师' ),( 5, '李杰老师' );CREATE TABLE course (cid INT ( 11 ) NOT NULL auto_increment,cname VARCHAR ( 32 ) NOT NULL,teacher_id INT ( 11 ) NOT NULL,PRIMARY KEY ( cid ),KEY fk_couurse_teacher ( teacher_id ),CONSTRAINT fk_course_teacher FOREIGN KEY ( teacher_id ) REFERENCES teacher ( tid )
) ENGINE = INNODB DEFAULT charset = utf8;
INSERT INTO course
VALUES( 1, '生物', 1 ),( 2, '物理', 2 ),( 3, '体育', 3 ),( 4, '美术', 2 );CREATE TABLE student (sid INT ( 11 ) NOT NULL auto_increment,gender CHAR ( 1 ) NOT NULL,class_id INT ( 11 ) NOT NULL,sname VARCHAR ( 32 ) NOT NULL,PRIMARY KEY ( sid ),KEY fk_class ( class_id ),CONSTRAINT fk_class FOREIGN KEY ( class_id ) REFERENCES class ( cid )
) ENGINE = INNODB DEFAULT charset = utf8;INSERT INTO student
VALUES( 1, '男', 1, '理解' ),( 2, '女', 1, '钢蛋' ),( 3, '男', 1, '张三' ),( 4, '男', 1, '张思' ),( 5, '女', 1, '网易' ),( 6, '男', 1, '王二' ),( 7, '男', 2, '铁道' ),( 8, '男', 2, '李武' ),( 9, '男', 2, '刘三' ),( 10, '女', 2, '刘一' ),( 11, '男', 2, '刘思' ),( 12, '男', 3, '王三' ),( 13, '男', 3, '小五' ),( 14, '男', 3, '小七' ),( 15, '女', 3, '如花' ),( 16, '男', 3, '张四' );CREATE TABLE score (sid INT ( 11 ) NOT NULL auto_increment,student_id INT ( 11 ) NOT NULL,course_id INT ( 11 ) NOT NULL,num INT ( 11 ) NOT NULL,PRIMARY KEY ( sid ),KEY fk_score_student ( student_id ),KEY fk_score_course ( course_id ),CONSTRAINT fk_score_course FOREIGN KEY ( course_id ) REFERENCES course ( cid ),CONSTRAINT fk_score_student FOREIGN KEY ( student_id ) REFERENCES student ( sid )
) ENGINE = INNODB DEFAULT charset = utf8;INSERT INTO score
VALUES( 1, 1, 1, 10 ),( 2, 1, 2, 9 ),( 5, 1, 4, 66 ),( 6, 2, 1, 8 ),( 8, 2, 3, 68 ),( 9, 2, 4, 99 ),( 10, 3, 1, 77 ),( 11, 3, 2, 66 ),( 12, 3, 3, 87 ),( 13, 3, 4, 99 ),( 14, 4, 1, 79 ),( 15, 4, 2, 11 ),( 16, 4, 3, 67 ),( 17, 4, 4, 100 ),( 18, 5, 1, 79 ),( 19, 5, 2, 11 ),( 20, 5, 3, 67 ),( 21, 5, 4, 100 );
来自: 学习MySQL(五):窗口函数