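I. Window Functions: A Powerful Tool for Data Analysis
1. Window Function Basics
Window functions, introduced in MySQL 8.0, compute aggregates and analytics over a set of related rows without collapsing them: every input row still appears in the result.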
SELECT employee_name,
       department,
       salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
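Core components:
PARTITION BY: splits the rows into window partitions (similar to GROUP BY, but the rows are not collapsed)
ORDER BY: determines the order of rows within each partition
frame_clause: defines the window frame (ROWS/RANGE BETWEEN ...)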
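2. Common Window Function Categories
Ranking functions
ROW_NUMBER(): sequential numbering, even for ties (1, 2, 3, ...)
RANK(): ties share a rank and the ranks that follow are skipped (1, 2, 2, 4, ...)
DENSE_RANK(): ties share a rank and no ranks are skipped (1, 2, 2, 3, ...)
Aggregate functions
SUM()/AVG()/COUNT()/MIN()/MAX() OVER()
Distribution functions
PERCENT_RANK(): relative rank as a fraction, (rank - 1) / (rows - 1)
CUME_DIST(): cumulative distribution, the fraction of partition rows that precede or are peers of the current row
Offset functions
LAG(column, n): reads the value from the row n rows before the current row
LEAD(column, n): reads the value from the row n rows after the current row
FIRST_VALUE()/LAST_VALUE(): first and last values of the window frame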
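The difference between the three ranking functions is easiest to see on data with a tie. The following is a minimal sketch, not MySQL itself: it assumes Python's built-in sqlite3 module as a stand-in engine (these ranking functions are standard SQL and behave the same way there), and the table contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "eng", 9000), ("Bob", "eng", 8000),
     ("Cid", "eng", 8000), ("Dee", "eng", 7000)],
)

# Same window for all three functions, so only the numbering rule differs.
rows = conn.execute(
    """
    SELECT salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
           RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk
    FROM employees
    ORDER BY row_num
    """
).fetchall()

# Each tuple is (salary, row_num, rnk, dense_rnk).
for row in rows:
    print(row)
```

The two 8000 salaries get different ROW_NUMBER values but the same RANK and DENSE_RANK; only RANK leaves a gap after the tie.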
3. Advanced Window Frame Control
SELECT date,
       revenue,
       AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
FROM sales;
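Frame units:
ROWS: offsets counted in physical rows
RANGE: offsets measured in values of the ORDER BY expression, so rows with equal ORDER BY values (peers) always enter the frame together
GROUPS: offsets counted in peer groups; this unit is defined in the SQL standard, but MySQL 8.0 does not support it, so only ROWS and RANGE can be used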
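ROWS and RANGE only produce different results when the ORDER BY column contains duplicates. The sketch below illustrates this with a running total; it assumes Python's sqlite3 module as a stand-in for MySQL, and the sales table, its sale_date column, and its rows are hypothetical.

```python
import sqlite3

# Stand-in database; two rows share the same sale_date on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 20),
     ("2024-01-02", 20), ("2024-01-03", 40)],
)

rows = conn.execute(
    """
    SELECT sale_date,
           revenue,
           -- ROWS: the frame ends at the current physical row
           SUM(revenue) OVER (ORDER BY sale_date
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum,
           -- RANGE: the frame ends at the last row with the same sale_date
           SUM(revenue) OVER (ORDER BY sale_date
                              RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
    FROM sales
    ORDER BY rows_sum
    """
).fetchall()

# Each tuple is (sale_date, revenue, rows_sum, range_sum).
for row in rows:
    print(row)
```

For the two rows dated 2024-01-02, rows_sum grows one row at a time (30, then 50), while range_sum is 50 for both because peers are always included together.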
II. Common Table Expressions (CTE): Improving Query Readability
Basic CTE syntax
WITH department_stats AS (
    SELECT department,
           AVG(salary) AS avg_salary,
           COUNT(*) AS emp_count
    FROM employees
    GROUP BY department
)
SELECT * FROM department_stats WHERE avg_salary > 5000;
Recursive CTEs for hierarchical queries
WITH RECURSIVE org_hierarchy AS (
    -- Anchor member: top-level rows with no manager
    SELECT id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: direct reports of the rows found so far
    SELECT e.id, e.name, e.manager_id, h.level + 1
    FROM employees e
    JOIN org_hierarchy h ON e.manager_id = h.id
)
SELECT * FROM org_hierarchy;
Use cases:
Organization charts
Product category trees
Social network relationships
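To see how the anchor and recursive members walk a hierarchy, here is a small runnable sketch of the organization-chart case. It assumes Python's sqlite3 module as a stand-in for MySQL (the WITH RECURSIVE syntax is the same), and the four employees are invented.

```python
import sqlite3

# Stand-in database with a tiny, made-up reporting tree.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "CEO", None), (2, "VP Eng", 1), (3, "VP Sales", 1), (4, "Engineer", 2)],
)

hierarchy = conn.execute(
    """
    WITH RECURSIVE org_hierarchy AS (
        -- Anchor member: the root of the tree
        SELECT id, name, manager_id, 1 AS level
        FROM employees
        WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: employees whose manager is already in the result
        SELECT e.id, e.name, e.manager_id, h.level + 1
        FROM employees e
        JOIN org_hierarchy h ON e.manager_id = h.id
    )
    SELECT name, level FROM org_hierarchy ORDER BY level, id
    """
).fetchall()

for name, level in hierarchy:
    print("  " * (level - 1) + name)  # indent by depth
```

Each pass of the recursive member adds one more level of reports, and the recursion stops on its own when a pass finds no new rows.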
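CTE optimization tips
NO_MERGE hint: forces the CTE to be materialized, e.g. SELECT /*+ NO_MERGE(cte_name) */ ...
MERGE hint: merges the CTE into the outer query, e.g. SELECT /*+ MERGE(cte_name) */ ...
Limit recursion depth: SET SESSION cte_max_recursion_depth = 100; (the default is 1000)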
III. Advanced JSON Handling: Working with Semi-Structured Data
Creating and modifying JSON
-- Create JSON
SELECT JSON_OBJECT('name', name, 'salary', salary) AS emp_json
FROM employees;
-- Modify JSON
UPDATE products
SET attributes = JSON_SET(attributes, '$.color', 'blue')
WHERE id = 1001;
JSON path queries
SELECT product_id,
       JSON_EXTRACT(attributes, '$.dimensions.width') AS width,
       attributes->>'$.manufacturer' AS manufacturer
FROM products
WHERE JSON_CONTAINS(attributes, '"wireless"', '$.features');
JSON aggregate functions
SELECT department,
       JSON_ARRAYAGG(JSON_OBJECT('id', id, 'name', name)) AS employees
FROM staff
GROUP BY department;
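IV. Advanced Index Optimization Techniques
Functional indexes (MySQL 8.0.13+)
-- Create an index on an expression
CREATE INDEX idx_name_lower ON employees ((LOWER(name)));
-- The query must use exactly the indexed expression for the index to apply
SELECT * FROM employees WHERE LOWER(name) = 'john';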
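The "must match the indexed expression" rule can be checked by looking at the query plan. The sketch below assumes Python's sqlite3 module as a stand-in: SQLite also supports indexes on expressions, but its syntax uses a single pair of parentheses and its EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN, so treat it as an illustration of the idea only. The sample names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (name) VALUES (?)", [("John",), ("JOHN",), ("Mary",)]
)
# SQLite spelling of an expression index (MySQL needs the extra parentheses).
conn.execute("CREATE INDEX idx_name_lower ON employees (lower(name))")


def plan(sql):
    """Return the textual query plan; the last column of each row is the detail."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Same expression as the index: the planner can use idx_name_lower.
matching_plan = plan("SELECT * FROM employees WHERE lower(name) = 'john'")
# Different expression (bare column): the expression index does not apply.
other_plan = plan("SELECT * FROM employees WHERE name = 'John'")

matches = conn.execute(
    "SELECT name FROM employees WHERE lower(name) = 'john' ORDER BY id"
).fetchall()

print(matching_plan)
print(other_plan)
print(matches)
```

In MySQL the equivalent check is to run EXPLAIN on both queries and compare the key column.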
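Invisible indexes
-- Create an invisible index (the optimizer ignores it, but it is still maintained on writes)
CREATE INDEX idx_temp ON orders (customer_id) INVISIBLE;
-- After testing, decide whether to make it visible
ALTER TABLE orders ALTER INDEX idx_temp VISIBLE;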
Descending indexes
-- Create a descending index
CREATE INDEX idx_created_desc ON log_entries (created_at DESC);
-- Well suited to ORDER BY ... DESC queries
SELECT * FROM log_entries ORDER BY created_at DESC LIMIT 100;
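V. Advanced Transaction Handling
Savepoint control
START TRANSACTION;
INSERT INTO orders (...) VALUES (...);
SAVEPOINT order_created;
UPDATE inventory SET quantity = quantity - 1; -- add a WHERE clause in real code; without one, every row is updated
-- If an error occurs, undo only the work done after the savepoint
ROLLBACK TO SAVEPOINT order_created;
COMMIT;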
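What survives a ROLLBACK TO SAVEPOINT is worth verifying once by hand. This is a minimal sketch that assumes Python's sqlite3 module as a stand-in for MySQL (SQLite starts a transaction with BEGIN rather than START TRANSACTION, but the savepoint statements are the same); the tables and the 'widget' row are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the explicit
# BEGIN / SAVEPOINT / COMMIT statements below are executed exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5)")


def widget_quantity():
    return conn.execute(
        "SELECT quantity FROM inventory WHERE item = 'widget'"
    ).fetchall()[0][0]


conn.execute("BEGIN")
conn.execute("INSERT INTO orders (item) VALUES ('widget')")
conn.execute("SAVEPOINT order_created")
conn.execute("UPDATE inventory SET quantity = quantity - 1 WHERE item = 'widget'")
quantity_before_rollback = widget_quantity()  # the decrement is visible here

# Simulate an error after the savepoint: undo the UPDATE, keep the INSERT.
conn.execute("ROLLBACK TO SAVEPOINT order_created")
conn.execute("COMMIT")

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchall()[0][0]
quantity_after_commit = widget_quantity()
print(quantity_before_rollback, order_count, quantity_after_commit)
```

The order row inserted before the savepoint is committed, while the inventory change made after it is discarded.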
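Multi-version concurrency control (MVCC) tuning
-- Set the isolation level for the next transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Declare long-running read-only work as READ ONLY so InnoDB can skip write-transaction overhead
SET TRANSACTION READ ONLY;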
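Lock optimization strategies
-- Take an explicit table-level lock (use with caution: it blocks other sessions from reading or writing the table)
LOCK TABLES orders WRITE;
UNLOCK TABLES; -- release the lock as soon as the work is done
-- Use SKIP LOCKED for high-concurrency job queues: rows locked by other sessions are skipped instead of waited on
SELECT * FROM jobs
WHERE status = 'pending'
ORDER BY priority DESC
LIMIT 1 FOR UPDATE SKIP LOCKED;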
VI. Advanced Performance Analysis
Reading execution plans in depth
EXPLAIN FORMAT=JSON
SELECT * FROM orders
WHERE customer_id IN (
    SELECT id FROM customers WHERE region = 'APAC'
);
-- Key fields in the output
/*
"cost_info": {
    "query_cost": "10.25" -- total estimated cost
},
"table": {
    "rows_examined_per_scan": 1000,
    "rows_produced_per_join": 100,
    "filtered": "10.00"
}
*/
Optimizer hints
-- The INDEX optimizer hint requires MySQL 8.0.20+; on older versions, use FORCE INDEX (idx_customer) after the table name instead
SELECT /*+ INDEX(orders idx_customer) */ *
FROM orders
WHERE customer_id = 1001;
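Performance Schema monitoring
-- Find the most expensive statements
SELECT * FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;
-- Inspect current lock-related wait events (for InnoDB row-lock waits, see performance_schema.data_lock_waits)
SELECT * FROM performance_schema.events_waits_current
WHERE EVENT_NAME LIKE '%lock%';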
VII. Case Study: E-Commerce Data Analysis System
User purchase path analysis
WITH user_journey AS (
    SELECT user_id,
           event_time,
           event_type,
           LAG(event_type, 1) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_event,
           LEAD(event_type, 1) OVER (PARTITION BY user_id ORDER BY event_time) AS next_event
    FROM user_events
    WHERE event_date = CURDATE()
)
SELECT prev_event, event_type, next_event, COUNT(*) AS transition_count
FROM user_journey
GROUP BY prev_event, event_type, next_event
ORDER BY transition_count DESC;
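The path-analysis query can be tried on a handful of events to see what LAG and LEAD return at the edges of each user's timeline. This sketch assumes Python's sqlite3 module as a stand-in for MySQL, drops the event_date filter for brevity, and uses made-up events for two users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_events (user_id INTEGER, event_time TEXT, event_type TEXT)"
)
conn.executemany(
    "INSERT INTO user_events VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 10:00:00", "view"),
        (1, "2024-01-01 10:05:00", "add_to_cart"),
        (1, "2024-01-01 10:09:00", "purchase"),
        (2, "2024-01-01 11:00:00", "view"),
        (2, "2024-01-01 11:02:00", "add_to_cart"),
    ],
)

rows = conn.execute(
    """
    WITH user_journey AS (
        SELECT user_id,
               event_type,
               LAG(event_type, 1)  OVER (PARTITION BY user_id ORDER BY event_time) AS prev_event,
               LEAD(event_type, 1) OVER (PARTITION BY user_id ORDER BY event_time) AS next_event
        FROM user_events
    )
    SELECT prev_event, event_type, next_event, COUNT(*) AS transition_count
    FROM user_journey
    GROUP BY prev_event, event_type, next_event
    """
).fetchall()

# Map (prev_event, event_type, next_event) -> count; None marks the start or
# end of a user's journey, where LAG/LEAD have no row to read.
transitions = {(prev, cur, nxt): count for prev, cur, nxt, count in rows}
for key, count in transitions.items():
    print(key, count)
```

Both users start with view followed by add_to_cart, so that triple is counted twice; the triple ending in None after add_to_cart is the abandoned cart.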
Real-time low-stock alerts
WITH inventory_status AS (
    SELECT product_id,
           current_stock,
           AVG(current_stock) OVER (PARTITION BY category_id) AS category_avg,
           RANK() OVER (PARTITION BY warehouse_id ORDER BY current_stock) AS stock_rank
    FROM inventory
)
SELECT product_id, current_stock
FROM inventory_status
WHERE current_stock < (0.2 * category_avg)
   OR stock_rank <= 5; -- the 5 lowest-stock products in each warehouse
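The alert query combines two windows with different partitions, which is easier to follow on a few rows. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL; the five inventory rows are invented, and the rank cutoff is a hypothetical bottom_n parameter set to 1 (instead of 5) so the tiny data set does not flag everything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    "product_id INTEGER, category_id INTEGER, warehouse_id INTEGER, current_stock INTEGER)"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1, 100),
        (2, 10, 1, 90),
        (3, 10, 2, 5),
        (4, 20, 2, 50),
        (5, 20, 2, 40),
    ],
)

bottom_n = 1  # hypothetical cutoff; the query in the text uses 5

alerts = conn.execute(
    """
    WITH inventory_status AS (
        SELECT product_id,
               current_stock,
               -- average stock across the product's category
               AVG(current_stock) OVER (PARTITION BY category_id) AS category_avg,
               -- position within the product's warehouse, lowest stock first
               RANK() OVER (PARTITION BY warehouse_id ORDER BY current_stock) AS stock_rank
        FROM inventory
    )
    SELECT product_id, current_stock
    FROM inventory_status
    WHERE current_stock < 0.2 * category_avg
       OR stock_rank <= ?
    ORDER BY product_id
    """,
    (bottom_n,),
).fetchall()

print(alerts)
```

Product 3 is flagged by both conditions (5 is below 20% of its category average of 65, and it is the lowest in warehouse 2), while product 2 is flagged only because it has the lowest stock in warehouse 1.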
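Recommendations
Optimize incrementally: make sure the SQL is correct first, then apply advanced optimizations step by step
Test and verify: validate every optimization against realistic data
Monitor and iterate: keep watching for execution plan changes
Use in moderation: avoid overcomplicating SQL logic
Know your version: take full advantage of the features added in MySQL 8.0 and later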