一、背景
需求:求【昨日触达】、【本周拜访】、【本月活动数】。
背景:三个维度的数据都按日统计在一张表里。
首先想到的就是left join +group by。
1.1 优化前语句
selectuser_id,user_name,org_id,org_name,sum(reach_num) dayReachNum,sum(visit_num) weekVisitNum,sum(activity_num) mouthActivityNum
from(selectsrsm.user_id,srsm.user_name,srsm.org_id,srsm.org_name,srsd.reach_num,srsw.visit_num,srsm.activity_numfromstandard_reach_statistics srsmleft join standard_reach_statistics srsd onsrsd.user_id = srsm.user_idand srsd.statistical_date between '2024-01-01' and '2024-01-01'left join standard_reach_statistics srsw onsrsm.user_id = srsw.user_idand srsw.statistical_date between '2024-01-01' and '2024-01-07'wheresrsm.statistical_date between '2024-01-01' and '2024-01-31') tt
group byuser_id,user_name,org_id,org_name
速度太慢,standard_reach_statistics表总的数据量也不过是3w条,需要50s才能跑完,这在系统中肯定是不被允许的。
分析这条语句存在的问题:
- 大表驱动小表;
- 三张3w的表进行left join,产生的连接次数为3w*3w,已经接近9亿了,这是万万不能接受的;
- 连表字段的索引已经建,这条没有问题;
二、优化后语句
思路:
- 解决后两个left join表数据过大问题:可以先把后两个left join语句sum出来,在left join;
selectsrsm.user_id,reach_num_sum dayReachNum,visit_num_sum weekVisitNum ,sum(srsm.activity_num) mouthActivityNum
fromstandard_reach_statistics srsm
left join (selectsrsd.user_id,sum(srsd.reach_num) reach_num_sumfromstandard_reach_statistics srsdwheresrsd.statistical_date between '2024-01-01' and '2024-01-01'group byuser_id ) sum_day onsrsm.user_id = sum_day.user_id
left join (selectsrsw.user_id,sum(srsw.visit_num)visit_num_sumfromstandard_reach_statistics srswwheresrsw.statistical_date between '2024-01-01' and '2024-01-07'group byuser_id ) sum_month onsrsm.user_id = sum_month.user_id
wheresrsm.statistical_date between '2024-01-01' and '2024-01-31'
group byuser_id;
优化后效率由原来的50s,变成了86ms,效率提升明显。