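SQL Challenge
Round 1:
1: Write a query that lists the names of employees who earn more than $2000 per month and have worked for fewer than 10 months. Sort the result by employee_id in ascending order.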
select name from employee where salary > 2000 and months < 10 order by employee_id;
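2: Query the names in the Employee table that start with a vowel. The result must not contain duplicate names.
With MySQL's default case-insensitive collations, the lowercase letters below also match uppercase initials.
Method 1: the left function
select distinct name from employee where left(name,1) in ("a","o","i","e","u");
Method 2: like pattern matching
select distinct name from employee where name like 'a%' or name like 'o%' or name like 'i%' or name like 'e%' or name like 'u%';
Method 3: the substr function. SUBSTR(str, pos, len) returns <len> characters of <str>, starting at position <pos>.
select distinct name from employee where substr(name,1,1) in ("a","o","i","e","u");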
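The first-character test can be tried out locally with a minimal sketch in Python's built-in sqlite3 module. The sample rows are hypothetical, and lower() is added because SQLite compares strings case-sensitively, unlike MySQL's default collations.

```python
import sqlite3

# Hypothetical sample rows; the real employee table is not shown in the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, "Alice"), (2, "bob"), (3, "eve"), (4, "Alice"), (5, "Oscar"), (6, "Tom")],
)

# substr(name, 1, 1) takes the first character; DISTINCT removes the
# duplicate "Alice"; lower() makes the comparison case-insensitive.
rows = conn.execute(
    "SELECT DISTINCT name FROM employee "
    "WHERE lower(substr(name, 1, 1)) IN ('a', 'e', 'i', 'o', 'u')"
).fetchall()
vowel_names = {r[0] for r in rows}
print(vowel_names)
```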
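3: Write a query that drops one highest salary and one lowest salary, then returns the company's average salary.
Method 1:
select avg(salary) from employee
where
salary != (select max(salary) from employee)
and
salary != (select min(salary) from employee);
Method 2:
select (sum(salary)-max(salary)-min(salary))/(count(*)-2) from employee;
Method 3:
select avg(salary) from employee t
where
salary not in (select max(salary) from employee
union select min(salary) from employee);
Note: Methods 1 and 3 drop every row that ties for the highest or lowest salary, whereas Method 2 removes exactly one of each, so the results differ when the extremes are not unique.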
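The methods above are not interchangeable when several employees share the top or bottom salary. The following sketch, using Python's sqlite3 and made-up salaries, compares the "filter out max and min" approach with the "subtract one max and one min" approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (salary REAL)")
# Hypothetical salaries: the maximum (9000) appears twice.
conn.executemany(
    "INSERT INTO employee VALUES (?)",
    [(1000,), (3000,), (5000,), (9000,), (9000,)],
)

# Filtering approach: removes every row equal to the max or the min.
drop_all = conn.execute(
    "SELECT avg(salary) FROM employee "
    "WHERE salary != (SELECT max(salary) FROM employee) "
    "AND salary != (SELECT min(salary) FROM employee)"
).fetchone()[0]

# Arithmetic approach: subtracts exactly one max and one min.
drop_one = conn.execute(
    "SELECT (sum(salary) - max(salary) - min(salary)) / (count(*) - 2) "
    "FROM employee"
).fetchone()[0]

print(drop_all, drop_one)
```

With this data the filtering approach averages only 3000 and 5000, while the arithmetic approach still keeps one of the two 9000 salaries.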
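4: Briefly explain the difference between NULL, the empty string "" and 0
NULL means the value is missing or unknown; it is not a value itself.
The empty string "" is a string of length 0.
0 is the number 0.
In aggregation, count(0) and count("") count every row, but count(null) counts nothing and returns 0.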
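A small sqlite3 sketch of the counting behaviour described above. The table t and its single untyped column v are hypothetical; the same count semantics apply in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v)")
# One NULL, one empty string, one zero, one ordinary string.
conn.executemany("INSERT INTO t VALUES (?)", [(None,), ("",), (0,), ("x",)])

# count(*) counts rows; count(v) skips rows where v is NULL.
total, non_null = conn.execute("SELECT count(*), count(v) FROM t").fetchone()

# NULL must be tested with IS NULL; "= NULL" is never true.
null_rows = conn.execute("SELECT count(*) FROM t WHERE v IS NULL").fetchone()[0]
eq_null = conn.execute("SELECT count(*) FROM t WHERE v = NULL").fetchone()[0]

print(total, non_null, null_rows, eq_null)
```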
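Round 2
Table description:
The fields of the market_data table are:
order_id (order ID), order_time (order time),
customer_name (customer name), quantity (quantity purchased),
sale (sales amount), profit (profit)
The metrics are defined as follows:
R: the interval between the customer's last purchase and "now" (December 31, 2016), in months.
L: the interval between the customer's first and last purchase, in months.
F: the customer's total number of purchases, counting 2016 only.
M: the customer's total sales amount, counting 2016 only.
1. Query the R and L values of all customers.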
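# TIMESTAMPDIFF(interval, datetime_expr1, datetime_expr2)
Returns datetime_expr2 - datetime_expr1 as an integer, where both arguments are date or datetime expressions. The unit of the result is given by the interval argument.
# TIMESTAMPADD(interval, int_expr, datetime_expr)
Adds the integer expression int_expr to the date or datetime expression datetime_expr. interval takes the same values as for TIMESTAMPDIFF.
# DATEDIFF(date1, date2)
Returns the number of days from date2 to date1 (date1 - date2).
select customer_name,
timestampdiff(month,max(order_time),'2016-12-31') as R,
timestampdiff(month,min(order_time),max(order_time)) as L
from market_data
group by customer_name;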
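To make the month arithmetic concrete, here is a rough Python sketch of how a whole-month difference can be computed for date-only values. The helper months_between and the order dates are hypothetical; it is meant to approximate TIMESTAMPDIFF(MONTH, ...) for start <= end, not to reproduce every MySQL edge case.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole months from start to end (assumes start <= end)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # An incomplete final month does not count.
    if end.day < start.day:
        months -= 1
    return months


# Hypothetical order dates for a single customer.
orders = [date(2015, 3, 20), date(2016, 1, 5), date(2016, 9, 15)]
today = date(2016, 12, 31)

r_value = months_between(max(orders), today)        # last purchase -> "now"
l_value = months_between(min(orders), max(orders))  # first -> last purchase
print(r_value, l_value)
```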
2. Query each customer's R, F and M values. Note that F and M only count figures from 2016.
# IF(expr1, expr2, expr3)
If expr1 is TRUE (expr1 <> 0 and expr1 is not NULL), IF() returns expr2;
otherwise it returns expr3. IF() returns a number or a string, depending on the context in which it is used.
select customer_name,
timestampdiff(month,max(order_time),'2016-12-31') as R,
count(if(year(order_time)=2016,order_id,null)) as F,
round(sum(if(year(order_time)=2016,sale,null)),2) as M
from market_data
group by customer_name;
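The conditional aggregation behind F and M can be sketched with sqlite3. SQLite has no year() function and older builds lack IF-style helpers, so this sketch substitutes strftime('%Y', ...) and CASE WHEN; the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE market_data "
    "(order_id TEXT, order_time TEXT, customer_name TEXT, sale REAL)"
)
# Hypothetical orders: Ann buys in 2015 and 2016, Bob only in 2015.
conn.executemany("INSERT INTO market_data VALUES (?, ?, ?, ?)", [
    ("o1", "2015-06-01", "Ann", 100.0),
    ("o2", "2016-02-10", "Ann", 50.5),
    ("o3", "2016-11-03", "Ann", 20.25),
    ("o4", "2015-12-30", "Bob", 80.0),
])

# CASE without ELSE yields NULL, which count() and sum() both skip.
rows = conn.execute(
    "SELECT customer_name, "
    "count(CASE WHEN strftime('%Y', order_time) = '2016' THEN order_id END) AS F, "
    "round(sum(CASE WHEN strftime('%Y', order_time) = '2016' THEN sale END), 2) AS M "
    "FROM market_data GROUP BY customer_name ORDER BY customer_name"
).fetchall()
print(rows)
```

A customer with no 2016 orders gets F = 0 but M = NULL, because sum() over only NULLs is NULL; wrapping the sum in coalesce(..., 0) is one way to report 0 instead.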
3. Query each customer's R value, L value and lifecycle segment (用户生命周期). The segments are:
(new user (新用户): R<=6 and L<=12; loyal user (忠诚用户): R<=6 and L>12;
churned long-term user (流失的老用户): R>6 and L>12; one-time user (一次性用户): R>6 and L<=12)
select temp.*,
case
when R<=6 and L<=12 then '新用户'
when R<=6 and L>12 then '忠诚用户'
when R>6 and L>12 then '流失的老用户'
when R>6 and L<=12 then '一次性用户'
end as 用户生命周期
from (select customer_name,
timestampdiff(month,max(order_time),'2016-12-31') as R,
timestampdiff(month,min(order_time),max(order_time)) as L
from market_data
group by customer_name) as temp;
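The four lifecycle rules cover every combination of R and L exactly once. A minimal Python sketch of the same decision table, with English labels standing in for the Chinese ones and a hypothetical function name:

```python
def lifecycle(r: int, l: int) -> str:
    """Map recency R and lifetime L (both in months) to a segment."""
    if r <= 6:
        # Purchased recently: short lifetime means new, long means loyal.
        return "new user" if l <= 12 else "loyal user"
    # No recent purchase: short lifetime means one-time, long means churned.
    return "one-time user" if l <= 12 else "churned long-term user"


print(lifecycle(3, 17))
```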
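Round 3
The fields of the Cinema table are:
seat_id (seat number, increasing sequentially), free (0 means occupied, 1 means free), fare (ticket price of the seat)
Questions:
1: What is the cheapest ticket price in the table?
Method 1:
select * from cinema
where free=1
order by fare
limit 1;
Method 2:
select min(fare)
from cinema
where free=1;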
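2: Your girlfriend requires that the seats you book be adjacent (output the seat_id of the available seats)
select c1.seat_id,c2.seat_id
from cinema c1
-- join the table to itself (self-join)
join cinema c2
-- condition: the two seats are adjacent
on c1.seat_id+1 = c2.seat_id
-- both seats are free
where c1.free=1 and c2.free=1;
3: Your girlfriend wants the two adjacent seats with the lowest total price (output the seat_id values and the total price)
select c1.seat_id,c2.seat_id,c1.fare+c2.fare
from cinema c1
-- join the table to itself (self-join)
join cinema c2
-- condition: the two seats are adjacent
on c1.seat_id+1 = c2.seat_id
-- both seats are free
where c1.free=1 and c2.free=1
-- sort by total price and keep only one row
order by c1.fare+c2.fare limit 1;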
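The self-join on seat_id + 1 can be tried on a toy table. This sqlite3 sketch uses six invented seats; the join itself is the same as in the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cinema (seat_id INTEGER, free INTEGER, fare REAL)")
# Hypothetical seats: 2 and 6 are occupied (free = 0).
conn.executemany("INSERT INTO cinema VALUES (?, ?, ?)", [
    (1, 1, 30), (2, 0, 30), (3, 1, 40), (4, 1, 35), (5, 1, 20), (6, 0, 25),
])

# Pair each seat with its right-hand neighbour, keep pairs where both
# seats are free, and sort by the combined fare.
pairs = conn.execute(
    "SELECT c1.seat_id, c2.seat_id, c1.fare + c2.fare AS total "
    "FROM cinema c1 JOIN cinema c2 ON c1.seat_id + 1 = c2.seat_id "
    "WHERE c1.free = 1 AND c2.free = 1 "
    "ORDER BY total"
).fetchall()
cheapest = pairs[0]
print(pairs, cheapest)
```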
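Round 4
The fields of the employ table are:
position_name (job title), min_salary (minimum salary, in yuan),
max_salary (maximum salary, in yuan), city (work city),
educational (education requirement), people (number of openings), industry (industry)
Questions:
1. Find the average minimum salary and average maximum salary for each education requirement.
select educational,round(avg(min_salary)),round(avg(max_salary))
from employ
group by educational;
2. For each industry, find the job title with the largest gap between maximum and minimum salary.
-- position_name is not part of the grouping, so the job title is looked up by joining back on each industry's largest gap
select e.industry,e.position_name,e.max_salary-e.min_salary as gap
from employ e
join (select industry,max(max_salary-min_salary) as gap from employ group by industry) t
on e.industry=t.industry and e.max_salary-e.min_salary=t.gap;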
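Grouping by industry alone yields the largest gap but does not say which posting produced it. One common way to recover the job title is to join the per-industry maximum back to the table, sketched here with sqlite3 and invented postings (ties would return several rows per industry).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employ (industry TEXT, position_name TEXT, "
    "min_salary INTEGER, max_salary INTEGER)"
)
# Hypothetical postings, two per industry.
conn.executemany("INSERT INTO employ VALUES (?, ?, ?, ?)", [
    ("IT", "dev", 8000, 15000),
    ("IT", "qa", 6000, 9000),
    ("Finance", "analyst", 7000, 12000),
    ("Finance", "teller", 4000, 5000),
])

# Subquery t holds the largest gap per industry; the join keeps only the
# postings whose own gap equals that maximum.
widest = conn.execute(
    "SELECT e.industry, e.position_name, e.max_salary - e.min_salary AS gap "
    "FROM employ e "
    "JOIN (SELECT industry, max(max_salary - min_salary) AS gap "
    "      FROM employ GROUP BY industry) t "
    "ON e.industry = t.industry AND e.max_salary - e.min_salary = t.gap "
    "ORDER BY e.industry"
).fetchall()
print(widest)
```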
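3. Find the average salary level of the e-commerce industry in each city, sorted from high to low. (Use the midpoint of a posting's minimum and maximum salary as its salary, and weight it by the number of openings.)
-- 平均薪资 = average salary; '互联网/电子商务' = Internet / e-commerce
select city,round(sum((max_salary+min_salary)/2*people)/sum(people)) as 平均薪资
from employ
where industry='互联网/电子商务'
group by city
order by 平均薪资 desc;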
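Weighting by the number of openings matters whenever postings differ in size. A short Python sketch with two made-up postings shows the weighted mean next to the plain mean of the midpoints:

```python
# Hypothetical postings for one city: (min_salary, max_salary, people).
postings = [(8000, 12000, 3), (10000, 20000, 1)]

# Midpoint salary of each posting.
midpoints = [(lo + hi) / 2 for lo, hi, _ in postings]

# Weighted mean: each midpoint counts once per opening.
weighted = sum(m * p for m, (_, _, p) in zip(midpoints, postings)) / sum(
    p for _, _, p in postings
)

# Plain mean ignores how many people each posting hires.
unweighted = sum(midpoints) / len(midpoints)
print(weighted, unweighted)
```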
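4. Short answer: explain the difference between UNION and UNION ALL
Both combine the result sets of two queries into one.
UNION removes duplicate rows; UNION ALL keeps every row, duplicates included.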
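The difference is easy to see on two tiny tables that share one value. This sqlite3 sketch uses hypothetical tables a and b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (v INTEGER)")
conn.execute("CREATE TABLE b (v INTEGER)")
# The value 2 appears in both tables.
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

# UNION deduplicates the combined rows.
union_rows = [r[0] for r in conn.execute(
    "SELECT v FROM a UNION SELECT v FROM b ORDER BY v")]

# UNION ALL keeps every row from both sides.
union_all_rows = [r[0] for r in conn.execute(
    "SELECT v FROM a UNION ALL SELECT v FROM b ORDER BY v")]
print(union_rows, union_all_rows)
```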
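The data comes from the sales statistics of a website
- online order data
- user information
Analysis steps
1. Data import
First create the database and the corresponding tables
- create the orderinfo table
- create the userinfo table
# The fields of userinfo and orderinfo are as follows:
userinfo (customer information table)
userId: customer id
sex: sex
birth: date of birth
orderinfo (order information table)
orderId: order number
userId: customer id
isPaid: whether the order was paid
price: item price
paidTime: payment time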
- Log in to MySQL and import the data
load data local infile "file" into table dbname.tablename ...
# log in
mysql --local-infile -uroot -p
# import the orderinfo data
load data local infile 'F:BaiduNetdiskDownloadSQLorder_info_utf.csv' into table data.orderinfo fields terminated by ',';
# import the userinfo data
load data local infile 'F:BaiduNetdiskDownloadSQLuser_info_utf.csv' into table data.userinfo fields terminated by ',';
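2. Inspect the data and process the time column: convert the strings to date format
update orderinfo set paidtime=replace(paidtime,'/','-') where paidtime is not null;
update orderinfo set paidtime=str_to_date(paidtime,'%Y-%m-%d %H:%i') where paidtime is not null;
3. Explore the data
1. Number of ordering users per month
Approach: group by month and count distinct users. The isPaid value '已支付' means "paid".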
select month(paidTime) as dtmonth,
count(distinct userId) as count_users
from orderinfo
where isPaid = '已支付'
group by month(paidTime)
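2. Return rate and repurchase rate for March
- Repurchase rate (复购率): the share of users who purchase more than once within a calendar month
- First find the paid orders in March, grouped by user, with each userId and its purchase count
- Then wrap that in an outer query. Repurchase rate = users with more than one purchase / total purchasing users
select count(ct),count(if(ct>1,1,null)) from
(select userId,count(userId) as ct from orderinfo
where isPaid = '已支付' and month(paidTime) = 3
group by userId
order by userId) t;
Repurchase rate: 16916 / 54799 = 0.309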
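The two-level query can be checked on a handful of rows. In this sqlite3 sketch the orders, the year 2016 and the unpaid status value '未支付' are all invented; strftime('%m', ...) and CASE WHEN stand in for MySQL's month() and if().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderinfo (userId INTEGER, isPaid TEXT, paidTime TEXT)")
# Hypothetical orders: users 1 and 3 buy more than once in March,
# user 2 buys once, user 4 never pays, user 5 buys in April.
conn.executemany("INSERT INTO orderinfo VALUES (?, ?, ?)", [
    (1, "已支付", "2016-03-02 10:00"),
    (1, "已支付", "2016-03-15 09:30"),
    (2, "已支付", "2016-03-07 18:45"),
    (3, "已支付", "2016-03-01 08:00"),
    (3, "已支付", "2016-03-09 12:00"),
    (3, "已支付", "2016-03-20 21:10"),
    (4, "未支付", "2016-03-11 11:11"),
    (5, "已支付", "2016-04-01 00:00"),
])

# Inner query: purchases per user in March. Outer query: all buyers and
# the buyers with more than one purchase.
buyers, repeat_buyers = conn.execute(
    "SELECT count(ct), count(CASE WHEN ct > 1 THEN 1 END) FROM ("
    "  SELECT userId, count(userId) AS ct FROM orderinfo"
    "  WHERE isPaid = '已支付' AND strftime('%m', paidTime) = '03'"
    "  GROUP BY userId) t"
).fetchone()
repurchase_rate = repeat_buyers / buyers
print(buyers, repeat_buyers, repurchase_rate)
```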
- Return rate (回购率): the share of users who purchased before and purchase again in a given period
First query each paid userId together with the month of payment
select userId, date_format(paidTime, '%Y-%m-01') as m from orderinfo
where isPaid = '已支付'
group by userId, date_format(paidTime,'%Y-%m-01');
Then self-join this result using date_sub, matching each user's purchase month with the same user's purchase in the following month. Dividing the matched users by the month's buyers gives the return rate. The aliases 消费总数, 回购人数 and 回购率 mean total buyers, returning buyers and return rate.
select t1.m,count(t1.m) as 消费总数,count(t2.m) as 回购人数,count(t2.m)/count(t1.m) as 回购率 from
(select userId, date_format(paidTime, '%Y-%m-01') as m from orderinfo
where isPaid = '已支付'
group by userId, date_format(paidTime,'%Y-%m-01')) t1
left join
(select userId, date_format(paidTime, '%Y-%m-01') as m from orderinfo
where isPaid = '已支付'
group by userId, date_format(paidTime,'%Y-%m-01')) t2
on t1.userId = t2.userId and t1.m = date_sub(t2.m, interval 1 month)
group by t1.m;
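The month-to-next-month self-join can be reproduced in sqlite3. This is a sketch with invented orders: date(m, '-1 month') replaces MySQL's date_sub, a CTE replaces the repeated subquery, and 1.0 * forces real division because SQLite divides integers as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderinfo (userId INTEGER, isPaid TEXT, paidTime TEXT)")
# Hypothetical paid orders: user 1 buys in March and April, user 2 only
# in March, user 3 in April and May.
conn.executemany("INSERT INTO orderinfo VALUES (?, ?, ?)", [
    (1, "已支付", "2016-03-02"), (1, "已支付", "2016-03-20"),
    (1, "已支付", "2016-04-05"), (2, "已支付", "2016-03-10"),
    (3, "已支付", "2016-04-12"), (3, "已支付", "2016-05-01"),
])

# monthly: one row per user and purchase month. The left join finds the
# same user in the following month; unmatched rows leave t2.m NULL.
rows = conn.execute(
    "WITH monthly AS ("
    "  SELECT DISTINCT userId, strftime('%Y-%m-01', paidTime) AS m"
    "  FROM orderinfo WHERE isPaid = '已支付') "
    "SELECT t1.m, count(t1.m), count(t2.m), 1.0 * count(t2.m) / count(t1.m) "
    "FROM monthly t1 LEFT JOIN monthly t2 "
    "ON t1.userId = t2.userId AND t1.m = date(t2.m, '-1 month') "
    "GROUP BY t1.m ORDER BY t1.m"
).fetchall()
print(rows)
```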
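3. Purchase frequency of male and female users
- Because sex has empty values in userinfo, first filter them out as subquery t, then join orderinfo with t to count each user's purchases together with the user's sex
select o.userId,sex,count(o.userId) as ct from orderinfo o
inner join (select * from userinfo where sex != '') t
on o.userId = t.userId
group by o.userId,sex
order by o.userId;
- Using the result above as a subquery, compute the average purchase count for each sex
select sex,avg(ct) from
(select o.userId,sex,count(o.userId) as ct from orderinfo o
inner join (select * from userinfo where sex != '') t
on o.userId = t.userId
group by o.userId,sex
order by o.userId) t2
group by sex;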
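4. For users with multiple purchases, analyze the interval between the first and last purchase
- First extract the users with multiple purchases together with their first and last purchase times
- Then use datediff to compute the interval in days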
select userId,max(paidTime),min(paidTime),datediff(max(paidTime),min(paidTime)) from data.orderinfo
where isPaid = '已支付'
group by userId having count(1) > 1
order by userId
5. Differences in spending across age groups
Join the tables and assign each user to a 10-year age band, filtering out the abnormal birth date 1900-00-00, then compute each user's purchase count and total spending
select o.userId,age,sum(price) as total,count(o.userId) as ct from orderinfo o
inner join (select userId, ceil((year(now()) - year(birth))/10) as age from userinfo where birth >= '1901-01-01') t
on o.userId = t.userId
where isPaid = '已支付'
group by o.userId, age
order by o.userId;
Then compute the average purchase count and average spending per user for each age band
select t2.age,avg(ct),avg(total) from
(select o.userId,age,sum(price) as total,count(o.userId) as ct from orderinfo o
inner join (select userId, ceil((year(now()) - year(birth))/10) as age from userinfo where birth >= '1901-01-01') t
on o.userId = t.userId
where isPaid = '已支付'
group by o.userId, age) t2
group by age
order by age;
- ceil: rounds up to the nearest integer
6. The 80/20 rule of spending: how much of total spending comes from the top 20% of users
Rank users by total spending
select userId,sum(price) as total from orderinfo o
where isPaid = '已支付'
group by userId
order by total desc;
Get the total number of users and the total amount
select count(userId),sum(total) from
(select userId,sum(price) as total from orderinfo o
where isPaid = '已支付'
group by userId
order by total desc) as t;
Compute how many users make up the top 20%
select count(userId)*0.2,sum(total) from
(select userId,sum(price) as total from orderinfo o
where isPaid = '已支付'
group by userId
order by total desc) as t;
Use limit to keep the top 20% of users (17129)
select count(userId),sum(total) from (
select userId,sum(price) as total from orderinfo o
where isPaid = '已支付'
group by userId
order by total desc
limit 17129) t;
Share of total spending from the top 20% of users: top-20% spending / total spending = 73.93%
The top 20% of users contribute 73.93% of total spending.
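The same top-20% calculation can be sketched in plain Python on per-user totals. The ten totals below are invented and do not reproduce the 73.93% figure; truncating the cutoff with int() is an assumption about how the user count is rounded.

```python
# Hypothetical total spending per user (already summed per userId).
totals = [500, 300, 50, 40, 30, 25, 20, 15, 10, 10]

# Rank users from highest to lowest spending.
ranked = sorted(totals, reverse=True)

# Number of users in the top 20% (assumption: round down).
top_n = int(len(ranked) * 0.2)

# Share of all spending contributed by those users.
top_share = sum(ranked[:top_n]) / sum(ranked)
print(top_n, top_share)
```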