mysql load data infile 导入数据某一列空_Sql数据挑战赛amp;网络销售案例分析

SQL挑战赛

第一期:

1: 编写一个查询，列出员工姓名列表，员工每月工资超过2000美元且员工工作时间少于10个月。通过提升employee_id对结果进行排序

select name from employee where salary > 2000 and months < 10 order by employee_id;

2: 查询 Employee 表格中以元音字母开头的 name 名字。结果不包含名字重复记录。

方法一:left函数
select distinct name from employee where left(name,1) in ("a","o","i""e","u")方法二:like模糊匹配
select distinct name from employee where name like '%a'||'%o'|| '%i'|| '%e'|| '%u'方法三:substr函数  SUBSTR (str, pos, len):由 <str> 中的第 <pos> 位置开始，选出接下去的 <len> 个字元。
select distinct name from employee where substr(name,1,1) in ("a","o","i""e","u")

3:编写一个查询，去掉一个最高收入，去掉一个最低收入，该公司员工平均收入是多少？

方法一:
select avg(salary) from employee 
where 
salary != (select max(salary) from employee)  
And
salary !=(select min(salary) from employee);方法二:
select (sum(salary)-max(salary)-min(salary))/(count(*)-2) from employee;方法三:
select avg(salary) from employee t
where 
salary not in  (select max(salary) from employee
unionselect min(salary) from employee)

4：简述NULL, 空字符串""与 0的区别

NULL在数据库中表示没有这条记录

空字符串""为一个长度为0的字符串

0为数字0.

在count时,count(0) 与 count("")都会被聚合,但count(null)不会

第二期

数据表解释:

market_data表的字段介绍为：

order_id(订单ID),order_time(订单时间),

customer_name(用户名称),quantity(购买数量），

sale(销售额），profit(利润)

各项指标的定义为：

R值为：用户最后一次购买到现在（2016年12月31日）的时间间隔，输出月份。

L值为：用户第一次购买和最后一次购买之间的时间间隔，输出月份。

F值为：用户的总共购买次数，仅计算2016年的即可。

M值为：用户的全部销售额，仅计算2016年的即可。

1.查询所有用户的R值和L值。

#TIMESTAMPDIFF(interval,datetime_expr1,datetime_expr2)
返回日期或日期时间表达式datetime_expr1 和datetime_expr2the 之间的整数差。其结果的单位由interval 参数给出。#TIMESTAMPADD(interval,int_expr,datetime_expr)
将整型表达式int_expr 添加到日期或日期时间表达式 datetime_expr中。式中的interval和上文中列举的取值是一样的。#DATEDIFF(date1,date2)返回两个日期之间的天数
select customer_name,timestampdiff(month,max(order_time),'2016-13-31') as R,timestampdiff(month,min(order_time),max(order_time)) as L
from market_data
group bycustomer_name

2.查询用户的R值，F值和M值，注意F值和M值，仅计算2016年度的数字。

#IF(expr1,expr2,expr3)
如果 expr1 是TRUE (expr1 <> 0 and expr1 <> NULL)，则 IF()的返回值为expr2; 
否则返回值则为 expr3。IF() 的返回值为数字值或字符串值，具体情况视其所在语境而定。select customer_name,timestampdiff(month,max(order_time),'2016-13-31') as R,count(if(year(order_time)=2016,order_id,null)) as F,round(sum(if(year(order_time)=2016,sale,null)),2) as M
frommarket_data
Group bycustomer_name

3.查询用户的R值，L值和用户生命周期划分。生命周期划分如下：

（新用户：R<=6 and L<=12；忠诚用户：R<=6 and L>12;

流失的老用户：R>6 and L>12; 一次性用户：R>6 and L<=12）

select temp.*,
casewhen R<=6 and L<=12 then '新用户'when R<=6 and L>12 then '忠诚用户'when R>6 and L>12 then '流失的老用户'when R>6 and L<=12 then '一次性用户'
end as 用户生命周期
from(select customer_name,timestampdiff(month,max(order_time),'2016-13-31') as R,timestampdiff(month,min(order_time),max(order_time)) as L
from market_data
group bycustomer_name) as temp

第三期

Cinema表结构各字段介绍如下：

Seat_id（座位号，依次递增），free（0表示有人，1表示空座），fare(对应座位的票价）

题目为：

1：查找表中最便宜的票价是多少？

方法一:
select * from cinema 
where free=1 
order by fare
limit 1方法二
select min(fare)
from cinema
where free=1

2：女友要求你定的座位必须是连续的（输出可用位置的seat_id）

select c1.seat_id,c2.seat_id
from cinema c1- -两表连接(自连接)join cinema c2- -限制条件位连续座位on c1.seat_id+1 =c2.seat_id- -空闲座位
where c1.free=1 and c2.free=1;

3：女友要求买连续的座位中总价最低的两个座位(输出对应的seat_id和总价)

select c1.seat_id,c2.seat_id,c1.fare+c2.fare
from cinema c1- -两表连接(自连接)join cinema c2- -限制条件位连续座位on c1.seat_id+1 =c2.seat_id- -空闲座位
where c1.free=1 and c2.free=1- -对价格排序并限制输出一个
order by c1.fare+c2.fare limit 1

第四期

employ表内字段的解释如下：

position_name（职位名称），min_salary（最低薪资，单位元），

max_salary（最高薪资，单位元），city（工作城市），

educational（学历要求），people（招聘人数），industry（行业）

题目为：

1.查找不同学历要求的平均最低薪资和平均最高薪资；

select educational,round(avg(min_salary),round(avg(max_salary))
from employ
group by educational;

2.查找每个行业，最高工资和最低工资差距最大的职位名称。

selectin industry,position_name,max(max_salary-min_salary)
from employ
group by industry

3.查找各个城市电商行业的平均薪资水平，并按照薪资高低水平进行排序。（岗位的薪资用最低薪资加最高薪资除以二当成平均薪资计算即可，注意要考虑到职位招聘人数）

select city,round(sum((max_salary+min_salary)/2*people)/sum(people)) as 平均薪资
from employee
where industry='互联网/电子商务'
group by city
order by 平均薪资 desc;

4.问答题：说明UNION和UNION ALL的差别

都是做表的合并连结

union会删除重复值;union all 表中数据全部合并,忽略重复值

数据来源于某网站销售统计

网络订单数据
用户信息

提取码:3k6ipan.baidu.com

分析步骤

0.数据导入

首先需要先创建对应的数据库和相应的表

创建orderinfo 表

2..创建userinfo表

#userinfo和orderinfo数据信息如下：
userinfo  客户信息表       userId    客户idsex       性别birth     出生年月日orderinfo 订单信息表       orderId   订单序号userId    客户idisPaid    是否支付price     商品价格paidTime  支付时间

登录mysql导入相应的数据

load data local infile "file" into table dbname.tablename ...

# 登录
mysql --local-infile -uroot -p
# 导入数据orderinfo
load data local  infile 'F:BaiduNetdiskDownloadSQLorder_info_utf.csv' into table data.orderinfo fields terminated by ',';
# 导入数据userinfo
load data local  infile 'F:BaiduNetdiskDownloadSQLuser_info_utf.csv' into table data.userinfo fields terminated by ',';

2.观察数据，对时间进行处理 ; 更新字符串为日期格式

update orderinfo set paidtime=replace(paidtime,'/','-') where paidtime is not null
update orderinfo set paidtime=str_to_date(paidtime,'%Y-%m-%d %H:%i') where paidtime is not null

3.查看数据

1.不同月份的下单人数

思路：按月份进行分组，对用户进行去重统计

select month(paidTime) as dtmonth, 
count(distinct userId) as count_users
from orderinfo
where isPaid = '已支付'
group by month(paidTime)

2 用户三月份的回购率和复购率

复购率：自然月内，购买多次的用户占比

首先先找出已支付中3月份的用户id和对应次数，按用户分组
然后再嵌套一层，复购率：购买次数大于1/ 总购买次数

select count(ct),count(if(ct>1,1,null)) from(select userID,Count(userId) as ct from orderinfowhere isPaid = '已支付'and month(paidTime) = 3group by userIdorder by userId) t

复购率： 16916 / 54799 = 0.308

回购率：曾经购买过的用户在某一时期内再次购买的占比

首先先查询已支付userId ,和支付月份的统计

select userId, date_format(paidTime, '%Y-%m-01') as m from orderinfowhere isPaid = '已支付'group by userId , date_format(paidTime,'%Y-%m-01')

然后使用date_sub函数，将表关联，筛选出本月的消费的userID，和下月的回购userID，即可计算出回购率

select t1.m,count(t1.m) as 消费总数,count(t2.m) as 复购率,count(t2.m)/ count(t1.m) as 回购率 from ( select userId, date_format(paidTime, '%Y-%m-01') as m from orderinfowhere isPaid = '已支付'group by userId , date_format(paidTime,'%Y-%m-01')) t1
left join ( select userId, date_format(paidTime, '%Y-%m-01') as m from orderinfowhere isPaid = '已支付'group by userId , date_format(paidTime,'%Y-%m-01')) t2
on t1.userId = t2.userId and t1.m = date_sub(t2.m, interval 1 month)
group by t1.m

3 统计男女用户的消费频次

userinfo因为性别有空值，需要筛选出t orderinfo 再和表t连接统计出用户男女消费次数

select o.userId,sex,count(o.userId)as ct from orderinfo oinner join(select * from userinfowhere sex != '') ton o.userId = t.userIdgroup by userId,sexorder by userId

根据上表，在进行子查询，统计出男性消费频次

select sex,avg(ct) from(select o.userId,sex,count(o.userId)as ct from orderinfo oinner join(select * from userinfowhere sex != '') ton o.userId = t.userIdgroup by userId,sexorder by userId)t2
group by sex

4 统计多次消费用户，分析第一次和最后一次的消费间隔

首先把多次消费的用户,和相应第一次最后一次消费时间提取出来
然后使用datediff 计算时间间隔，以天为单位

select userId,max(paidTime),min(paidTime),datediff(max(paidTime),min(paidTime)) from data.orderinfo
where isPaid = '已支付'
group by userId having count(1) > 1
order by userId

5 统计不同年龄段用户的消费金额差异

通过表联结，给用户划分不同的年龄段，以10年为基准，过滤出生日期为1900-00-00的异常值,筛选出用户消费频次和消费金额

select o.userId,age,price,count(o.userId)as ct from orderinfo o
inner join (select userId, ceil((year(now()) - year(birth))/10) as agefrom userinfowhere birth > 1901-00-00) t
on o.userId = t.userId
where isPaid = '已支付'
group by userId
order by userId

统计出年龄段的消费频次和消费金额

select t2.age,avg(ct),avg(price) from (select o.userId,age,price,count(o.userId)as ct from orderinfo o inner join(select userId, ceil((year(now()) - year(birth))/10) as agefrom userinfowhere birth > 1901-00-00)ton o.userId = t.userIdwhere ispaid = '已支付'group by userId, age) t2
group by age
order by age

ceil : 向上取整

6 统计消费的二八法则：消费top20%的用户贡献了多少消费额度

按照用户消费总额排序

select userId,sum(price) as total from orderinfo o
where isPaid = '已支付'
group by userId
order by total desc

查看总用户数和总金额

select count(userId),sum(total) from (select userId,sum(price) as total from orderinfo owhere isPaid = '已支付'group by userIdorder by total desc) as t

查看前20%的用户数量有多少

select count(userId)*0.2,sum(total) from (select userId,sum(price) as total from orderinfo owhere isPaid = '已支付'group by userIdorder by total desc)as t

limit限制前17000用户

select count(userId),sum(total) from (
select userId,sum(price) as total from orderinfo o
where isPaid = '已支付'
group by userId
order by total desc
limit 17129) t