文章目录
- 测试数据
- 需求说明
- 需求实现
前言:本题制作参考牛客网进阶题目 —— SQL128 未完成试卷数大于1的有效用户
测试数据
-- 创建用户表
DROP TABLE IF EXISTS users;
CREATE TABLE users (user_id INT,name STRING,age INT,gender STRING,register_date STRING
);-- 插入用户数据
INSERT INTO users VALUES
(1, 'Alice', 23, 'F', '2023-01-01'),
(2, 'Bob', 22, 'M', '2023-02-01'),
(3, 'Cathy', 24, 'F', '2023-03-01'),
(4, 'David', 23, 'M', '2023-04-01'),
(5, 'Eve', 25, 'F', '2023-05-01'),
(6, 'Frank', 26, 'M', '2023-06-01'),
(7, 'Grace', 27, 'F', '2023-07-01'),
(8, 'Hank', 28, 'M', '2023-08-01'),
(9, 'Ivy', 29, 'F', '2023-09-01'),
(10, 'Jack', 30, 'M', '2023-10-01');-- 创建订单表
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (order_id INT,user_id INT,product_id INT,order_date STRING,status STRING
);-- 插入订单数据
INSERT INTO orders VALUES
(101, 1, 1001, '2023-01-01', 'completed'),
(102, 1, 1002, '2023-01-01', 'pending'),
(103, 2, 1001, '2023-01-02', 'completed'),
(104, 3, 1001, '2023-01-03', 'pending'),
(105, 3, 1003, '2023-01-04', 'completed'),
(106, 4, 1002, '2023-01-02', 'completed'),
(107, 5, 1001, '2023-01-03', 'completed'),
(108, 5, 1002, '2023-01-04', 'pending'),
(109, 5, 1002, '2023-01-05', 'pending'),
(110, 5, 1003, '2023-01-06', 'pending'),
(111, 5, 1003, '2023-01-07', 'pending'),
(112, 6, 1001, '2023-01-08', 'completed'),
(113, 6, 1002, '2023-01-08', 'pending'),
(114, 6, 1003, '2023-01-08', 'pending'),
(115, 6, 1004, '2023-01-09', 'pending'),
(116, 6, 1005, '2023-01-10', 'completed'),
(117, 7, 1001, '2023-01-11', 'completed'),
(118, 7, 1002, '2023-01-11', 'pending'),
(119, 7, 1003, '2023-01-12', 'pending'),
(120, 7, 1004, '2023-01-12', 'pending'),
(121, 7, 1005, '2023-01-13', 'pending'),
(122, 8, 1001, '2023-01-14', 'completed'),
(123, 8, 1002, '2023-01-14', 'completed'),
(124, 8, 1003, '2023-01-15', 'completed'),
(125, 8, 1004, '2023-01-15', 'pending'),
(126, 8, 1005, '2023-01-16', 'pending'),
(127, 9, 1001, '2023-01-17', 'completed'),
(128, 9, 1002, '2023-01-17', 'completed'),
(129, 9, 1003, '2023-01-18', 'completed'),
(130, 9, 1004, '2023-01-18', 'completed'),
(131, 9, 1005, '2023-01-19', 'completed'),
(132, 10, 1001, '2023-01-20', 'pending'),
(133, 10, 1002, '2023-01-20', 'pending'),
(134, 10, 1003, '2023-01-21', 'pending'),
(135, 10, 1004, '2023-01-21', 'pending'),
(136, 10, 1005, '2023-01-22', 'pending');
需求说明
统计 2023
年每个有效用户的数据(有效用户指完成订单数至少为 1
且未完成订单数小于 5
),输出用户ID、用户名称、未完成订单数、完成订单数、购买过的商品ID集合,按用户ID升序排列。
orders
表中的 status
列标识用户订单的状态,共有两种:
-
pending
:未完成; -
completed
:已完成。
结果示例:
user_id | name | pending_orders | completed_orders | product_ids |
---|---|---|---|---|
1 | Alice | 1 | 1 | [“2023-01-01:1001”,“2023-01-01:1002”] |
2 | Bob | 0 | 1 | [“2023-01-02:1001”] |
3 | Cathy | 1 | 1 | [“2023-01-03:1001”,“2023-01-04:1003”] |
4 | David | 0 | 1 | [“2023-01-02:1002”] |
5 | Eve | 4 | 1 | [“2023-01-03:1001”,“2023-01-04:1002”,“2023-01-05:1002”,“2023-01-06:1003”,“2023-01-07:1003”] |
6 | Frank | 3 | 2 | [“2023-01-08:1001”,“2023-01-08:1002”,“2023-01-08:1003”,“2023-01-09:1004”,“2023-01-10:1005”] |
7 | Grace | 4 | 1 | [“2023-01-11:1001”,“2023-01-11:1002”,“2023-01-12:1003”,“2023-01-12:1004”,“2023-01-13:1005”] |
8 | Hank | 2 | 3 | [“2023-01-14:1001”,“2023-01-14:1002”,“2023-01-15:1003”,“2023-01-15:1004”,“2023-01-16:1005”] |
9 | Ivy | 0 | 5 | [“2023-01-17:1001”,“2023-01-17:1002”,“2023-01-18:1003”,“2023-01-18:1004”,“2023-01-19:1005”] |
其中:
user_id
:用户ID;name
:用户名;pending_orders
:未完成订单数;completed_orders
:完成订单数;product_ids
:每个用户下单的所有日期和产品ID组成的列表。
需求实现
selectu.user_id,name,pending_orders,completed_orders,product_ids
from(selectuser_id,sum(if(status = "pending",1,0)) pending_orders,sum(if(status = "completed",1,0)) completed_orders,collect_list(concat_ws(":",date_format(order_date,"yyyy-MM-dd"),cast(product_id as string))) product_idsfromorderswhereyear(order_date) = "2023"group byuser_id)t1joinusers uont1.user_id = u.user_id
wherecompleted_orders >= 1 and pending_orders < 5
order byu.user_id;
输出结果如下:
范围筛选统计的需求比较简单,只需要在分组的统计的时候进行判断即可。
本题稍有难度的地方在于,如何将各个用户的下单日期与对应的产品ID进行组合,形成列表,也就是列转行。
在 Hive 中列转行有两个函数:
-
collect_list
:传入一个参数(字段),根据分组,对该字段进行聚合,形成列表; -
collect_set
:和上面一样,但它的不同之处在于,会对组合的列表数据进行去重操作。
在 MySQL 中并没有这两个函数,但是有和它们功能类似的函数 group_concat
:
GROUP_CONCAT(expr SEPARATOR sep)-- 示例
group_concat(start_day SEPARATOR ';')
其中,expr
表示要连接的表达式,可以是列名、常量或者更复杂的表达式。SEPARATOR sep
是一个可选参数,用于指定连接字符串的分隔符,默认为逗号。