背景介绍
业务介绍:用户是通过低价引流进来,然后通过复购购买高价商品,可以多次购买。低价商品和高价商品均可以退款,高价商品由于各种复杂的场景,可能会有多次退款。低价商品如果退款是全退,不存在多次退款。
业务需求:计算低价引流的流量后续的产出,复购的期限是在 30天内,退款的期限也是按 30天(退款时间-复购时间),最终交付一个按引流订单维度。
架构设计
由于引流订单和复购订单的性质不同,在数仓贴源层处理的过程中,已经将二者分离独立成表,此处是在贴源层的基础上加以处理。
将引流订单记录、复购订单记录和退款订单记录以引流订单为主表整合到一起,输出不加业务逻辑的事实表:订单链路数据表。在此基础上,再根据不同的业务逻辑处理为不同聚合表单。
数据开发
说明:为方便测试,以下使用的是 MySQL 语法,实际开发使用的是阿里云的 MaxCompute SQL,本代码的差异点在于日期函数
date_add()
,在 MaxCompute SQL 中,语法有一定差异,后者是date_add(<date_col_name>, 30)
,如果时间字段包含日期和时间,则要使用dateadd(<datetime_col_name>,30,'dd')
。
订单链路数据表好处理,把三个表根据用户信息进行 JOIN 即可,伪代码参考如下:
select *
from <引流订单记录>
left join <复购订单记录> on <用户信息>
left join <退款订单记录> on <用户信息>
有了订单链路数据表,接下来按 30天的复购和30天的退款期限聚合为引流订单产出表。
抽象出一个表单,数据记录如下,有 10 个字段
- low_price_order_id:低价订单号
- low_price_paid_time:低价订单付款时间
- low_price_paid_amount:低价订单付款金额
- low_price_refund_time:低价订单退款时间
- low_price_refund_amount:低价订单退款金额
- high_price_order_id:高价订单号
- high_price_paid_time:高价订单付款时间
- high_price_paid_amount:高价订单付款金额
- high_price_refund_time:高价订单退款时间
- high_price_refund_amount:高价订单退款金额
with user_orders as(
-- 只有低价订单
select '10001' as "low_price_order_id",'2024-01-01' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",null as "high_price_order_id",null as "high_price_paid_time",0 as "high_price_paid_amount",null as "high_price_refund_time",0 as "high_price_refund_amount"
union all
-- 只有低价订单,且退款
select '10002' as "low_price_order_id",'2024-01-01' as "low_price_paid_time",1.0 as "low_price_paid_amount",'2024-01-02' as "low_price_refund_time",1.0 as "low_price_refund_amount",null as "high_price_order_id",null as "high_price_paid_time",0 as "high_price_paid_amount",null as "high_price_refund_time",0 as "high_price_refund_amount"
union all
-- 低价订单+一个高价订单
select '10003' as "low_price_order_id",'2024-01-01' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20001' as "high_price_order_id",'2024-01-01' as "high_price_paid_time",1000 as "high_price_paid_amount",null as "high_price_refund_time",0 as "high_price_refund_amount"
union all
-- 低价订单+2个高价订单
select '10004' as "low_price_order_id",'2024-01-01' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20002' as "high_price_order_id",'2024-01-02' as "high_price_paid_time",2000 as "high_price_paid_amount",null as "high_price_refund_time",0 as "high_price_refund_amount"
union allselect '10004' as "low_price_order_id",'2024-01-02' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20004' as "high_price_order_id",'2024-01-05' as "high_price_paid_time",1000 as "high_price_paid_amount",null as "high_price_refund_time",0 as "high_price_refund_amount"
union all
-- 低价订单+1个30天内高价订单+1个30天外高价订单
select '10005' as "low_price_order_id",'2024-01-02' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20003' as "high_price_order_id",'2024-01-02' as "high_price_paid_time",2000 as "high_price_paid_amount",null as "high_price_refund_time",0 as "high_price_refund_amount"
union allselect '10005' as "low_price_order_id",'2024-01-02' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20010' as "high_price_order_id",'2024-02-04' as "high_price_paid_time",1500 as "high_price_paid_amount",null as "high_price_refund_time",0 as "high_price_refund_amount"
union all-- 低价订单+一个高价订单且一个退款
select '10006' as "low_price_order_id",'2024-01-02' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20005' as "high_price_order_id",'2024-01-03' as "high_price_paid_time",2000 as "high_price_paid_amount",'2024-01-05' as "high_price_refund_time",2000 as "high_price_refund_amount"
union all
-- 低价订单+一个高价订单且2个退款
select '10007' as "low_price_order_id",'2024-01-02' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20006' as "high_price_order_id",'2024-01-03' as "high_price_paid_time",2000 as "high_price_paid_amount",'2024-01-03' as "high_price_refund_time",1000 as "high_price_refund_amount"
union allselect '10007' as "low_price_order_id",'2024-01-02' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20006' as "high_price_order_id",'2024-01-03' as "high_price_paid_time",2000 as "high_price_paid_amount",'2024-01-04' as "high_price_refund_time",1000 as "high_price_refund_amount"
union all
-- 低价订单+一个高价订单且1个30天内退款1个30天外退款
select '10008' as "low_price_order_id",'2024-01-03' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20007' as "high_price_order_id",'2024-01-03' as "high_price_paid_time",2000 as "high_price_paid_amount",'2024-01-05' as "high_price_refund_time",100 as "high_price_refund_amount"
union allselect '10008' as "low_price_order_id",'2024-01-03' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20007' as "high_price_order_id",'2024-01-03' as "high_price_paid_time",2000 as "high_price_paid_amount",'2024-02-05' as "high_price_refund_time",1900 as "high_price_refund_amount"
union all
-- 低价订单+1个30天内高价订单—+1个30天外高价订单且该单有30天内退款和30天外退款
select '10009' as "low_price_order_id",'2024-01-03' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20008' as "high_price_order_id",'2024-01-03' as "high_price_paid_time",1500 as "high_price_paid_amount",null as "high_price_refund_time",0 as "high_price_refund_amount"
union allselect '10009' as "low_price_order_id",'2024-01-03' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20009' as "high_price_order_id",'2024-02-04' as "high_price_paid_time",2000 as "high_price_paid_amount",'2024-02-05' as "high_price_refund_time",100 as "high_price_refund_amount"
union allselect '10009' as "low_price_order_id",'2024-01-03' as "low_price_paid_time",1.0 as "low_price_paid_amount",null as "low_price_refund_time",0 as "low_price_refund_amount",'20009' as "high_price_order_id",'2024-02-04' as "high_price_paid_time",2000 as "high_price_paid_amount",'2024-03-06' as "high_price_refund_time",1900 as "high_price_refund_amount"
)
select *
from user_orders;
数据参考如下:
- 部分只有低价订单,可能有退款,退款可能是 30天内,可能是 30天外;
- 部分有高价订单,高价订单可能是 30天内,可能是 30天外;
- 部分有多笔高价订单,每笔可能是 30天内,可能是 30天外;
- 部分高价订单有退款,退款可能是 30天内,可能是 30天外;
- 部分高价订单有多笔退款,每笔退款可能是 30天内,可能是 30天外。
目标表字段记录如下:
- low_price_order_id:低价订单号
- low_price_paid_time:低价订单付款时间
- low_price_paid_amount:低价订单付款金额
- low_price_refund_amount:30天内低价订单退款金额
- high_price_paid_amount:30天内高价订单付款金额
- high_price_refund_amount:30天内高价订单退款金额
接下来开始聚合目标表。
保留低价订单唯一记录,那就直接按照低价订单聚合?
逻辑上没问题,直接聚合试试。
不加日期限制时,结构如下:
select uos.low_price_order_id,uos.low_price_paid_time,uos.low_price_paid_amount,sum(uos.low_price_refund_amount),sum(uos.high_price_paid_amount),sum(uos.high_price_refund_amount)
from user_orders uos
group by uos.low_price_order_id,uos.low_price_paid_time,uos.low_price_paid_amount
按 30 天周期限制,对每个聚合字段进行界限判断:
select uos.low_price_order_id,uos.low_price_paid_time,uos.low_price_paid_amount,sum(case when uos.low_price_refund_time<=date_add(uos.low_price_paid_time,interval 30 day) then uos.low_price_refund_amount else 0 end) low_price_refund_amount,sum(case when uos.high_price_paid_time<=date_add(uos.low_price_paid_time,interval 30 day) then uos.high_price_paid_amount else 0 end) high_price_paid_amount,sum(case when uos.high_price_paid_time<=date_add(uos.low_price_paid_time,interval 30 day) and uos.high_price_refund_time<=date_add(uos.high_price_paid_time,interval 30 day) then uos.high_price_refund_amount else 0 end) high_price_refund_amount
from user_orders uos
group by uos.low_price_order_id,uos.low_price_paid_time,uos.low_price_paid_amount
查看结果,显然不行!当高价订单有多笔退款时,数据发散了,直接聚合时,会出现翻倍的异常。
既然高价订单发散了,为了保证唯一记录,需要分两段聚合,先按低价订单和高价订单聚合退款数据,然后再按低价订单聚合高价订单金额和退款金额,数据流转参考如下。
不加时间限制时,参考如下:
select uos.low_price_order_id,uos.low_price_paid_amount,uos.low_price_paid_time,sum(uos.low_price_refund_amount) low_price_refund_amount,sum(uos.high_price_paid_amount) high_price_paid_amount,sum(uos.high_price_refund_amount) high_price_refund_amount
from(select uos.low_price_order_id,uos.low_price_paid_amount,uos.low_price_paid_time,uos.low_price_refund_amount,uos.low_price_refund_time,uos.high_price_order_id,uos.high_price_paid_time,uos.high_price_paid_amount,sum(uos.high_price_refund_amount) high_price_refund_amountfrom user_orders uosgroup by uos.low_price_order_id,uos.low_price_paid_amount,uos.low_price_paid_time,uos.low_price_refund_amount,uos.low_price_refund_time,uos.high_price_order_id,uos.high_price_paid_time,uos.high_price_paid_amount
)uos
group by uos.low_price_order_id,uos.low_price_paid_amount,uos.low_price_paid_time
接下来加上限制。
先按低价订单和高价订单聚合退款数据,把高价订单的退款时间减去高价订单的付款时间在30天内的退款金额聚合。
然后再按低价订单,把高价订单的付款时间减去低价订单的付款时间在 30天内的高价订单的付款金额和高价订单的退款金额聚合,同时把低价订单的退款时间减去低价订单的付款时间在 30天内的低价订单的付款金额聚合,得到最终的目标表。
select uos.low_price_order_id,uos.low_price_paid_amount,uos.low_price_paid_time,sum(case when uos.low_price_refund_time<=date_add(uos.low_price_paid_time,interval 30 day) then uos.low_price_refund_amount else 0 end) low_price_refund_amount,sum(case when uos.high_price_paid_time<=date_add(uos.low_price_paid_time,interval 30 day) then uos.high_price_paid_amount else 0 end) high_price_paid_amount,sum(case when uos.high_price_paid_time<=date_add(uos.low_price_paid_time,interval 30 day) then uos.high_price_refund_amount else 0 end) high_price_refund_amount
from(select uos.low_price_order_id,uos.low_price_paid_amount,uos.low_price_paid_time,uos.low_price_refund_amount,uos.low_price_refund_time,uos.high_price_order_id,uos.high_price_paid_time,uos.high_price_paid_amount,sum(case when uos.high_price_refund_time<=date_add(uos.high_price_paid_time,interval 30 day) then uos.high_price_refund_amount else 0 end) high_price_refund_amountfrom user_orders uosgroup by uos.low_price_order_id,uos.low_price_paid_amount,uos.low_price_paid_time,uos.low_price_refund_amount,uos.low_price_refund_time,uos.high_price_order_id,uos.high_price_paid_time,uos.high_price_paid_amount
)uos
group by uos.low_price_order_id,uos.low_price_paid_amount,uos.low_price_paid_time
结果如下,符合需求的预期。
可视化
说明:实际业务表单还有很多其他的维度,此处仅抽象出时间维度。
有了目标表,便可以根据目标表按时间维度聚合每月、每周、每天的营收金额、退款率等数据指标。
由于案例数据较少,此处做一个按天查看营收金额和退款率的折线图,看看二者的关系走势。
- 营收金额:低价订单付款金额-低价订单退款金额+高价订单付款金额-高价订单付款金额
- 退款率:高价订单退款金额>0 的数量/总低价订单数,一个低价订单可以视为是一个用户,按用户数计算退款率。
select low_price_paid_time,sum(low_price_paid_amount-low_price_refund_amount+high_price_paid_amount-high_price_refund_amount) 营收金额,sum(if(high_price_refund_amount>0,1,0))/count(*) 退款率
from <引流订单产出表>
group by low_price_paid_time
order by low_price_paid_time
小结
本文介绍了怎么实现以引流的低价订单的为基本维度,按照业务 30天的间隔分别聚合低价订单退款、高价订单金额和高价订单退款金额。
采用了两层表单的设计,一层事实表,一层根据业务不同的口径进行聚合。
处理业务逻辑的时候,根据事实表的结构,采用分段聚合的逻辑先聚合【低价订单+高价订单】,然后再根据【低价订单】进行聚合,最终得到目标表。
目标表开发好之后,在该表的基础上,根据各类业务指标进行聚合并可视化,最终提交给业务方使用。
附录
附上 MaxCompute SQL 处理逻辑,此案例,将 date_add()
语法修改即可。
select uos.low_price_order_id,uos.low_price_paid_amount,uos.low_price_paid_time,sum(case when uos.low_price_refund_time<=date_add(uos.low_price_paid_time,30) then uos.low_price_refund_amount else 0 end) low_price_refund_amount,sum(case when uos.high_price_paid_time<=date_add(uos.low_price_paid_time,30) then uos.high_price_paid_amount else 0 end) high_price_paid_amount,sum(case when uos.high_price_paid_time<=date_add(uos.low_price_paid_time,30) then uos.high_price_refund_amount else 0 end) high_price_refund_amount
from(select uos.low_price_order_id,uos.low_price_paid_amount,uos.low_price_paid_time,uos.low_price_refund_amount,uos.low_price_refund_time,uos.high_price_order_id,uos.high_price_paid_time,uos.high_price_paid_amount,sum(case when uos.high_price_refund_time<=date_add(uos.high_price_paid_time,30) then uos.high_price_refund_amount else 0 end) high_price_refund_amountfrom user_orders uosgroup by uos.low_price_order_id,uos.low_price_paid_amount,uos.low_price_paid_time,uos.low_price_refund_amount,uos.low_price_refund_time,uos.high_price_order_id,uos.high_price_paid_time,uos.high_price_paid_amount
)uos
group by uos.low_price_order_id,uos.low_price_paid_amount,uos.low_price_paid_time