一、每日一题
返回结果示例如下:
示例 1:
输入: Ads 表: +-------+---------+---------+ | ad_id | user_id | action | +-------+---------+---------+ | 1 | 1 | Clicked | | 2 | 2 | Clicked | | 3 | 3 | Viewed | | 5 | 5 | Ignored | | 1 | 7 | Ignored | | 2 | 7 | Viewed | | 3 | 5 | Clicked | | 1 | 4 | Viewed | | 2 | 11 | Viewed | | 1 | 2 | Clicked | +-------+---------+---------+ 输出: +-------+-------+ | ad_id | ctr | +-------+-------+ | 1 | 66.67 | | 3 | 50.00 | | 2 | 33.33 | | 5 | 0.00 | +-------+-------+ 解释: 对于 ad_id = 1, ctr = (2/(2+1)) * 100 = 66.67 对于 ad_id = 2, ctr = (1/(1+2)) * 100 = 33.33 对于 ad_id = 3, ctr = (1/(1+1)) * 100 = 50.00 对于 ad_id = 5, ctr = 0.00, 注意 ad_id = 5 没有被点击 (Clicked) 或查看 (Viewed) 过 注意我们不关心 action 为 Ingnored 的广告
解答:
import pandas as pddef ads_performance(ads_df: pd.DataFrame) -> pd.DataFrame:
# 过滤掉 Ignored 行filtered_df = ads_df[ads_df['action'] != 'Ignored']# 计算每个 ad_id 的点击次数和查看次数summary = filtered_df.groupby(['ad_id', 'action']).size().unstack(fill_value=0)# 确保有 Clicked 和 Viewed 列,如果不存在则创建并填充为0summary['Clicked'] = summary.get('Clicked', 0)summary['Viewed'] = summary.get('Viewed', 0)
# 计算CTRsummary['CTR'] = (summary['Clicked'] / (summary['Clicked'] + summary['Viewed'])).fillna(0) * 100# 处理没有被点击或查看的广告,赋值为0full_ad_ids = ads_df['ad_id'].unique()ctr_df = summary.reindex(full_ad_ids, fill_value = 0).reset_index()ctr_df['CTR'] = ctr_df['CTR'].fillna(0).round(2)# 排序,按CTR降序和ad_id升序排序ctr_df = ctr_df.sort_values(by = ['CTR', 'ad_id'], ascending = [False, True])[['ad_id', 'CTR']]return ctr_df# 测试代码
data = {'ad_id': [1, 2, 3, 5, 1, 2, 3, 1, 2, 1],'user_id': [1, 2, 3, 5, 7, 7, 5, 4, 11, 2],'action': ['Clicked', 'Clicked', 'Viewed', 'Ignored', 'Ignored', 'Viewed', 'Clicked', 'Viewed', 'Viewed', 'Clicked']
}
ads_df = pd.DataFrame(data)result = ads_performance(ads_df)
print(result)
二、总结
注意分母为0的情况时如何处理。