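Official documentation: https://pandas.pydata.org/docs/reference/groupby.html
pandas offers a set of methods for grouping data, and the official docs have a very detailed tutorial on them. The cases below are real problems I ran into; let's look at how pandas solves each one.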
Building the data
import pandas as pd
import numpy as np

df = pd.DataFrame(data={
    "boss": ["A"]*3 + ["B"]*3 + ["C"]*4,
    "owner": ["A1", "A1", "A2", "B1", "B2", "B2", "C1", "C1", "C2", "C2"],
    "month": [1, 2, 1, 1, 1, 2, 1, 2, 1, 2],
    "fk_money": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})
The data
|   | boss | owner | month | fk_money |
|---|------|-------|-------|----------|
| 0 | A | A1 | 1 | 10 |
| 1 | A | A1 | 2 | 20 |
| 2 | A | A2 | 1 | 30 |
| 3 | B | B1 | 1 | 40 |
| 4 | B | B2 | 1 | 50 |
| 5 | B | B2 | 2 | 60 |
| 6 | C | C1 | 1 | 70 |
| 7 | C | C1 | 2 | 80 |
| 8 | C | C2 | 1 | 90 |
| 9 | C | C2 | 2 | 100 |
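Explanation: take the first row. Salesperson A1, who reports to boss A, disbursed loans worth 10 in month 1. fk_money is in units of 10,000 yuan (万), so 10 means 100,000 yuan.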
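Question 1: For each salesperson under each boss, in which month was the disbursed amount the largest?
Approach:
- Sort by fk_money in descending order, group by owner, and take the first row of each group.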
result1_df = df.sort_values(by="fk_money", ascending=False).groupby(by="owner").head(1)
result1_df
|   | boss | owner | month | fk_money |
|---|------|-------|-------|----------|
| 9 | C | C2 | 2 | 100 |
| 7 | C | C1 | 2 | 80 |
| 5 | B | B2 | 2 | 60 |
| 3 | B | B1 | 1 | 40 |
| 2 | A | A2 | 1 | 30 |
| 1 | A | A1 | 2 | 20 |
Explanation: for example, C2, who reports to boss C, had the largest disbursement, 100 (1,000,000 yuan), in month 2.
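If you only need the single largest row per owner, `GroupBy.idxmax()` is an alternative to sorting the whole frame. Note this is a swapped-in technique, not the `sort_values` + `head(1)` approach above. A minimal sketch, rebuilding the same sample data: `idxmax()` returns, for each owner, the index label of the row holding the largest fk_money, and `df.loc` pulls those rows back. The names `best_idx` and `best_df` are illustrative.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame(data={
    "boss": ["A"] * 3 + ["B"] * 3 + ["C"] * 4,
    "owner": ["A1", "A1", "A2", "B1", "B2", "B2", "C1", "C1", "C2", "C2"],
    "month": [1, 2, 1, 1, 1, 2, 1, 2, 1, 2],
    "fk_money": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# For each owner, the index label of the row with the largest fk_money
# (on ties, idxmax keeps the first occurrence)
best_idx = df.groupby("owner")["fk_money"].idxmax()

# Select those rows from the original frame
best_df = df.loc[best_idx]
print(best_df)
```

The result is ordered by owner (A1, A2, ..., C2) rather than by fk_money, because `idxmax()` yields one entry per group in group-key order.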
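Extension: how do you get the second-largest value?
GroupBy.nth() takes the nth row of each group; n starts at 0, so 0 means the first row.
- Groups that have no nth row are left out of the result.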
result1_df = df.sort_values(by="fk_money", ascending=False).groupby(by="owner", as_index=False).nth(1)
result1_df
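| owner | boss | month | fk_money |
|-------|------|-------|----------|
| A1 | A | 1 | 10 |
| B2 | B | 1 | 50 |
| C1 | C | 1 | 70 |
| C2 | C | 1 | 90 |
A2 and B1 each have only one row, so they do not appear. The layout above is what older pandas versions produce; since pandas 2.0, nth() behaves as a filter and returns the matching rows with their original index (here rows 8, 6, 4 and 0) and original column order.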
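Because the shape of `nth()` output changed between pandas versions, another option is to number the rows inside each group with `GroupBy.cumcount()` and keep position 1. This is a swapped-in technique, not the author's; a sketch on the same data, where `sorted_df`, `pos` and `second_df` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(data={
    "boss": ["A"] * 3 + ["B"] * 3 + ["C"] * 4,
    "owner": ["A1", "A1", "A2", "B1", "B2", "B2", "C1", "C1", "C2", "C2"],
    "month": [1, 2, 1, 1, 1, 2, 1, 2, 1, 2],
    "fk_money": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# Largest amounts first
sorted_df = df.sort_values(by="fk_money", ascending=False)

# cumcount numbers rows 0, 1, 2, ... within each owner, in the current (sorted) order
pos = sorted_df.groupby("owner").cumcount()

# Position 1 is the second-largest; owners with a single row simply have no match
second_df = sorted_df[pos == 1].sort_values("owner")
print(second_df)
```

The boolean mask keeps the original index and all columns, so the output looks the same regardless of pandas version.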
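Question 2: For each salesperson, what share of their total disbursement came from each month?
Approach:
- Compute each salesperson's total disbursement, owner_total_fk_money.
- Merge df with owner_total_fk_money.
- Divide fk_money by the salesperson's total to get the share.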
### Implementation
owner_total_fk_money = df.groupby(by="owner", as_index=False).agg({"fk_money": "sum"})
result1_df = pd.merge(df, owner_total_fk_money, on="owner", how="left", suffixes=("", "_total"))
result1_df["rate"] = (result1_df["fk_money"] / result1_df["fk_money_total"]).map(lambda x: "{:.2%}".format(x))
result1_df
|   | boss | owner | month | fk_money | fk_money_total | rate |
|---|------|-------|-------|----------|----------------|------|
| 0 | A | A1 | 1 | 10 | 30 | 33.33% |
| 1 | A | A1 | 2 | 20 | 30 | 66.67% |
| 2 | A | A2 | 1 | 30 | 30 | 100.00% |
| 3 | B | B1 | 1 | 40 | 40 | 100.00% |
| 4 | B | B2 | 1 | 50 | 110 | 45.45% |
| 5 | B | B2 | 2 | 60 | 110 | 54.55% |
| 6 | C | C1 | 1 | 70 | 150 | 46.67% |
| 7 | C | C1 | 2 | 80 | 150 | 53.33% |
| 8 | C | C2 | 1 | 90 | 190 | 47.37% |
| 9 | C | C2 | 2 | 100 | 190 | 52.63% |
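The merge step can also be skipped with `GroupBy.transform("sum")`, which broadcasts each owner's total back onto every row of that owner. A minimal sketch, assuming you are fine keeping the share as a number and formatting it only for display; the columns `fk_money_total`, `rate` and `rate_str` are illustrative:

```python
import pandas as pd

df = pd.DataFrame(data={
    "boss": ["A"] * 3 + ["B"] * 3 + ["C"] * 4,
    "owner": ["A1", "A1", "A2", "B1", "B2", "B2", "C1", "C1", "C2", "C2"],
    "month": [1, 2, 1, 1, 1, 2, 1, 2, 1, 2],
    "fk_money": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# transform("sum") returns a Series aligned with df: every row gets its owner's total
df["fk_money_total"] = df.groupby("owner")["fk_money"].transform("sum")

# Keep the share numeric so it can still be summed, sorted or compared
df["rate"] = df["fk_money"] / df["fk_money_total"]

# Turn it into a percentage string only for display
df["rate_str"] = df["rate"].map("{:.2%}".format)
print(df)
```

Keeping `rate` numeric makes sanity checks easy, e.g. the shares of each owner should add up to 1.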
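Question 3: Under each boss, what share of the boss's total disbursement does each salesperson account for?
Approach:
- Compute each boss's total disbursement, boss_df.
- Compute each salesperson's total disbursement, owner_df.
- Merge boss_df and owner_df on the boss column.
- Divide each salesperson's amount by their boss's total.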
# Total fk_money for each boss
boss_df = df.groupby(by="boss", as_index=False).agg({"fk_money": "sum"})
# Total fk_money for each owner (keeping the boss column)
owner_df = df.groupby(by=["boss", "owner"], as_index=False).agg({"fk_money": "sum"})
# Merge owner_df and boss_df on boss
result2_df = pd.merge(owner_df, boss_df, on="boss", how="left", suffixes=("_owner", "_boss"))
# Each owner's share of the boss's total, formatted as a percentage
result2_df["rate"] = (result2_df["fk_money_owner"] / result2_df["fk_money_boss"]).map(lambda x: "{:.2%}".format(x))
result2_df
|   | boss | owner | fk_money_owner | fk_money_boss | rate |
|---|------|-------|----------------|---------------|------|
| 0 | A | A1 | 30 | 60 | 50.00% |
| 1 | A | A2 | 30 | 60 | 50.00% |
| 2 | B | B1 | 40 | 150 | 26.67% |
| 3 | B | B2 | 110 | 150 | 73.33% |
| 4 | C | C1 | 150 | 340 | 44.12% |
| 5 | C | C2 | 190 | 340 | 55.88% |
Explanation: A1, who reports to boss A, accounts for 50% of boss A's total disbursement of 60 (600,000 yuan).
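Question 3 can follow the same `transform` pattern: aggregate to one row per (boss, owner), then take the sum grouped by boss on that smaller frame instead of building and merging boss_df. A sketch under that assumption (a different route from the merge above); `fk_money_boss` mirrors the merge version's column name, and `rate` is kept numeric:

```python
import pandas as pd

df = pd.DataFrame(data={
    "boss": ["A"] * 3 + ["B"] * 3 + ["C"] * 4,
    "owner": ["A1", "A1", "A2", "B1", "B2", "B2", "C1", "C1", "C2", "C2"],
    "month": [1, 2, 1, 1, 1, 2, 1, 2, 1, 2],
    "fk_money": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# One row per (boss, owner) with that owner's total fk_money
owner_df = df.groupby(["boss", "owner"], as_index=False)["fk_money"].sum()

# Each boss's total, broadcast back onto that boss's owner rows
owner_df["fk_money_boss"] = owner_df.groupby("boss")["fk_money"].transform("sum")

# Owner's share of the boss's total (numeric; format with "{:.2%}" when displaying)
owner_df["rate"] = owner_df["fk_money"] / owner_df["fk_money_boss"]
print(owner_df)
```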