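Data Manipulation
- 1. Key Concepts
- 1.12 Grouping and Joining Tables
- 1.13 Ranking
- 2. Problems
- 2.10 Nth Highest Salary
- 2.11 Second Highest Salary
- 2.12 Department Highest Salary
- 2.13 Rank Scores
- 2.14 Delete Duplicate Emails
- 2.15 Rearrange Products Table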
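1. Key Concepts
1.12 Grouping and Joining Tables
- Grouping: take the highest salary in each department, then turn the group key back into a regular column.
max_salary = employee.groupby('departmentId')['salary'].max().reset_index()
- Joining tables: match each employee's departmentId against the department table's id.
data = pd.merge(employee, department, left_on='departmentId', right_on='id')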
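The two calls above can be tried on a small table. The following is a minimal sketch; the sample rows are hypothetical and exist only to show the shape of each result, including the `_x` and `_y` suffixes that `pd.merge` adds by default when both tables share a column name.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
employee = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Joe', 'Jim', 'Sam'],
    'salary': [70000, 90000, 60000],
    'departmentId': [1, 1, 2],
})
department = pd.DataFrame({'id': [1, 2], 'name': ['IT', 'Sales']})

# Grouping: one row per department with its highest salary.
max_salary = employee.groupby('departmentId')['salary'].max().reset_index()

# Joining: 'id' and 'name' exist in both tables, so they come back
# as id_x/name_x (left table) and id_y/name_y (right table).
data = pd.merge(employee, department, left_on='departmentId', right_on='id')

print(max_salary)
print(data.columns.tolist())
```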
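1.13 Ranking
- method='dense': rows with equal values share the same rank, and the rank increases by 1 from one distinct value to the next, with no gaps.
- ascending: the ranking order. The default is True (ascending), so pass False to give the highest value rank 1.
scores['rank'] = scores['score'].rank(method='dense', ascending=False)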
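To make the effect of `method='dense'` concrete, here is a small comparison with `method='min'`, another option of `Series.rank`. The scores are hypothetical sample values.

```python
import pandas as pd

# Hypothetical scores, made up for illustration.
scores = pd.DataFrame({'score': [3.5, 3.65, 4.0, 3.85, 4.0, 3.65]})

# dense: ties share a rank and the next distinct value gets the next integer.
scores['dense'] = scores['score'].rank(method='dense', ascending=False)
# min: ties share the lowest rank of their group, which leaves gaps afterwards.
scores['min'] = scores['score'].rank(method='min', ascending=False)

print(scores)
```

With these values the dense ranks run 1 to 4 without gaps, while the `min` ranks skip 2 and 5 because of the two tied pairs. Note that `rank` returns floats.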
2. Problems
2.10 Nth Highest Salary
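import pandas as pd

def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    # The result column is named after the requested N, not a fixed 2.
    col = f'getNthHighestSalary({N})'
    # Distinct salaries, highest first, so a repeated salary is counted once.
    salaries = employee['salary'].drop_duplicates().sort_values(ascending=False)
    # No N-th distinct salary (or N is not positive): the answer is null.
    if N < 1 or len(salaries) < N:
        return pd.DataFrame({col: [None]})
    return pd.DataFrame({col: [salaries.iloc[N - 1]]})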
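A small check of why duplicates matter for this problem, assuming the N-th highest distinct salary is what is being asked for. The data is hypothetical, and the variable names `by_position` and `by_distinct` are only for illustration.

```python
import pandas as pd

# Hypothetical data: the top salary appears twice.
employee = pd.DataFrame({'id': [1, 2, 3], 'salary': [300, 300, 100]})
N = 2

ordered = employee['salary'].sort_values(ascending=False)

# Picking by position over all rows counts the duplicate 300 twice.
by_position = ordered.head(N).tail(1).iloc[0]
# Dropping duplicates first yields the N-th distinct salary.
by_distinct = ordered.drop_duplicates().iloc[N - 1]

print(by_position, by_distinct)
```

Here the positional pick returns 300 again, while the distinct pick returns 100.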
2.11 Second Highest Salary
import pandas as pd

def second_highest_salary(employee: pd.DataFrame) -> pd.DataFrame:
    # Distinct salaries, highest first, so a repeated top salary is counted once.
    salaries = employee['salary'].drop_duplicates().sort_values(ascending=False)
    # Fewer than two distinct salaries: the answer is null.
    if len(salaries) < 2:
        return pd.DataFrame({'SecondHighestSalary': [None]})
    return pd.DataFrame({'SecondHighestSalary': [salaries.iloc[1]]})
2.12 Department Highest Salary
import pandas as pd

def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    # Highest salary within each employee's own department, aligned row by row.
    dept_max = employee.groupby('departmentId')['salary'].transform('max')
    # Keep only employees who earn their own department's maximum.
    top = employee[employee['salary'] == dept_max]
    data = pd.merge(top, department, left_on='departmentId', right_on='id')
    # Both tables have a 'name' column, so merge suffixes them with _x and _y.
    data = data.rename(columns={'name_y': 'Department', 'name_x': 'Employee'})
    return data[['Department', 'Employee', 'salary']]
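One pitfall is worth illustrating: testing a salary against the pooled list of all department maxima is not the same as comparing it with the maximum of its own department. The sketch below uses hypothetical data in which 100 is the top salary in department 2 but not in department 1; `pooled` and `per_dept` are illustrative names.

```python
import pandas as pd

# Hypothetical data: 100 is the maximum in department 2 but not in department 1.
employee = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Ann', 'Bob', 'Cid'],
    'salary': [200, 100, 100],
    'departmentId': [1, 1, 2],
})

max_list = employee.groupby('departmentId')['salary'].max().to_list()
# Membership in the pooled list ignores which department each maximum belongs to.
pooled = employee[employee['salary'].isin(max_list)]

# transform('max') returns each row's own department maximum, aligned to the index.
dept_max = employee.groupby('departmentId')['salary'].transform('max')
per_dept = employee[employee['salary'] == dept_max]

print(pooled['name'].tolist())
print(per_dept['name'].tolist())
```

The pooled filter keeps Bob, who earns 100 in a department whose maximum is 200; the per-department comparison keeps only Ann and Cid.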
2.13 Rank Scores
import pandas as pd

def order_scores(scores: pd.DataFrame) -> pd.DataFrame:
    # Dense ranking, highest score first: ties share a rank and no rank is skipped.
    scores['rank'] = scores['score'].rank(method='dense', ascending=False)
    return scores.sort_values('rank')[['score', 'rank']]
2.14 Delete Duplicate Emails
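import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    # Sort by id so that keep='first' retains the smallest id for each email.
    person.sort_values('id', inplace=True)
    # Modify the table in place; the function returns nothing.
    person.drop_duplicates(subset=['email'], keep='first', inplace=True)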
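As a quick illustration of the in-place behavior, the same two calls can be run on a hypothetical table in which one email appears under two ids:

```python
import pandas as pd

# Hypothetical data: the same email is stored under ids 3 and 1.
person = pd.DataFrame({
    'id': [3, 1, 2],
    'email': ['john@example.com', 'john@example.com', 'bob@example.com'],
})

# Sorting first means the row with the smallest id comes first for each email.
person.sort_values('id', inplace=True)
# keep='first' then retains that row and drops the later duplicates.
person.drop_duplicates(subset=['email'], keep='first', inplace=True)

print(person)
```

After these calls `person` itself holds only ids 1 and 2; nothing is returned.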
2.15 Rearrange Products Table
import pandas as pd

def rearrange_products_table(products: pd.DataFrame) -> pd.DataFrame:
    # Wide to long: one row per (product_id, store) pair.
    data = pd.melt(products, id_vars='product_id', var_name='store', value_name='price')
    # axis=0 means rows: drop the pairs where the price is missing.
    data = data.dropna(subset=['price'], how='any', axis=0, inplace=False)
    return data
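The reshaping step can be seen on a small wide table. This is a sketch with hypothetical prices, where a missing value stands for a product that a store does not sell.

```python
import pandas as pd

# Hypothetical wide table: product 1 is not sold in store2.
products = pd.DataFrame({
    'product_id': [0, 1],
    'store1': [95, 70],
    'store2': [100, None],
    'store3': [105, 80],
})

# melt stacks the store columns into (store, price) pairs per product.
data = pd.melt(products, id_vars='product_id', var_name='store', value_name='price')
# Rows with no price are removed, leaving only the stores that sell the product.
data = data.dropna(subset=['price'])

print(data)
```

Six (product, store) pairs come out of `melt`, and five remain after the missing price is dropped.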