[Data Analysis Interview] 3. Writing a Data Selection Function (Python)


Problem

You are given a table of student data named students_df:

name	age	favorite_color	grade
Tim Voss	19	red	91
Nicole Johnson	20	yellow	95
Elsa Williams	21	green	82
John James	20	blue	75
Catherine Jones	23	green	93

Write a function named grades_colors that selects only the rows where the student's favorite color is green or red and their grade is above 90.

Example:

Input:

import pandas as pd

students = {
    "name": ["Tim Voss", "Nicole Johnson", "Elsa Williams", "John James", "Catherine Jones"],
    "age": [19, 20, 21, 20, 23],
    "favorite_color": ["red", "yellow", "green", "blue", "green"],
    "grade": [91, 95, 82, 75, 93],
}
students_df = pd.DataFrame(students)

Output:

def grades_colors(students_df) ->

name	age	favorite_color	grade
Tim Voss	19	red	91
Catherine Jones	23	green	93

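Solution

Solution code

Use the isin() method to build a boolean mask for rows whose favorite color is green or red, build a second mask for grades above 90, combine the two with &, and return the rows that satisfy both conditions.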

def grades_colors(students_df):
    # Select rows whose favorite color is green or red and whose grade is above 90
    result_df = students_df[
        (students_df['favorite_color'].isin(['green', 'red']))
        & (students_df['grade'] > 90)
    ]
    return result_df
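To check the solution against the example, here is a minimal self-contained sketch that rebuilds the sample table and runs the function on it. The intermediate names color_mask and grade_mask are introduced only for illustration and are not part of the original answer; the logic is meant to be the same.

```python
import pandas as pd

# Sample data copied from the problem statement
students = {
    "name": ["Tim Voss", "Nicole Johnson", "Elsa Williams", "John James", "Catherine Jones"],
    "age": [19, 20, 21, 20, 23],
    "favorite_color": ["red", "yellow", "green", "blue", "green"],
    "grade": [91, 95, 82, 75, 93],
}
students_df = pd.DataFrame(students)


def grades_colors(students_df):
    # color_mask / grade_mask are illustrative names, not from the original answer
    color_mask = students_df["favorite_color"].isin(["green", "red"])
    grade_mask = students_df["grade"] > 90
    # & combines the two boolean Series element-wise
    return students_df[color_mask & grade_mask]


result = grades_colors(students_df)
print(result)
```

When both conditions are written inline, the comparison needs its own parentheses, because & binds more tightly than > in Python. The returned rows keep their original index labels (0 and 4 here); calling reset_index(drop=True) on the result would renumber them if that is needed.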

Python code: selecting specific values

# Select on multiple conditions
new_df = df[
    (df['country'].isin(['Australia', 'New Zealand']))
    & (df['points'] >= 95)
]

# Using loc
new_df = df.loc[
    (df['country'].isin(['Australia', 'New Zealand']))
    & (df['points'] >= 95)
]

# Several values in the same column: years 1987, 1988 and 1999
df[(df['Year'] == 1987) | (df['Year'] == 1988) | (df['Year'] == 1999)]

# Between two consecutive values: 1987 to 1988
df[df['Year'].isin([1987, 1988])]

# Select specific columns under a condition: columns I, M, Y and O for 1987 rows where Size is High
select_columns = ['I', 'M', 'Y', 'O']
df[(df['Year'] == 1987) & (df['Size'] == 'High')][select_columns]

# Using loc to select the specified columns
df.loc[(df['Year'] == 1987) & (df['Size'] == 'High'), select_columns]
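The snippets above assume a DataFrame named df that the post never defines. As a rough sketch, the same patterns can be tried on a small made-up table; every value below is hypothetical and exists only so the code runs. It also shows Series.between() as an alternative to isin() when the goal is a range of years.

```python
import pandas as pd

# Hypothetical data invented for this sketch; not from the original post
df = pd.DataFrame({
    "Year": [1987, 1988, 1990, 1999, 1987],
    "Size": ["High", "Small", "High", "Medium", "Small"],
    "I": [1, 2, 3, 4, 5],
    "M": [10, 20, 30, 40, 50],
})

# Several values in one column: isin() replaces a chain of == ... | ...
chained = df[(df["Year"] == 1987) | (df["Year"] == 1988) | (df["Year"] == 1999)]
with_isin = df[df["Year"].isin([1987, 1988, 1999])]

# A range of values: between() includes both endpoints by default
in_range = df[df["Year"].between(1987, 1990)]

# Row condition and column list in a single loc call
subset = df.loc[(df["Year"] == 1987) & (df["Size"] == "High"), ["I", "M"]]

print(chained.equals(with_isin))
print(in_range)
print(subset)
```

Passing the row mask and the column list to loc together does the selection in one indexing step, whereas df[mask][columns] filters the rows first and then indexes the result a second time.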