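Table of Contents
- Selecting rows by index range
- Selecting rows by column-value conditions
- (1) Basic range condition
- (2) Combining multiple conditions
- (3) Callable conditions
- Filtering methods available on DataFrame
- (1) isin:
- (2) query:
- (3) contains:
- Incorrect condition syntax:
- Example 1:
- Example 2: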
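This post focuses on selecting rows according to a value range of a column (a Series).
Selecting rows by an index range is more basic, but it requires knowing the index range in advance.
Setup code:
Environment: Python 3.7.1, IDLE Shell
>>> import pandas as pd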
>>> df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22], 'Gender': ['Male', 'Male', 'Female']})
Note: the example data format comes from https://www.python100.com/html/116332.html
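Selecting rows by index range
Example 1: extract the row whose index label is 1; this returns the row as a Series
>>> row = df.loc[1]  # select by index label; with the default auto-generated index, labels equal positions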
>>> row
Name Jim
Age 18
Gender Male
Name: 1, dtype: object
>>> type(row)
<class 'pandas.core.series.Series'>
Note:
df.iloc[:]  # select by integer position
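The difference between a scalar label and a list of labels decides whether you get a Series or a DataFrame back. The sketch below, using the same sample data, compares df.loc[1], df.loc[[1]] and df.iloc[1]; it assumes the default RangeIndex, where label 1 and position 1 coincide.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

row_series = df.loc[1]   # scalar label -> the row as a Series
row_frame = df.loc[[1]]  # list of labels -> a one-row DataFrame
by_pos = df.iloc[1]      # integer position -> the row as a Series

print(type(row_series).__name__)  # Series
print(type(row_frame).__name__)   # DataFrame
print(row_series.equals(by_pos))  # True: with the default index, label 1 is position 1
```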
Example 2:
>>> row = df.loc[0:1]
>>> row
   Name  Age  Gender
0   Tom   20    Male
1   Jim   18    Male
>>> row = df.iloc[0:1]
>>> row
   Name  Age  Gender
0   Tom   20    Male
>>> row = df.loc[0:0]
>>> row
   Name  Age  Gender
0   Tom   20    Male
>>> type(row)
<class 'pandas.core.frame.DataFrame'>
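Note: the two methods slice ranges differently. loc slices by label and includes both ends, so df.loc[0:1] returns labels 0 and 1; iloc slices by position and excludes the end, so df.iloc[0:1] covers the half-open range [0, 1) and returns only position 0.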
>>> row = df.iloc[0:0]
>>> row
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
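With the default integer index, labels and positions look identical, which hides the distinction. The sketch below assumes string labels 'a', 'b', 'c' (chosen only for illustration) so that the inclusive label slice and the half-open position slice are easy to tell apart.

```python
import pandas as pd

# Same data, but with hypothetical string labels so labels and positions differ
df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']},
                  index=['a', 'b', 'c'])

by_label = df.loc['a':'b']   # label slice: the end label 'b' is included
by_position = df.iloc[0:1]   # position slice: half-open [0, 1), end excluded

print(list(by_label.index))     # ['a', 'b']
print(list(by_position.index))  # ['a']
```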
Selecting rows by column-value conditions
>>> row = df.loc[df['Age'] > 20].iloc[0]['Name']
>>> row
'Lily'
>>>
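The statement above means: from the DataFrame df.loc[df['Age'] > 20], take the first row by position (iloc[0]; here it is the row labeled 2) as a Series and read its 'Name' value.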
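The one-liner above raises IndexError when the filter matches nothing, because .iloc[0] has no row to return. A minimal sketch of a guarded version follows; the helper name first_name_where_age_above is hypothetical, and it selects the mask and the 'Name' column in a single loc call.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

def first_name_where_age_above(frame, min_age):
    """Hypothetical helper: Name of the first row (by position) with Age > min_age,
    or None if nothing matches. A bare .iloc[0] raises IndexError on an empty result."""
    names = frame.loc[frame['Age'] > min_age, 'Name']  # row mask and column in one loc call
    return names.iloc[0] if not names.empty else None

print(first_name_where_age_above(df, 20))  # Lily
print(first_name_where_age_above(df, 23))  # None
```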
(1) Basic range condition
Example 1:
>>> row = df.loc[df['Age'] > 18]
>>> row
   Name  Age  Gender
0   Tom   20    Male
2  Lily   22  Female
>>>
Note: a condition that falls outside the data range does not raise an error; it returns an empty DataFrame:
>>> row = df.loc[df['Age'] > 23]
>>> row
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
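Since no error is raised, code that consumes the result usually needs to test for emptiness itself. A short sketch with the same data shows that the empty result still keeps the original columns and that .empty is the simple check.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

result = df.loc[df['Age'] > 23]  # no row has Age > 23

# The filter returns an empty DataFrame that keeps the original columns
print(result.empty)          # True
print(result.shape)          # (0, 3)
print(list(result.columns))  # ['Name', 'Age', 'Gender']
```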
(2) Combining multiple conditions
Example 2:
>>> row = df.loc[(df['Age'] >= 18) & (df['Name'] == 'Lily')]
>>> row
   Name  Age  Gender
2  Lily   22  Female
>>>
If no row satisfies the combined condition, the returned DataFrame is empty:
>>> row = df.loc[(df['Age'] >= 18) & (df['Name'] == 'tongzhi')]  # 'tongzhi' does not exist in the original DataFrame
>>> row
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
>>>
You can also combine conditions with the | (or) operator:
>>> row = df.loc[(df['Age'] >= 18) | (df['Name'] == 'Jim')]
>>> row
   Name  Age  Gender
0   Tom   20    Male
1   Jim   18    Male
2  Lily   22  Female
>>>
Note: a condition can also be negated with ~ (not).
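A short sketch of ~ on the same data: it flips each element of a boolean mask. The comparison must stay in parentheses, because ~ binds more tightly than <=; without them, ~df['Age'] would bitwise-invert the integer ages before comparing.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

mask = (df['Age'] >= 18) & (df['Name'] == 'Lily')
not_lily = df.loc[~mask, 'Name'].tolist()
print(not_lily)  # ['Tom', 'Jim']: every row except the match

# Keep the parentheses: ~(df['Age'] <= 18) negates the comparison,
# while ~df['Age'] <= 18 would compare the bitwise-inverted ages instead
older = df.loc[~(df['Age'] <= 18), 'Name'].tolist()
print(older)  # ['Tom', 'Lily']
```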
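(3) Callable conditions
You can pass a lambda or a custom function to loc; it receives the DataFrame and returns a boolean Series that selects the matching rows, e.g.: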
>>> x='Jim'
>>> row = df.loc[lambda x:x['Name'] == 'Jim']
>>> row
   Name  Age  Gender
1   Jim   18    Male
>>>
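The same mechanism works with a named function. In the sketch below, is_adult_male is a hypothetical predicate; the second part shows where callables are most useful, inside a method chain where the intermediate DataFrame (here with an assumed extra column AgeNextYear) has no variable name.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

def is_adult_male(frame):
    """Hypothetical predicate: loc passes it the whole DataFrame;
    it must return a boolean Series aligned with the rows."""
    return (frame['Age'] >= 18) & (frame['Gender'] == 'Male')

adult_males = df.loc[is_adult_male, 'Name'].tolist()
print(adult_males)  # ['Tom', 'Jim']

# In a chain, the lambda receives the frame produced by assign()
older_next_year = (df.assign(AgeNextYear=df['Age'] + 1)
                     .loc[lambda d: d['AgeNextYear'] > 21, 'Name']
                     .tolist())
print(older_next_year)  # ['Lily']
```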
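Filtering methods available on DataFrame
(1) isin:
df[df["column_name"].isin(li)]  # li = [20, 25, 27] or li = np.arange(20, 30)
Keeps the records whose value in the column equals any of the numbers or strings in the list li passed to isin; it works much like SQL's IN.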
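Applied to the sample data, both forms of li from the template above work, since isin accepts a list or a NumPy array. The last line is an added illustration: negating the mask with ~ gives the equivalent of SQL's NOT IN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

li = [20, 25, 27]
in_list = df[df['Age'].isin(li)]['Name'].tolist()
print(in_list)   # ['Tom']

li = np.arange(20, 30)  # 20, 21, ..., 29
in_range = df[df['Age'].isin(li)]['Name'].tolist()
print(in_range)  # ['Tom', 'Lily']

# ~ on an isin mask behaves like SQL NOT IN
not_in = df[~df['Name'].isin(['Tom', 'Jim'])]['Name'].tolist()
print(not_in)    # ['Lily']
```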
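(2) query:
df.query("(column_name1 == 'str1') & (column_name2 == 'str2')")
Keeps the records that satisfy all the conditions at once; each condition compares a column (column_name1, column_name2, ...) with a value (str1, str2, ...).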
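A sketch of query on the sample data. Column names are written bare inside the expression string, and a local Python variable can be referenced with the @ prefix (min_age here is just an illustrative name). Unlike plain Python on a Series, a query expression also accepts a chained range comparison.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

males = df.query("(Gender == 'Male') & (Age >= 18)")
print(males['Name'].tolist())     # ['Tom', 'Jim']

min_age = 19  # local variable, referenced as @min_age inside the string
older = df.query("Age >= @min_age")
print(older['Name'].tolist())     # ['Tom', 'Lily']

# query understands chained comparisons such as 18 < Age <= 22
in_range = df.query("18 < Age <= 22")
print(in_range['Name'].tolist())  # ['Tom', 'Lily']
```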
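(3) contains:
df[df["column_name"].str.contains("str")]
Keeps all records whose string value in the column contains str; it works much like SQL's LIKE '%str%'. Note that the pattern is treated as a regular expression by default (regex=True).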
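A sketch of str.contains on the sample data, plus two commonly needed parameters: case=False for case-insensitive matching, and na=False so that missing values (the extra frame df2 is an assumed example) count as non-matches instead of leaving NaN in the mask.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Plain substring match (the pattern is a regular expression by default)
has_i = df[df['Name'].str.contains('i')]['Name'].tolist()
print(has_i)     # ['Jim', 'Lily']

# case=False ignores letter case; '^t' is a regex anchored at the start
starts_t = df[df['Name'].str.contains('^t', case=False)]['Name'].tolist()
print(starts_t)  # ['Tom']

# A missing value would give NaN in the mask; na=False treats it as no match
df2 = pd.DataFrame({'Name': ['Tom', None, 'Lily']})
with_l = df2[df2['Name'].str.contains('L', na=False)]['Name'].tolist()
print(with_l)    # ['Lily']
```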
The isin, query and contains descriptions above are adapted from: https://blog.csdn.net/weixin_45914452/article/details/120585861
Incorrect condition syntax:
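Example 1: a chained comparison. Python expands 18 <= s <= 22 into (18 <= s) and (s <= 22), and "and" has to reduce the first Series to a single True/False, which raises: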
>>> row = df.loc[(18<=df['Age'] <= 22)]
Traceback (most recent call last):
  File "<pyshell#56>", line 1, in <module>
    row = df.loc[(18<=df['Age'] <= 22)]
  File "D:\Program Files\Python371\lib\site-packages\pandas\core\generic.py", line 1538, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>>
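Two working alternatives for the range condition in Example 1, sketched on the same data: split it into two parenthesized comparisons joined with the element-wise &, or use Series.between, whose bounds are inclusive on both ends by default.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Fix 1: two comparisons, each in parentheses, joined with &
ok1 = df.loc[(df['Age'] >= 18) & (df['Age'] <= 22)]

# Fix 2: Series.between builds the same inclusive mask
ok2 = df.loc[df['Age'].between(18, 22)]

print(ok1.equals(ok2))  # True
print(len(ok1))         # 3: every age lies in [18, 22]

narrow = df.loc[df['Age'].between(19, 22), 'Name'].tolist()
print(narrow)           # ['Tom', 'Lily']
```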
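Example 2: missing parentheses. The & operator binds more tightly than >= and <=, so Python reads the condition as df['Age'] >= (18 & df['Age']) <= 22, which is again a chained comparison and fails the same way: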
>>> row = df.loc[df['Age'] >= 18 & df['Age'] <= 22]
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    row = df.loc[df['Age'] >= 18 & df['Age'] <= 22]
  File "D:\Program Files\Python371\lib\site-packages\pandas\core\generic.py", line 1538, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
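To make the precedence problem in Example 2 visible, the sketch below evaluates the part Python computes first, 18 & df['Age'] (an element-wise bitwise AND, not a logical one), then reproduces the ValueError and shows the parenthesized fix.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# & binds tighter than >= and <=, so 18 & df['Age'] is evaluated first:
# bitwise AND per element, 18 & 20 = 16, 18 & 18 = 18, 18 & 22 = 18
inner = 18 & df['Age']
print(inner.tolist())  # [16, 18, 18]

# What remains is the chained comparison df['Age'] >= inner <= 22,
# which calls bool() on a Series and raises ValueError
err = None
try:
    df.loc[df['Age'] >= 18 & df['Age'] <= 22]
except ValueError as exc:
    err = str(exc)
print(err)

# Parenthesizing each comparison gives the intended element-wise AND
rows = df.loc[(df['Age'] >= 18) & (df['Age'] <= 22)]
print(len(rows))  # 3
```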