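Pandas Iteration
Iteration is an essential operation in many programming languages; Python, for example, uses a for loop to traverse a list. So how does Pandas iterate over Series and DataFrame structures? Because the two are different data structures, the ways of iterating over them differ. A Series can be iterated like a one-dimensional array, whereas iterating over a DataFrame, a two-dimensional table, resembles iterating over a Python dictionary.
Pandas also uses the for loop for iteration. Iterating over a Series with for yields its values directly, while iterating over a DataFrame yields its column labels. For example: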
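import pandas as pd
import numpy as np
N = 20
# Build a 20-row DataFrame with date, numeric, and categorical-like columns
df = pd.DataFrame({'A': pd.date_range(start='2016-01-01', periods=N, freq='D'), 'x': np.linspace(0, stop=N-1, num=N), 'y': np.random.rand(N), 'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(), 'D': np.random.normal(100, 10, size=N).tolist()})
print(df)
print("Iterating over all column labels:")
# Iterating over a DataFrame directly yields its column labels
for col in df:
    print(col)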
Output:
A x y C D
0 2016-01-01 0.0 0.306298 High 99.538774
1 2016-01-02 1.0 0.350768 High 111.734390
2 2016-01-03 2.0 0.912953 Low 90.404414
3 2016-01-04 3.0 0.553158 High 92.537805
4 2016-01-05 4.0 0.045641 Medium 92.612460
5 2016-01-06 5.0 0.289502 High 94.824675
6 2016-01-07 6.0 0.247479 High 99.026844
7 2016-01-08 7.0 0.648031 High 126.798087
8 2016-01-09 8.0 0.675396 Low 106.342780
9 2016-01-10 9.0 0.599190 Medium 92.860987
10 2016-01-11 10.0 0.394856 Low 98.485523
11 2016-01-12 11.0 0.300833 Medium 87.875087
12 2016-01-13 12.0 0.018943 Low 94.117690
13 2016-01-14 13.0 0.451572 Low 119.475830
14 2016-01-15 14.0 0.972835 High 91.034207
15 2016-01-16 15.0 0.645414 High 89.636694
16 2016-01-17 16.0 0.467082 High 99.775743
17 2016-01-18 17.0 0.108793 Low 78.775024
18 2016-01-19 18.0 0.592192 High 107.170954
19 2016-01-20 19.0 0.568169 High 99.094213
Iterating over all column labels:
A
x
y
C
D
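The example above only covers the DataFrame case. As a minimal sketch of the Series case and of the dictionary analogy, the snippet below uses small hypothetical data (the names s, df, and the values are assumptions chosen for illustration, not part of the original example). It also uses items(), which yields (label, value) pairs for a Series and (column label, column) pairs for a DataFrame.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
df = pd.DataFrame({'x': [1, 2], 'y': [3.0, 4.0]})

# Iterating over a Series yields its values, not its index labels.
series_values = [v for v in s]

# Series.items() yields (index label, value) pairs.
series_pairs = list(s.items())

# Iterating over a DataFrame yields its column labels, like dict keys.
column_labels = [col for col in df]

# DataFrame.items() yields (column label, column as a Series) pairs.
column_sums = {label: column.sum() for label, column in df.items()}

print(series_values)
print(series_pairs)
print(column_labels)
print(column_sums)
```

If both the label and the data are needed, items() avoids a second lookup such as df[col] inside the loop.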
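Built-in iteration methods
To iterate over the rows of a DataFrame, use the following methods:
- iterrows(): iterates over the rows as (row_index, row) pairs;
- itertuples(): iterates over the rows as named tuples.
Each method is briefly described below: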
1) iterrows()
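iterrows() iterates over the DataFrame row by row and yields (index, row) pairs: the key is the row's index label, and the value is the row itself as a Series whose index is the column labels. Because each row is packed into a single Series, the original column dtypes are not preserved (note dtype: object in the output below).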
import pandas as pd
import numpy as np
N = 20
df = pd.DataFrame({'A': pd.date_range(start='2016-01-01', periods=N, freq='D'), 'x': np.linspace(0, stop=N-1, num=N), 'y': np.random.rand(N), 'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(), 'D': np.random.normal(100, 10, size=N).tolist()})
# key is the row's index label; value is the row as a Series
for key, value in df.iterrows():
    print(key, value)
Output:
0 A 2016-01-01 00:00:00
x 0.0
y 0.376904
C High
D 85.622403
Name: 0, dtype: object
1 A 2016-01-02 00:00:00
x 1.0
y 0.740229
C Medium
D 78.572574
Name: 1, dtype: object
2 A 2016-01-03 00:00:00
x 2.0
y 0.672089
C Medium
D 101.784087
Name: 2, dtype: object
......
Name: 19, dtype: object
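One consequence of receiving each row as a Series is that values from columns of different dtypes are converted to a common dtype. The following is a small sketch of that effect, assuming a hypothetical two-column frame (the column names count and ratio are made up for illustration). It also shows one cautious way to derive a new column: collect results in a list and assign them after the loop, rather than writing to the row object.

```python
import pandas as pd

# Hypothetical data: one integer column and one float column.
df = pd.DataFrame({'count': [1, 2], 'ratio': [0.5, 1.5]})

# The column itself holds integers ...
column_is_integer = pd.api.types.is_integer_dtype(df['count'])

# ... but the row Series must hold both columns in one dtype,
# so the integer value is upcast to float.
index_label, row = next(df.iterrows())
row_is_float = pd.api.types.is_float_dtype(row)

# Collect per-row results and assign them once after the loop,
# instead of modifying the row yielded by iterrows().
totals = [r['count'] * r['ratio'] for _, r in df.iterrows()]
df['total'] = totals

print(column_is_integer, row_is_float)
print(df)
```

When the original dtypes matter, itertuples() is usually the safer choice, since it does not pack the row into a single Series.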
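2) itertuples()
itertuples() also returns an iterator. It yields each row of the DataFrame as a named tuple whose first field, Index, is the row's index label, followed by one field per column. For example: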
import pandas as pd
import numpy as np
N = 20
df = pd.DataFrame({'A': pd.date_range(start='2016-01-01', periods=N, freq='D'), 'x': np.linspace(0, stop=N-1, num=N), 'y': np.random.rand(N), 'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(), 'D': np.random.normal(100, 10, size=N).tolist()})
# Each row is a named tuple: Pandas(Index=..., A=..., x=..., ...)
for row in df.itertuples():
    print(row)
Output:
Pandas(Index=0, A=Timestamp('2016-01-01 00:00:00'), x=0.0, y=0.0073833717186708725, C='Medium', D=102.21685856034675)
Pandas(Index=1, A=Timestamp('2016-01-02 00:00:00'), x=1.0, y=0.7570282079756047, C='Medium', D=77.88547775291684)
Pandas(Index=2, A=Timestamp('2016-01-03 00:00:00'), x=2.0, y=0.039159500841185246, C='Medium', D=90.60034318698546)
Pandas(Index=3, A=Timestamp('2016-01-04 00:00:00'), x=3.0, y=0.5777131686110479, C='Medium', D=108.45249228376123)
Pandas(Index=4, A=Timestamp('2016-01-05 00:00:00'), x=4.0, y=0.4726895679114832, C='High', D=102.3053880413406)
Pandas(Index=5, A=Timestamp('2016-01-06 00:00:00'), x=5.0, y=0.9181876349067116, C='High', D=88.77667424669386)
Pandas(Index=6, A=Timestamp('2016-01-07 00:00:00'), x=6.0, y=0.352008513872231, C='Low', D=94.1640236552118)
Pandas(Index=7, A=Timestamp('2016-01-08 00:00:00'), x=7.0, y=0.5722692889700786, C='Medium', D=91.32266564519188)
Pandas(Index=8, A=Timestamp('2016-01-09 00:00:00'), x=8.0, y=0.18340633936165507, C='Medium', D=91.40118820334366)
Pandas(Index=9, A=Timestamp('2016-01-10 00:00:00'), x=9.0, y=0.5822548446901658, C='Medium', D=105.26907848666296)
Pandas(Index=10, A=Timestamp('2016-01-11 00:00:00'), x=10.0, y=0.40705596480000217, C='High', D=85.52555287827161)
Pandas(Index=11, A=Timestamp('2016-01-12 00:00:00'), x=11.0, y=0.9525667200400463, C='High', D=107.35261261096153)
Pandas(Index=12, A=Timestamp('2016-01-13 00:00:00'), x=12.0, y=0.44425664486730154, C='Medium', D=92.55767916353153)
Pandas(Index=13, A=Timestamp('2016-01-14 00:00:00'), x=13.0, y=0.5468369154349298, C='High', D=87.74208234902464)
Pandas(Index=14, A=Timestamp('2016-01-15 00:00:00'), x=14.0, y=0.4727283165059927, C='Low', D=107.5236125991258)
Pandas(Index=15, A=Timestamp('2016-01-16 00:00:00'), x=15.0, y=0.990707163043359, C='Low', D=95.76090795914205)
Pandas(Index=16, A=Timestamp('2016-01-17 00:00:00'), x=16.0, y=0.6243139269960055, C='Low', D=101.45573754665573)
Pandas(Index=17, A=Timestamp('2016-01-18 00:00:00'), x=17.0, y=0.6146066882888525, C='High', D=99.43866726961795)
Pandas(Index=18, A=Timestamp('2016-01-19 00:00:00'), x=18.0, y=0.6001033142743434, C='Low', D=117.15405644081103)
Pandas(Index=19, A=Timestamp('2016-01-20 00:00:00'), x=19.0, y=0.06108299134959061, C='Medium', D=102.41567398727766)
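The fields of each named tuple can be read by attribute, and itertuples() accepts the index and name parameters to control the shape of the result. The sketch below illustrates this on a small hypothetical frame (the column names x and label and the tuple name Record are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({'x': [1, 2], 'label': ['Low', 'High']})

# Default: a named tuple called "Pandas" whose first field is Index.
first = next(df.itertuples())
print(first.Index, first.x, first.label)

# index=False drops the Index field; name sets the named tuple's type name.
records = list(df.itertuples(index=False, name='Record'))
print(records[0])

# name=None returns plain tuples instead of named tuples.
plain = list(df.itertuples(index=False, name=None))
print(plain)
```

Attribute access such as first.x only works when the column labels are valid Python identifiers; for other labels, positional access on the tuple remains available.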