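1. Preparing the CSV file
Data file download: nba.csv
CSV (Comma-Separated Values) is a file format that stores tabular data as plain text.
Note: CSV is sometimes also read as "character-separated values", because the delimiter does not have to be a comma (tabs and semicolons are common alternatives).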
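To illustrate the note on delimiters, here is a minimal sketch. It uses a made-up semicolon-separated string (two rows borrowed from nba.csv), and io.StringIO stands in for a file on disk. The sep parameter of read_csv selects the delimiter. Without it, read_csv splits on commas and finds none.

```python
import io

import pandas as pd

# Made-up semicolon-separated sample; io.StringIO acts like an open file
raw = "Name;Team;Age\nAvery Bradley;Boston Celtics;25\nJae Crowder;Boston Celtics;25\n"

# sep tells read_csv which character separates the fields (default is ",")
df = pd.read_csv(io.StringIO(raw), sep=";")
print(df.columns.tolist())  # ['Name', 'Team', 'Age']
print(df.shape)             # (2, 3)

# With the default comma separator, each whole line ends up in one column
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.shape)          # (2, 1)
```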
2. Reading CSV files
2.1 Reading a CSV file
Use pandas.read_csv() to read a CSV file into a DataFrame.
Code
import pandas as pd
df = pd.read_csv('./Data/nba.csv')  # read the CSV file into a DataFrame
print(df)
Output
Name Team Number Position Age Height Weight \
0 Avery Bradley Boston Celtics 0.0 PG 25.0 6-2 180.0
1 Jae Crowder Boston Celtics 99.0 SF 25.0 6-6 235.0
2 John Holland Boston Celtics 30.0 SG 27.0 6-5 205.0
3 R.J. Hunter Boston Celtics 28.0 SG 22.0 6-5 185.0
4 Jonas Jerebko Boston Celtics 8.0 PF 29.0 6-10 231.0
.. ... ... ... ... ... ... ...
453 Shelvin Mack Utah Jazz 8.0 PG 26.0 6-3 203.0
454 Raul Neto Utah Jazz 25.0 PG 24.0 6-1 179.0
455 Tibor Pleiss Utah Jazz 21.0 C 26.0 7-3 256.0
456 Jeff Withey Utah Jazz 24.0 C 26.0 7-0 231.0
457 NaN NaN NaN NaN NaN NaN NaN

College Salary
0 Texas 7730337.0
1 Marquette 6796117.0
2 Boston University NaN
3 Georgia State 1148640.0
4 NaN 5000000.0
.. ... ...
453 Butler 2433333.0
454 NaN 900000.0
455 NaN 2900000.0
456 Kansas 947276.0
457 NaN NaN

[458 rows x 9 columns]
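Row 457 above is NaN in every column, and Number, Age and Weight print as floats such as 0.0. A likely explanation is that the file's last line contains only delimiters, and that a numeric column containing any NaN is stored as float64. The sketch below reproduces both effects on a made-up three-line stand-in for nba.csv (not the real file). The ",," line becomes an all-NaN row, and the empty Salary field turns that column into float64.

```python
import io

import pandas as pd

# Made-up stand-in for ./Data/nba.csv: one full row, one row with an
# empty College field, and one line made only of delimiters
raw = (
    "Name,College,Salary\n"
    "Avery Bradley,Texas,7730337\n"
    "Jonas Jerebko,,5000000\n"
    ",,\n"
)

df = pd.read_csv(io.StringIO(raw))
print(df.shape)            # (3, 3): the header line is not counted as a row
print(df.isna().sum())     # missing values per column: Name 1, College 2, Salary 1
print(df["Salary"].dtype)  # float64: integers plus NaN are upcast to float
```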
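2.2 Viewing the first few rows
df.head(n)
- Returns the first n rows of the DataFrame (5 by default)
- Missing (empty) fields are shown as NaN
Code
import pandas as pd
df = pd.read_csv('./Data/nba.csv')  # read the CSV file
print(df.head(10))  # print only the first 10 rows
Output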
Name Team Number Position Age Height Weight \
0 Avery Bradley Boston Celtics 0.0 PG 25.0 6-2 180.0
1 Jae Crowder Boston Celtics 99.0 SF 25.0 6-6 235.0
2 John Holland Boston Celtics 30.0 SG 27.0 6-5 205.0
3 R.J. Hunter Boston Celtics 28.0 SG 22.0 6-5 185.0
4 Jonas Jerebko Boston Celtics 8.0 PF 29.0 6-10 231.0
5 Amir Johnson Boston Celtics 90.0 PF 29.0 6-9 240.0
6 Jordan Mickey Boston Celtics 55.0 PF 21.0 6-8 235.0
7 Kelly Olynyk Boston Celtics 41.0 C 25.0 7-0 238.0
8 Terry Rozier Boston Celtics 12.0 PG 22.0 6-2 190.0
9 Marcus Smart Boston Celtics 36.0 PG 22.0 6-4 220.0

College Salary
0 Texas 7730337.0
1 Marquette 6796117.0
2 Boston University NaN
3 Georgia State 1148640.0
4 NaN 5000000.0
5 NaN 12000000.0
6 LSU 1170960.0
7 Gonzaga 2165160.0
8 Louisville 1824360.0
9 Oklahoma State 3431040.0
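head() slices the DataFrame that is already in memory; it does not re-read the file. A small sketch on a made-up 8-row frame (not nba.csv) shows the default, an explicit n, an n larger than the frame, and a negative n:

```python
import pandas as pd

# Made-up 8-row frame standing in for nba.csv; only the row count matters
df = pd.DataFrame({"Number": range(8)})

print(len(df.head()))    # 5: default n=5
print(len(df.head(3)))   # 3
print(len(df.head(20)))  # 8: n larger than the frame just returns every row
print(len(df.head(-2)))  # 6: negative n returns all rows except the last 2
```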
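2.3 Viewing the last few rows
df.tail(n)
- Returns the last n rows of the DataFrame (5 by default)
- Missing (empty) fields are shown as NaN
Code
import pandas as pd
df = pd.read_csv('./Data/nba.csv')  # read the CSV file
print(df.tail(10))  # print only the last 10 rows
Output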
Name Team Number Position Age Height Weight \
448 Gordon Hayward Utah Jazz 20.0 SF 26.0 6-8 226.0
449 Rodney Hood Utah Jazz 5.0 SG 23.0 6-8 206.0
450 Joe Ingles Utah Jazz 2.0 SF 28.0 6-8 226.0
451 Chris Johnson Utah Jazz 23.0 SF 26.0 6-6 206.0
452 Trey Lyles Utah Jazz 41.0 PF 20.0 6-10 234.0
453 Shelvin Mack Utah Jazz 8.0 PG 26.0 6-3 203.0
454 Raul Neto Utah Jazz 25.0 PG 24.0 6-1 179.0
455 Tibor Pleiss Utah Jazz 21.0 C 26.0 7-3 256.0
456 Jeff Withey Utah Jazz 24.0 C 26.0 7-0 231.0
457 NaN NaN NaN NaN NaN NaN NaN

College Salary
448 Butler 15409570.0
449 Duke 1348440.0
450 NaN 2050000.0
451 Dayton 981348.0
452 Kentucky 2239800.0
453 Butler 2433333.0
454 NaN 900000.0
455 NaN 2900000.0
456 Kansas 947276.0
457 NaN NaN
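The output above is labelled 448 to 457 rather than 0 to 9 because tail() keeps the original index labels. A minimal sketch on a made-up 8-row frame:

```python
import pandas as pd

# Made-up 8-row frame with the default index 0..7
df = pd.DataFrame({"Number": range(8)})

print(df.tail(3).index.tolist())   # [5, 6, 7]: original index labels are kept
print(len(df.tail()))              # 5: default n=5
print(df.tail(-2).index.tolist())  # [2, 3, 4, 5, 6, 7]: negative n drops the first 2 rows
```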
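2.4 Viewing summary information
df.info()
- Prints a concise summary of the DataFrame: the index range, each column's name, non-null count and dtype, and the memory usage
- The summary is printed directly; the method itself returns None
Code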
import pandas as pd
df = pd.read_csv('./Data/nba.csv')  # read the CSV file
df.info()  # info() prints on its own, so no print() is needed
Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458 entries, 0 to 457
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Name      457 non-null    object
 1   Team      457 non-null    object
 2   Number    457 non-null    float64
 3   Position  457 non-null    object
 4   Age       457 non-null    float64
 5   Height    457 non-null    object
 6   Weight    457 non-null    float64
 7   College   373 non-null    object
 8   Salary    446 non-null    float64
dtypes: float64(4), object(5)
memory usage: 32.3+ KB
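Because info() writes to stdout and returns None, wrapping it in print() adds a trailing "None" line. A minimal sketch on a made-up two-row frame shows this, how the buf parameter captures the summary as a string instead, and that df.count() gives the same non-null counts as a Series:

```python
import io

import pandas as pd

# Made-up frame with one missing value per column
df = pd.DataFrame({"Name": ["Avery Bradley", None], "Salary": [7730337.0, None]})

result = df.info()  # the summary goes straight to stdout
print(result)       # None: info() has no return value

# Passing a text buffer captures the same summary as a string
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
print("RangeIndex: 2 entries" in summary)  # True

# count() returns the per-column non-null counts that info() reports
print(df.count().tolist())  # [1, 1]
```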
3. Writing CSV files
Omitted for now; to be added when needed.
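For reference, a minimal sketch of the write direction, assuming DataFrame.to_csv as the counterpart of read_csv. The frame is made up. Calling to_csv without a path returns the CSV text as a string; passing a path such as a hypothetical './Data/nba_out.csv' would write a file instead. index=False leaves out the 0, 1, ... row labels, which would otherwise be written as an extra unnamed first column.

```python
import io

import pandas as pd

# Made-up two-row frame
df = pd.DataFrame({"Name": ["Avery Bradley", "Jae Crowder"], "Age": [25, 25]})

# No path given, so the CSV text is returned as a string
text = df.to_csv(index=False)
print(text.splitlines()[0])  # Name,Age

# Reading the text back yields an equal DataFrame
back = pd.read_csv(io.StringIO(text))
print(back.equals(df))  # True
```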