Getting Started with Pandas 3 (dtype + fillna + replace + rename + concat + join)

Contents

- 5. dtype: data types
- 6. Missing data
  - 6.1 Finding missing values: pd.isnull(), pd.notnull()
  - 6.2 Filling missing values: fillna(), replace()
- 7. Renaming and Combining
  - 7.1 Renaming
  - 7.2 Combining

Notes from the Kaggle Learn course: https://www.kaggle.com/learn/pandas

Previous post: Getting Started with Pandas 2 (DataFunctions + Maps + groupby + sort_values)

5. dtype: data types

print(wine_rev.price.dtype) prints float64, the type of a single column.
wine_rev.dtypes returns the type of every column in the table. Note the plural "s"!

country                   object
description               object
designation               object
points                     int64
price                    float64
province                  object
region_1                  object
region_2                  object
taster_name               object
taster_twitter_handle     object
title                     object
variety                   object
winery                    object
critic                    object
test_id                    int32
dtype: object

Columns of strings have the dtype object.
astype() converts a column to another type:
wine_rev.points.astype('float64')

0         87.0
1         87.0
2         87.0
3         87.0
4         87.0
          ... 
129966    90.0
129967    90.0
129968    90.0
129969    90.0
129970    90.0
Name: points, Length: 129971, dtype: float64

The index has a dtype as well: wine_rev.index.dtype returns dtype('int64').
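The calls in this section can be tried on a small table. The sketch below uses a hypothetical three-row frame, wine_demo, with made-up values, since the real wine_rev data is not included here.

```python
import pandas as pd

# Hypothetical stand-in for wine_rev; the values are made up for illustration.
wine_demo = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 87, 90],
    "price": [15.0, 14.5, 65.0],
})

price_dtype = wine_demo.price.dtype                 # dtype of a single column
all_dtypes = wine_demo.dtypes                       # Series with one dtype per column
points_float = wine_demo.points.astype("float64")   # returns a new Series; wine_demo is unchanged
index_dtype = wine_demo.index.dtype                 # dtype of the default integer index

print(price_dtype)
print(all_dtypes)
print(points_float.dtype)
print(index_dtype)
```

Because astype() returns a new object, the result has to be assigned back (for example wine_demo["points"] = points_float) if the conversion should persist.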

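6. Missing data

6.1 Finding missing values: pd.isnull(), pd.notnull()

Entries with missing values are given the value NaN, short for "Not a Number". NaN values always have the float64 dtype.
To select NaN entries, use pd.isnull(); its counterpart pd.notnull() selects the entries that are not missing.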

wine_rev[pd.isnull(wine_rev.country)]
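As a minimal sketch of both selectors, assuming a hypothetical frame reviews_demo with one missing country (the data is invented for illustration):

```python
import pandas as pd

# Hypothetical mini table; None marks a missing country.
reviews_demo = pd.DataFrame({
    "country": ["Italy", None, "US"],
    "points": [87, 88, 90],
})

# Boolean masks select the rows where country is missing / present.
missing_country = reviews_demo[pd.isnull(reviews_demo.country)]
known_country = reviews_demo[pd.notnull(reviews_demo.country)]

# A column of whole numbers with a missing value is stored as float64,
# because the gap is represented by NaN.
with_gap = pd.Series([1, None, 3])

print(missing_country)
print(known_country)
print(with_gap.dtype)
```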

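6.2 Filling missing values: fillna(), replace()

wine_rev.region_2.fillna('Unknown') replaces each NaN with 'Unknown' and returns a new Series; the original data is not changed.
Alternatively, each missing value can be filled with the first non-null value that appears after it. This is known as the backfill strategy.
wine_rev.taster_twitter_handle.replace("@kerinokeefe", "@kerino") replaces the former value with the latter.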
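The three filling strategies can be compared side by side. This is a sketch on made-up Series (region and handles are hypothetical names), and it uses Series.bfill() as one way to apply the backfill strategy:

```python
import pandas as pd

# Made-up values; None marks the gaps.
region = pd.Series([None, "Napa", None, "Sonoma"])

filled = region.fillna("Unknown")   # new Series; region itself still contains the gaps
backfilled = region.bfill()         # each gap takes the next non-null value after it

handles = pd.Series(["@kerinokeefe", "@vossroger", "@kerinokeefe"])
renamed = handles.replace("@kerinokeefe", "@kerino")  # swaps every exact match

print(filled.tolist())
print(backfilled.tolist())
print(renamed.tolist())
```

replace() is also handy for sentinel values such as "Unknown" or "Undisclosed" that stand in for missing data without being NaN.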

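7. Renaming and Combining

7.1 Renaming

rename() changes index labels and column names to something more suitable.
To rename a column: wine_rev.rename(columns={'points':'score'})
To rename index labels, pass a dictionary to index: wine_rev.rename(index={0:'michael',1:'ming'})
rename_axis() sets the name of the row index and of the column index themselves (here the names mean "wine" and "feature"):
wine_rev.rename_axis("酒",axis='rows').rename_axis('特征',axis='columns')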
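The difference between rename() and rename_axis() is easier to see on a tiny frame. The sketch below assumes a hypothetical two-row frame, demo, and uses the English axis names "wines" and "fields" purely for illustration:

```python
import pandas as pd

# Hypothetical two-row frame with made-up values.
demo = pd.DataFrame({"points": [87, 90], "price": [15.0, 65.0]})

scored = demo.rename(columns={"points": "score"})           # renames one column
relabelled = demo.rename(index={0: "michael", 1: "ming"})   # renames individual row labels

# rename_axis() names the axes themselves rather than the labels on them.
named = demo.rename_axis("wines", axis="rows").rename_axis("fields", axis="columns")

print(scored.columns.tolist())
print(relabelled.index.tolist())
print(named.index.name, named.columns.name)
```

Both methods return new frames, so demo keeps its original labels unless the result is assigned back.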

7.2 Combining

pandas offers three main functions for combining data: concat(), join() and merge().

concat() stacks DataFrames that share the same columns:

canadian_youtube = pd.read_csv("../input/youtube-new/CAvideos.csv")
british_youtube = pd.read_csv("../input/youtube-new/GBvideos.csv")
pd.concat([canadian_youtube, british_youtube])
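Since the CSV files may not be at hand, here is a sketch of the same concat() call on two made-up frames (ca_demo and gb_demo are hypothetical names). It also shows the ignore_index option, which the original call does not use:

```python
import pandas as pd

# Tiny made-up frames with identical columns, standing in for the two CSV files.
ca_demo = pd.DataFrame({"title": ["A", "B"], "views": [100, 200]})
gb_demo = pd.DataFrame({"title": ["B", "C"], "views": [150, 300]})

stacked = pd.concat([ca_demo, gb_demo])                         # row labels are kept: 0, 1, 0, 1
renumbered = pd.concat([ca_demo, gb_demo], ignore_index=True)   # fresh row labels: 0, 1, 2, 3

print(stacked)
print(renumbered)
```

Without ignore_index=True the result carries duplicate row labels, which can make label-based lookups return more than one row.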

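join() combines DataFrames that share an index. lsuffix and rsuffix distinguish columns that have the same name in both tables:

left = canadian_youtube.set_index(['title', 'trending_date'])
right = british_youtube.set_index(['title', 'trending_date'])
left.join(right, lsuffix='_CAN', rsuffix='_UK')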
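A small sketch of the same join on made-up data (ca_demo and gb_demo are hypothetical, and the dates are invented) shows how the suffixes and the default left join behave:

```python
import pandas as pd

# Made-up frames standing in for the Canadian and British video tables.
ca_demo = pd.DataFrame({
    "title": ["A", "B"],
    "trending_date": ["18.01.01", "18.01.01"],
    "views": [100, 200],
})
gb_demo = pd.DataFrame({
    "title": ["B", "C"],
    "trending_date": ["18.01.01", "18.01.01"],
    "views": [150, 300],
})

left = ca_demo.set_index(["title", "trending_date"])
right = gb_demo.set_index(["title", "trending_date"])

# join() is a left join by default: every row of left is kept, and rows
# with no match in right get NaN in the columns that come from right.
joined = left.join(right, lsuffix="_CAN", rsuffix="_UK")

print(joined)
```

Here video "B" trended in both tables and gets both view counts, while video "A" has NaN in views_UK. Passing how="inner" to join() would keep only the rows present in both.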