Python for Data Analysis – Pandas

Pandas

Pandas is an open-source library built on top of NumPy.
It allows for fast data analysis, data cleaning, and data preparation.
It excels in both performance and productivity.
It also has built-in visualization features.
It can work with data from a wide variety of sources.

How to install Pandas?

Using pip

(venv) -bash-4.2$ pip install pandas
Requirement already satisfied: pandas in ./venv/lib/python3.6/site-packages (0.25.1)
Requirement already satisfied: python-dateutil>=2.6.1 in ./venv/lib/python3.6/site-packages (from pandas) (2.8.0)
Requirement already satisfied: pytz>=2017.2 in ./venv/lib/python3.6/site-packages (from pandas) (2019.2)
Requirement already satisfied: numpy>=1.13.3 in ./venv/lib/python3.6/site-packages (from pandas) (1.17.2)
Requirement already satisfied: six>=1.5 in ./venv/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas) (1.12.0)
(venv) -bash-4.2$
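
A quick way to confirm the installation, assuming the code runs in the same virtual environment that pip installed into, is to import the library and print its version. The exact version strings depend on your environment.

```python
# Confirm that pandas and its NumPy dependency can be imported
import numpy as np
import pandas as pd

# __version__ is a plain string, e.g. "0.25.1" in the environment shown above
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
```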

Series

A Series is a one-dimensional ndarray with axis labels (including time series). It can hold data of any type, and its axis labels are collectively known as the index. A Series is built on the NumPy array object and behaves much like a NumPy array; the difference is that a Series can also be indexed by labels.
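
To make the label-versus-position difference concrete, here is a small sketch. The values 10, 20, 30 and the labels 'x', 'y', 'z' are arbitrary examples; `.loc` (label-based) and `.iloc` (position-based) are the standard pandas accessors.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])
ser = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

# A NumPy array is indexed by integer position only
print(arr[1])        # 20

# A Series can be indexed by label ...
print(ser['y'])      # 20
print(ser.loc['y'])  # 20, explicit label-based access
# ... and still by position through .iloc
print(ser.iloc[1])   # 20

# The labels themselves are stored in the index attribute
print(list(ser.index))  # ['x', 'y', 'z']
```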

Syntax:

class pandas.Series(
    data=None,
    index=None,
    dtype=None,
    name=None,
    copy=False,
    fastpath=False
)
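
The `dtype` and `name` parameters are not used in the examples that follow, so here is a minimal sketch of what they do. The labels and the name 'scores' are arbitrary illustrations, not part of the API.

```python
import pandas as pd

# data, index, dtype and name are all optional keyword arguments
s = pd.Series(data=[1, 2, 3], index=['a', 'b', 'c'], dtype='float64', name='scores')

print(s.dtype)  # float64: the integers were stored as floats because of dtype
print(s.name)   # scores: the name is shown in the printed Series and kept on the object
print(s)
```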

The snippets below show examples of creating a Series:

import numpy as np
import pandas as pd

labels = ['a', 'e', 'i', 'o']   # Python list used as index labels
data = [1, 2, 3, 4]             # Python list of values
arr = np.array(data)            # NumPy array built from the same values
d = {'a': 1, 'b': 2, 'c': 3}    # Python dict

# create a Series with the default integer index (0, 1, 2, ...)
print(pd.Series(data=data))
# create a Series with the labels as index
print(pd.Series(data=data, index=labels))
# create a Series from a NumPy array
print(pd.Series(arr, index=labels))
# create a Series from a dict; the keys become the index
print(pd.Series(d))
# a Series can also hold arbitrary Python objects, such as built-in functions
print(pd.Series(data=[sum, print, len]))

Output

0    1
1    2
2    3
3    4
dtype: int64
a    1
e    2
i    3
o    4
dtype: int64
a    1
e    2
i    3
o    4
dtype: int64
a    1
b    2
c    3
dtype: int64
0       <built-in function sum>
1       <built-in function print>
2       <built-in function len>
dtype: object
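
When a dict is passed together with an explicit `index`, pandas looks each label up in the dict instead of using all of the keys. A short sketch, reusing the dict `d` from above; the label 'z' is an arbitrary example of a key that is not in the dict:

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3}

# Only the listed labels are kept: 'b' is dropped, and 'z' (not a key) becomes NaN.
# Because NaN is a float, the resulting dtype is float64.
s = pd.Series(d, index=['a', 'c', 'z'])
print(s)
```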

Operations on Series

Create two Series objects:

import pandas as pd
ser1 = pd.Series([1,2,3,4],['Delhi','Bangalore','Mysore', 'Pune'])
print(ser1)
ser2 = pd.Series([1,2,5,4],['Delhi','Bangalore','Vizag','Pune'])
print(ser2)

Output

Delhi        1
Bangalore    2
Mysore       3
Pune         4
dtype: int64
Delhi        1
Bangalore    2
Vizag        5
Pune         4
dtype: int64

Retrieving a value from a Series works like a Python dictionary lookup: pass the index label. In the example above, the index labels are strings.

print(ser1['Delhi'])
# Output: 1
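
The dictionary analogy goes a bit further. A sketch, recreating `ser1` from above, of selecting several labels at once, testing membership, and falling back to a default for a missing label (the default value 0 is an arbitrary choice):

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['Delhi', 'Bangalore', 'Mysore', 'Pune'])

# A list of labels returns a smaller Series
print(ser1[['Delhi', 'Pune']])

# Membership is tested against the index labels, just like dict keys
print('Mysore' in ser1)  # True
print('Vizag' in ser1)   # False

# ser1['Vizag'] would raise KeyError; .get returns a default instead, like dict.get
print(ser1.get('Vizag', 0))  # 0
```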

Now let's try adding the two Series:

print(ser1+ser2)
'''
Output:
Bangalore    4.0
Delhi        2.0
Mysore       NaN
Pune         8.0
Vizag        NaN
dtype: float64
'''

Pandas aligns the two Series on their index labels and adds the values of matching labels. Where a label is present in only one of the Series (Mysore and Vizag here), the result is NaN, a missing value. Because NaN is a floating-point value, the integer results are upcast to float64; if every label had matched, the result would have stayed int64.
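
Both behaviours can be checked directly. The sketch below recreates `ser1` and `ser2`, compares the result dtypes, and shows `Series.add` with `fill_value=0`, which treats a label missing from one side as 0 instead of producing NaN. Choosing 0 as the fill value is an assumption that suits addition; other operations may need a different fill value.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['Delhi', 'Bangalore', 'Mysore', 'Pune'])
ser2 = pd.Series([1, 2, 5, 4], index=['Delhi', 'Bangalore', 'Vizag', 'Pune'])

# Identical labels: nothing is missing, so the result keeps an integer dtype
print((ser1 + ser1).dtype)  # int64

# Mismatched labels: NaN is introduced, so the result is upcast to float64
print((ser1 + ser2).dtype)  # float64

# With fill_value=0, Mysore becomes 3 + 0 and Vizag becomes 0 + 5 instead of NaN
print(ser1.add(ser2, fill_value=0))
```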