Pandas
Pandas is an open-source library built on top of NumPy.
It allows for fast analysis, data cleaning, and data preparation.
It excels in performance and productivity.
It also has built-in visualization features.
It can work with data from a wide variety of sources.
How to install Pandas?
Using pip
(venv) -bash-4.2$ pip install pandas
Requirement already satisfied: pandas in ./venv/lib/python3.6/site-packages (0.25.1)
Requirement already satisfied: python-dateutil>=2.6.1 in ./venv/lib/python3.6/site-packages (from pandas) (2.8.0)
Requirement already satisfied: pytz>=2017.2 in ./venv/lib/python3.6/site-packages (from pandas) (2019.2)
Requirement already satisfied: numpy>=1.13.3 in ./venv/lib/python3.6/site-packages (from pandas) (1.17.2)
Requirement already satisfied: six>=1.5 in ./venv/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas) (1.12.0)
(venv) -bash-4.2$
Series
A Series is a one-dimensional ndarray with axis labels (this includes time series). It can hold data of any type. The axis labels are collectively known as the index. A Series is very similar to a NumPy array and is built on top of the NumPy array object; the difference is that a Series can be indexed by labels.
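To make the relationship between the labels and the underlying array concrete, here is a minimal sketch. The sample values, the labels, and the name 'demo' are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'], name='demo')

labels = list(s.index)    # the axis labels, collectively the index
values = s.to_numpy()     # the underlying data as a NumPy array

print(labels)
print(values)
print(isinstance(values, np.ndarray))
print(s['y'])             # lookup by label, which a plain ndarray does not offer
```

The index and the values are stored separately, which is why the same data can be relabelled without copying the numbers by hand.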
Syntax:
class pandas.Series(
data=None,
index=None,
dtype=None,
name=None,
copy=False,
fastpath=False
)
The snippets below show examples of creating a Series:
import numpy as np
import pandas as pd
labels = ['a','e','i','o'] #python list
data = [1,2,3,4] #python list
arr = np.array(data) #NumPy array
d = {'a':1,'b':2,'c':3} #python dict
# creating a series object with default index
print(pd.Series(data = data))
# creating a series object with labels as index
print(pd.Series(data = data, index = labels))
# creating a series with NumPy array
print(pd.Series(arr,index = labels))
# creating a series with dictionary,
# here the key becomes the index
print(pd.Series(d))
# a Series can also hold arbitrary Python objects, such as built-in functions
print(pd.Series(data = [sum, print, len]))
Output
0 1
1 2
2 3
3 4
dtype: int64
a 1
e 2
i 3
o 4
dtype: int64
a 1
e 2
i 3
o 4
dtype: int64
a 1
b 2
c 3
dtype: int64
0 <built-in function sum>
1 <built-in function print>
2 <built-in function len>
dtype: object
Operations on Series
Create two Series objects:
import pandas as pd
ser1 = pd.Series([1,2,3,4],['Delhi','Bangalore','Mysore', 'Pune'])
print(ser1)
ser2 = pd.Series([1,2,5,4],['Delhi','Bangalore','Vizag','Pune'])
print(ser2)
Output
Delhi 1
Bangalore 2
Mysore 3
Pune 4
dtype: int64
Delhi 1
Bangalore 2
Vizag 5
Pune 4
dtype: int64
Retrieving information from a Series is similar to reading from a Python dictionary: pass the index label, using the same data type as the index. In the above example, the index labels are strings.
print(ser1['Delhi'])
# Output: 1
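Beyond the square-bracket lookup, a few other dictionary-like and explicit accessors may be useful. The following is a small sketch that reuses ser1 from above; the variable names on the left are hypothetical.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['Delhi', 'Bangalore', 'Mysore', 'Pune'])

by_label = ser1.loc['Mysore']       # explicit label-based access
by_position = ser1.iloc[0]          # explicit position-based access
has_vizag = 'Vizag' in ser1         # membership is tested against the index, as with dict keys
fallback = ser1.get('Vizag', -1)    # like dict.get, returns a default for a missing label
several = ser1[['Delhi', 'Pune']]   # a list of labels returns a new Series

print(by_label, by_position, has_vizag, fallback)
print(several)
```

Using .loc and .iloc makes it explicit whether a lookup is by label or by position, which matters most when the index itself contains integers.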
Now let's try adding the two Series:
print(ser1+ser2)
'''
Output:
Bangalore 4.0
Delhi 2.0
Mysore NaN
Pune 8.0
Vizag NaN
dtype: float64
'''
pandas aligns the two Series on their index labels and adds the values that share a label. If a label appears in only one Series, the result for that label is NaN (a missing value). Because NaN is a floating-point value, the integer values in the result are converted to float; when every label matches and no NaN is introduced, the integer dtype is preserved.
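If NaN for unmatched labels is not what you want, one option is Series.add with its fill_value parameter. This sketch reuses ser1 and ser2 from above and assumes a missing label should count as 0, which is only one possible choice.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['Delhi', 'Bangalore', 'Mysore', 'Pune'])
ser2 = pd.Series([1, 2, 5, 4], ['Delhi', 'Bangalore', 'Vizag', 'Pune'])

# Assumption: a label missing from one Series is treated as 0
filled = ser1.add(ser2, fill_value=0)
print(filled)

# With identical indexes no NaN is introduced, so the integer dtype is kept
same_index = ser1 + ser1
print(same_index.dtype)
```

Whether 0 is a sensible stand-in depends on the data; for counts it often is, while for measurements a NaN may be the more honest result.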
Translated from: https://www.includehelp.com/python/python-for-data-analysis-pandas.aspx