Common NumPy, Matplotlib, and Pandas Functions

NumPy

Array creation functions

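array(): create an array
```
import numpy as np
np.array([1, 2, 3])
```
arange(): create an array of evenly spaced values within a range
```
np.arange(1, 10)
```
zeros(), ones(): create an array filled with 0s or 1s
```
np.zeros((2, 3))
np.ones((3, 2))
```
empty(): create an uninitialized array
```
np.empty((3, 3))
```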

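linspace(), logspace(): create an array with linearly or logarithmically spaced values

np.linspace(1, 10, num=50)
np.logspace(1, 10, num=50, base=2)

fromfunction(), fromfile(): create an array from a function or a file

np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
np.fromfile('file.txt', sep=' ')  # sep=' ' parses whitespace-separated text; the default sep='' reads raw binary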

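Array manipulation functions

reshape(), resize(): change the shape of an array
```
np.arange(6).reshape(2, 3)
```

concatenate(), vstack(), hstack(): join arrays

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9]])
np.concatenate([a, b], axis=0)
# axis=0 appends new rows, equivalent to vstack;
# axis=1 appends new columns, equivalent to hstack
np.hstack((a, a))  # hstack needs matching row counts
np.vstack((a, b))  # vstack needs matching column counts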
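
The shape rule behind these three functions is easy to trip over, so here is a minimal sketch of it. The names `rows`, `cols`, and `mismatch_raised` are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
b = np.array([[7, 8, 9]])              # shape (1, 3)

# Stacking rows (axis=0): the column counts must match (3 == 3).
rows = np.concatenate([a, b], axis=0)
print(rows.shape)                                   # (3, 3)
print(np.array_equal(rows, np.vstack((a, b))))      # True

# Stacking columns (axis=1): the row counts must match, so pair a with itself.
cols = np.concatenate([a, a], axis=1)
print(cols.shape)                                   # (2, 6)
print(np.array_equal(cols, np.hstack((a, a))))      # True

# a has 2 rows and b has 1, so joining them column-wise raises ValueError.
try:
    np.hstack((a, b))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)                              # True
```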

split(), hsplit(), vsplit(): split an array
```
np.split(np.arange(9), 3)
```
flip(), fliplr(), flipud(): reverse an array
```
np.flip(np.arange(6).reshape(2, 3))
```
roll(): roll array elements
```
np.roll(np.arange(10), 2)
```

take(), put(): read and assign values by index

a = np.arange(10)
np.take(a, [2, 5])
np.put(a, [0, 3], [-10, -30])  # modifies a in place

diagonal(): extract the diagonal elements
```
np.arange(9).reshape(3, 3).diagonal()
```

select(): choose elements based on conditions

conditions = [a < 3, a > 5]
choices = [a, a**2]
np.select(conditions, choices)

ravel(), flatten(): flatten an array
```
np.arange(6).reshape(2, 3).ravel()
```
transpose(), swapaxes(): transpose an array or swap two axes

np.arange(6).reshape(2, 3).transpose()

expand_dims(), squeeze(): add or remove dimensions
```
np.expand_dims(np.arange(3), axis=0)
```

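Mathematical functions

add(), subtract(), multiply(), divide(): basic arithmetic
```
np.add(np.arange(5), np.arange(5))
```
sin(), cos(), tan(): trigonometric functions
```
np.sin(np.linspace(0, np.pi, 4))
```
exp(), log(): exponential and logarithm
```
np.exp(np.arange(5))
```
sum(), mean(), median(), std(), var(): aggregation functions
```
np.mean(np.arange(10))
```
sinh(), cosh(), tanh(): hyperbolic functions
```
np.sinh(np.linspace(-2, 2, 5))
```
erf(), gamma(): special functions (provided by scipy.special; NumPy has no np.erf or np.gamma)
```
from scipy.special import erf, gamma
erf(np.linspace(-3, 3, 7))
```
pi, e: mathematical constants
```
np.pi
np.e
```
fft.fft(), fft.ifft(): fast Fourier transform and its inverse
```
np.fft.fft(np.array([0, 1, 2, 3]))
```
round(), around(), floor(), ceil(): rounding and truncation
```
np.round(np.linspace(0, 2, 5))
```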

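Linear algebra functions

dot(): dot product of two arrays

np.dot(np.eye(2), [1, 2]) # np.eye() has 1s on the diagonal and 0s elsewhere

matmul() or the @ operator: matrix multiplication
```
np.matmul(np.eye(2), [1, 2])
```
inner(), outer(): inner and outer products
```
np.inner([1, 2], [3, 4])
```

linalg.inv(), linalg.pinv(): matrix inverse and pseudo-inverse

np.linalg.inv(np.array([[1, 2], [3, 4]]))

linalg.eig(), linalg.svd(): eigendecomposition and singular value decomposition
```
np.linalg.eig(np.array([[1, 2], [3, 4]]))
```

linalg.solve(): solve a system of linear equations

np.linalg.solve([[1, 2], [3, 4]], [5, 6])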
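
To make the meaning of "solve" concrete, this sketch checks the result of `np.linalg.solve()` against the defining equation `A @ x = b` and against the explicit-inverse route. The variable names are illustrative.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])

x = np.linalg.solve(A, b)          # solves A @ x = b without forming the inverse
x_via_inv = np.linalg.inv(A) @ b   # same answer, but builds the inverse explicitly

residual = A @ x - b               # should be (numerically) zero
print(x)                           # [-4.   4.5]
print(np.allclose(residual, 0))    # True
```

For a single right-hand side, `solve()` is generally the preferred call; the inverse is shown only for comparison.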

linalg.det(): determinant of a matrix

np.linalg.det(np.array([[1, 2], [3, 4]]))

linalg.norm(): norm of a matrix or vector
```
np.linalg.norm([3, 4])
```
linalg.matrix_rank(): rank of a matrix
```
np.linalg.matrix_rank(np.eye(3))
```

linalg.qr(): QR decomposition

np.linalg.qr(np.array([[1, 2], [3, 4]]))

linalg.cholesky(): Cholesky decomposition

np.linalg.cholesky(np.array([[1, 1], [1, 2]]))

linalg.lstsq(): least-squares solution

np.linalg.lstsq([[1, 2], [3, 4]], [5, 6], rcond=None)

Statistical functions

min(), max(): minimum and maximum
```
np.min([1, 2, 3]), np.max([1, 2, 3])
```
argmin(), argmax(): indices of the minimum and maximum
```
np.argmin([1, 2, 3]), np.argmax([1, 2, 3])
```
percentile(), quantile(): percentiles and quantiles
```
np.percentile([1, 2, 3, 4, 5], 50)
```
gradient(), diff(): numerical gradient and discrete differences of an array
```
np.gradient([1, 2, 4, 7, 11])
```
mgrid, ogrid: generate coordinate grids (index objects used with square brackets, not called as functions)
```
np.mgrid[0:5, 0:5]
```
histogram(): compute a histogram
```
np.histogram([1, 2, 1, 3, 2, 1, 4])
```

mean(), median(): mean and median

np.mean([1, 2, 3])
np.median([1, 2, 3])

std(), var(): standard deviation and variance
```
np.std([1, 2, 3])
np.var([1, 2, 3])
```

average(): weighted average

np.average([1, 2, 3, 4], weights=[1, 2, 3, 4])

corrcoef(), cov(): Pearson correlation coefficients and covariance matrix
```
np.corrcoef([1, 2, 3], [4, 5, 6])
```
nanmean(), nanstd(), nanvar(): mean, standard deviation, and variance, ignoring NaN
```
np.nanmean([1, np.nan, 3])
```

Logical functions

all(), any(): test whether all elements or any element evaluates to True

np.all([True, True, False])
np.any([True, False, False])

logical_and(), logical_or(), logical_not(): element-wise logical operations
```
np.logical_and([True, False], [False, True])
```

Random number generation

random.rand(), random.randn(): random numbers from a uniform distribution on [0, 1) and from a standard normal distribution
```
np.random.rand(5)
np.random.randn(5)
```
random.randint(): random integers
```
np.random.randint(1, 10)
```
random.choice(): random sampling
```
np.random.choice([1, 2, 3], size=2)
```

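Data type operations

astype(): convert the data type of an array
```
np.array([1.1, 2.2, 3.3]).astype(int)
```
isnan(), isfinite(), isinf(): detect NaN, finite, and infinite values
```
np.isnan([1, np.nan, 3])
```
real(), imag(): extract the real and imaginary parts of complex numbers
```
np.real(1 + 2j)
np.imag(1 + 2j)
```

Copies and views

copy(): create a copy of an array
```
np.array([1, 2, 3]).copy()
```
View: a shallow copy that shares data with the original, such as the array view created by slicing
```
a = np.array([1, 2, 3])
b = a[:]
```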
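
The practical difference between a view and a copy is whether writes propagate back to the original. A minimal sketch, with `view` and `copy` as illustrative names:

```python
import numpy as np

a = np.array([1, 2, 3])
view = a[:]        # slicing returns a view onto the same buffer
copy = a.copy()    # copy() allocates independent storage

view[0] = 99       # writes through to a
copy[1] = -1       # leaves a untouched

print(a)                            # [99  2  3]
print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, copy))    # False
```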

Broadcasting

Broadcasting: automatically expands array dimensions so that element-wise operations can be applied
```
np.array([1, 3, 5]) + 5
```
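
Broadcasting also applies between two arrays, not only between an array and a scalar. The sketch below adds a column of shape (3, 1) to a row of shape (3,); the array names are illustrative.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,), treated as (1, 3)

table = col + row                   # both operands are stretched to (3, 3)
print(table.shape)                  # (3, 3)
print(table)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]
```

Dimensions are compared from the right; two sizes are compatible when they are equal or when one of them is 1.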

File input/output

load(), save(): read and write NumPy binary files
```
np.save('data.npy', np.array([1, 2, 3]))
np.load('data.npy')
```

loadtxt(), savetxt(): read and write text files

np.savetxt('data.txt', np.array([1, 2, 3]))
np.loadtxt('data.txt')

Polynomial functions

polyval(), polyfit(): evaluate a polynomial and fit a polynomial to data
```
np.polyval([1, -2, 0, 2], 3)
```
polyadd(), polysub(), polymul(), polydiv(): add, subtract, multiply, and divide polynomials
```
np.polyadd([1, 1], [-1, 1])
```

Set operations

unique(): find the unique elements
```
np.unique([1, 1, 2, 2, 3, 3])
```
intersect1d(), union1d(): intersection and union of two arrays
```
np.intersect1d([1, 2, 3], [2, 3, 4])
```
setdiff1d(), setxor1d(): difference and symmetric difference of two arrays
```
np.setdiff1d([1, 2, 3], [2, 3, 4])
```

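Sorting and searching

sort(): sort an array
```
np.sort([3, 1, 2])
```
argsort(): return the indices that would sort the array
```
np.argsort([3, 1, 2])
```
searchsorted(): find the insertion index of an element in a sorted array
```
np.searchsorted([1, 2, 3, 4, 5], 3)
```

Conditional functions

where(): return elements chosen by a condition, i.e. an array-level if-else
```
np.where([True, False, True], [1, 2, 3], [4, 5, 6])
```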
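
`np.where()` has two calling forms that behave quite differently, which the following sketch contrasts. The names `values`, `clipped`, and `positive_idx` are illustrative.

```python
import numpy as np

values = np.array([-2, -1, 0, 1, 2])

# Three-argument form: element-wise "x if condition else y".
clipped = np.where(values < 0, 0, values)   # negatives become 0
print(clipped)                              # [0 0 0 1 2]

# One-argument form: a tuple of index arrays where the condition holds.
(positive_idx,) = np.where(values > 0)
print(positive_idx)                         # [3 4]
```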

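choose(): use an index array to pick elements from a set of arrays

np.choose([0, 1, 2, 1], [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])

select(): choose elements according to a list of conditions

np.select([a < 3, a > 5], [a, a**2], default=-1)  # a is an existing array, e.g. np.arange(10)

Padding and boundary handling

pad(): pad an array

np.pad([1, 2, 3], (1, 2), 'constant', constant_values=(4, 6))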

String processing

add(): concatenate two string arrays element-wise
```
np.char.add(['hello'], ['world'])
```
multiply(): repeat strings
```
np.char.multiply('hello', 3)
```
center(), ljust(), rjust(): center, left-justify, or right-justify strings
```
np.char.center('hello', 10, fillchar='-')
```

lower(), upper(): convert to lowercase or uppercase

np.char.lower('HELLO')
np.char.upper('hello')

capitalize(), title(): capitalize the first letter, or the first letter of every word
```
np.char.capitalize('hello world')
```
strip(), lstrip(), rstrip(): remove whitespace
```
np.char.strip('  hello world')
```
split(), partition(): split strings on a separator
```
np.char.split('hello world', ' ')
```
find(), index(): locate a substring
```
np.char.find('hello world', 'hello')
```

replace(): replace a substring

np.char.replace('hello world', 'world', 'NumPy')

mod(): Python %-style string formatting (np.char has no format() function)
```
np.char.mod('hello %s', ['world'])
```

Logical and comparison operations

logical_xor(), greater(), less(): logical XOR and greater-than / less-than comparisons
```
np.logical_xor([True, False], [False, False])
```
bitwise_and(), bitwise_or(), bitwise_xor(): bitwise operations on the elements of integer arrays
```
np.bitwise_and([1, 2], [3, 4])
```

Matplotlib

Plotting functions

plot(): draw a line plot

import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()

scatter(): draw a scatter plot

x = np.random.rand(10)
y = np.random.rand(10)
plt.scatter(x, y)
plt.show()

bar(), barh(): draw a bar chart

plt.bar([1, 2, 3], [4, 5, 6])
plt.show()

hist(): draw a histogram
```
plt.hist([1, 2, 1, 3, 2, 1, 4])
```
pie(): draw a pie chart
```
plt.pie([10, 15, 30, 45])
```

boxplot(): draw a box plot

plt.boxplot([1, 2, 3, 4, 5])
plt.show()

contour(), contourf(): draw contour plots

x = np.arange(-3.0, 3.0, 0.1)
y = np.arange(-3.0, 3.0, 0.1)
X, Y = np.meshgrid(x, y)
Z = np.sin(X**2 + Y**2)
plt.contour(X, Y, Z)
plt.show()

imshow(): display image data

plt.imshow(np.random.random((100, 100)))
plt.show()

pcolor(), pcolormesh(): draw a pseudocolor plot

plt.pcolor(np.random.rand(50, 50))
plt.show()

stem(): draw a stem plot

plt.stem([1, 2, 3], [4, 5, 6])
plt.show()

polar(): draw a plot in polar coordinates

r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
plt.polar(theta, r)
plt.show()

xcorr(), acorr(): cross-correlation and autocorrelation plots

x = np.random.randn(1000)
plt.xcorr(x, x, maxlags=50)
plt.show()

Figure and axes setup

figure(): create a new figure
```
plt.figure(figsize=(8, 6))
```
subplot(), subplots(): create subplots
```
fig, axs = plt.subplots(2, 2)
```

axes(): add a new set of axes

fig, ax = plt.subplots()
ax = plt.axes([0, 0, 0.5, 0.5])

xlim(), ylim(): set the axis limits
```
plt.xlim(0, 10)
plt.ylim(-1, 1)
```

xticks(), yticks(): set the axis tick positions

plt.xticks(np.arange(0, 10, 1))
plt.yticks(np.arange(-1, 1, 0.1))

xlabel(), ylabel(): set the axis labels

plt.xlabel('X Axis')
plt.ylabel('Y Axis')

title(): set the plot title
```
plt.title('Plot')
```
grid(): add grid lines
```
plt.grid(True)
```
show(): display the figure
```
plt.show()
```

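Styles and colors

style.use(): apply a predefined style
```
plt.style.use('ggplot')
```
colorbar(): add a color bar
```
plt.colorbar()  # requires a mappable, e.g. after plt.imshow()
```

Colormaps: set and manage color mappings

plt.imshow(np.random.rand(10, 10), cmap='viridis')

text(): add text to the figure
```
plt.text(0.5, 0.5, 'Hello World')
```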

Customizing chart elements

setp(): set properties of plot objects

lines = plt.plot([1, 2, 3])
plt.setp(lines, color='r', linewidth=2.0)

gca(), gcf(): get the current Axes or the current Figure
```
fig = plt.gcf()
ax = plt.gca()
```
clf(), cla(): clear the current figure or the current axes
```
plt.clf()
plt.cla()
```
draw(): redraw the current figure
```
plt.draw()
```

legend(): add and customize a legend

plt.plot([1, 2, 3], label='Line')
plt.legend()

tight_layout(): automatically adjust subplot parameters to fit the figure area
```
plt.tight_layout()
```
savefig(): save the figure in various formats
```
plt.savefig('my_plot.png')
```

Figure formatting and configuration

rc(): set Matplotlib configuration parameters

plt.rc('lines', linewidth=2, color='r')

rcParams: a configuration dictionary that controls Matplotlib's default properties
```
plt.rcParams['lines.linewidth'] = 2
```
rcdefaults(): restore the default configuration parameters
```
plt.rcdefaults()
```

annotate(): add an annotation to the figure

plt.annotate('Important', xy=(2, 1), xytext=(3, 1.5), arrowprops=dict(facecolor='black', shrink=0.05))

arrow(): draw an arrow
```
plt.arrow(0, 0, 0.5, 0.5)
```

Paths and SVG handling

Path(): create a drawing path

from matplotlib.path import Path
import matplotlib.patches as patches
verts = [(0., 0.), (0., 1.), (1., 1.), (1., 0.), (0., 0.)]
codes = [Path.MOVETO, Path.LINETO, Path.LINETO, Path.LINETO, Path.CLOSEPOLY]
path = Path(verts, codes)

patches: draw and use shapes such as rectangles, circles, and arrows

fig, ax = plt.subplots()
patch = patches.Circle((0.5, 0.5), 0.25, facecolor='yellow')
ax.add_patch(patch)

Image processing

imread(): read image data

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
img = mpimg.imread('image.png')
plt.imshow(img)
plt.show()

imsave(): save image data
```
mpimg.imsave('out.png', img)
```

Rotating and resizing images: Matplotlib has no imrotate() or imresize(); the example below uses Pillow instead

from PIL import Image
img = Image.open('image.png')
img_rotated = img.rotate(45)
img_resized = img.resize((100, 100))

Interactive tools

ion(), ioff(): turn interactive mode on or off

plt.ion()  # turn interactive mode on
plt.ioff() # turn interactive mode off

Axis and tick customization

twinx(), twiny(): create a second set of axes that shares the x-axis or y-axis
```
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
```
tick_params(): set the tick style
```
ax1.tick_params(axis='x', rotation=45)
```
Axis: work with the axis objects directly
```
ax = plt.gca()
ax.xaxis
```

Locator, Formatter: control tick positions and tick label formatting

from matplotlib.ticker import MaxNLocator, FuncFormatter
ax.xaxis.set_major_locator(MaxNLocator(nbins=5))

Event handling

ginput(): get the coordinates of user clicks
```
plt.ginput(n=2)
```
waitforbuttonpress(): wait for a key press or mouse click
```
plt.waitforbuttonpress()
```

mpl_connect(): connect an event handler function

def on_click(event):
    print(event.x, event.y)
fig.canvas.mpl_connect('button_press_event', on_click)

Special chart types

hexbin(): hexagonal binning plot

x_hexbin = np.random.rand(1000)
y_hexbin = np.random.rand(1000)
# draw the hexagonal binning plot
plt.hexbin(x_hexbin, y_hexbin, gridsize=30, cmap='Blues')
plt.colorbar() # add a color bar
plt.show()

streamplot(): streamline plot

Y, X = np.mgrid[-3:3:100j, -3:3:100j]
U, V = -1 - X**2 + Y, 1 + X - Y**2
plt.streamplot(X, Y, U, V)
plt.show()

errorbar(): error bar plot

# generate the data
x_errorbar = np.linspace(0, 10, 20)
y_errorbar = np.sin(x_errorbar)
# generate random error values
error = np.random.rand(20) * 0.2
# draw the error bar plot
plt.errorbar(x_errorbar, y_errorbar, yerr=error, fmt='o')
plt.show()

quiver(): vector field plot

# generate the grid
x_quiver = np.linspace(-2, 2, 10)
y_quiver = np.linspace(-2, 2, 10)
X_quiver, Y_quiver = np.meshgrid(x_quiver, y_quiver)
# generate the vector field
U_quiver = -Y_quiver # x component
V_quiver = X_quiver # y component
# draw the vector field plot
plt.quiver(X_quiver, Y_quiver, U_quiver, V_quiver)
plt.show()

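Geometric shapes and guide lines

axhline(), axvline(): draw a horizontal or vertical line

plt.axhline(y=0.5, color='r')
plt.axvline(x=0.5, color='b')

hlines(), vlines(): draw a series of horizontal or vertical lines
```
plt.hlines(y=[0.2, 0.6], xmin=0, xmax=1)
```
fill(), fill_between(): draw filled regions
```
plt.fill_between(x, y1, y2)  # x, y1, y2 are existing arrays of equal length
```

Integration with other libraries

Matplotlib integrates closely with libraries such as Pandas and Seaborn, so a DataFrame or Series can be plotted directly

import pandas as pd
df = pd.DataFrame(np.random.rand(10, 2), columns=['A', 'B'])
df.plot(kind='bar')

Colors and colormaps

color: control the color
```
plt.plot(x, y, color='red')
```
cmap: use and create colormaps
```
plt.imshow(Z, cmap='hot')
```

Figure windows and user interface

figure(): manage and use figure windows
```
plt.figure(num='Figure')
```

subplots_adjust(): adjust the subplot layout

plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1)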

3D plotting

Axes3D: the axes type used for 3D plots

from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

plot3D(), scatter3D(): draw 3D line plots and scatter plots

# generate 3D data
x_3d = np.linspace(-5, 5, 100)
y_3d = np.sin(x_3d)
z_3d = np.cos(x_3d)
# create the 3D figure
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot3D(x_3d, y_3d, z_3d, 'gray') # 3D line plot
ax.scatter3D(x_3d, y_3d, z_3d, c=z_3d, cmap='viridis') # 3D scatter plot
plt.show()

contour3D(): draw a 3D contour plot

# generate 3D grid data
x_contour = np.linspace(-5, 5, 50)
y_contour = np.linspace(-5, 5, 50)
X_contour, Y_contour = np.meshgrid(x_contour, y_contour)
# define a 3D surface
Z_contour = np.sin(np.sqrt(X_contour**2 + Y_contour**2))
# create the 3D figure
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# draw the 3D contour plot
ax.contour3D(X_contour, Y_contour, Z_contour, 50)
plt.show()

plot_surface(): draw a 3D surface plot (the Matplotlib counterpart of MATLAB's surf())

# generate data for the 3D surface
x_surface = np.linspace(-5, 5, 50)
y_surface = np.linspace(-5, 5, 50)
X_surface, Y_surface = np.meshgrid(x_surface, y_surface)
# define the 3D surface
Z_surface = np.sin(np.sqrt(X_surface**2 + Y_surface**2))
# create the 3D figure
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# draw the 3D surface plot
ax.plot_surface(X_surface, Y_surface, Z_surface, cmap='viridis')
plt.show()

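Animation

animation.FuncAnimation(): create an animation

from matplotlib import animation
# assumes fig, x, and line (a Line2D returned by ax.plot) already exist
def animate(i):
    line.set_ydata(np.sin(x + i / 50))
    return line,
ani = animation.FuncAnimation(fig, animate, frames=100, interval=20)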
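
For a version that runs on its own, the sketch below also creates the figure, the x values, and the line that the update function modifies. The `Agg` backend and `blit=True` are assumptions made here so that the sketch runs without a display; neither comes from the notes above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import animation

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 200)
(line,) = ax.plot(x, np.sin(x))   # ax.plot returns a list of Line2D objects

def animate(i):
    # Shift the sine wave slightly on every frame.
    line.set_ydata(np.sin(x + i / 50))
    return (line,)                # blit=True expects an iterable of artists

ani = animation.FuncAnimation(fig, animate, frames=100, interval=20, blit=True)
```

Keep a reference to `ani` for as long as the animation should run. To write it to disk, `ani.save(...)` can be used, provided a suitable writer is installed.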

Pandas

Reading and writing

read_csv(): read data from a CSV file into a DataFrame

import pandas as pd
df = pd.read_csv('file.csv')

read_excel(): read data from an Excel file
```
df = pd.read_excel('file.xlsx')
```

read_sql(): read data from a SQL database

import sqlalchemy as sa
# create the database connection
engine = sa.create_engine('sqlite:///mydb.db')
# read the 'my_table' table
df = pd.read_sql('my_table', engine)

to_csv(): write DataFrame data to a CSV file
```
df.to_csv('output.csv', index=False)
```
to_excel(): write DataFrame data to an Excel file
```
df.to_excel('output.xlsx', index=False)
```

to_sql(): write DataFrame data to a SQL database

df.to_sql('my_table', engine, if_exists='replace', index=False)

read_json(), to_json(): read and write JSON files

# read JSON
df = pd.read_json('data.json')
# write JSON
df.to_json('output.json')

read_html(), to_html(): read HTML table data, and write data to HTML

# read HTML
# returns a list in which every element is a DataFrame
list_of_df = pd.read_html('http://example.com/table.html')
# write HTML
df.to_html('output.html')

read_parquet(), to_parquet(): read and write Parquet files, an efficient columnar storage format

# read Parquet
df = pd.read_parquet('data.parquet')
# write Parquet
df.to_parquet('output.parquet')

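Data exploration and inspection

head(): view the first rows of a DataFrame
```
df.head() # first 5 rows by default
```
tail(): view the last rows of a DataFrame
```
df.tail() # last 5 rows by default
```
info(): get a concise summary of a DataFrame
```
df.info()
```
describe(): show summary statistics of the data
```
df.describe()
```
dtypes: view the data type of each column
```
df.dtypes
```

Data cleaning and preprocessing

dropna(): drop rows or columns that contain missing values
```
df.dropna() # drop rows that contain NaN
```
fillna(): fill missing values
```
df.fillna(value=0) # fill NaN with 0
```

drop(): drop the specified rows or columns

df.drop(['column1', 'column2'], axis=1) # drop the specified columns

rename(): rename the index or the column names of a DataFrame

df.rename(columns={'old name': 'new name'})

astype(): convert the data type of a column
```
df['column'].astype('float')
```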

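Data filtering and indexing

loc[]: label-based indexing (also accepts a boolean mask)
```
df.loc[df['column'] > 10]
```
iloc[]: position-based indexing
```
df.iloc[0:5] # select the first 5 rows
```
query(): filter data with a query string
```
df.query('column > 10')
```

Data manipulation and transformation

groupby(): group by one or more columns
```
df.groupby('column').sum()
```

pivot_table(): create a pivot table

df.pivot_table(values='D', index=['A', 'B'], columns=['C'])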
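
To show what the pivot table actually computes, the sketch below builds a small DataFrame with the same column names (`A`, `B`, `C`, `D`) and made-up values. The data is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["x", "x", "x", "y", "y"],
    "B": ["p", "p", "p", "q", "q"],
    "C": ["u", "u", "v", "u", "v"],
    "D": [1, 3, 5, 7, 9],
})

# Rows are indexed by (A, B); one output column per distinct value of C.
# Duplicate (A, B, C) combinations are aggregated with the default, the mean.
table = df.pivot_table(values="D", index=["A", "B"], columns=["C"])
print(table)
```

Here the two rows with `A="x"`, `B="p"`, `C="u"` (values 1 and 3) collapse into a single cell holding their mean, 2.0.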

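merge(): merge two DataFrames
```
pd.merge(df1, df2, on='key')
```
concat(): concatenate two or more DataFrames
```
pd.concat([df1, df2])
```
apply(): apply a function to the data
```
df.apply(lambda x: x.max() - x.min())
```

Time series analysis

to_datetime(): convert strings to datetime objects
```
pd.to_datetime(df['column'])
```
resample(): resample time series data (requires a datetime-like index)
```
df.resample('M').mean()  # month-end frequency; use 'ME' in pandas 2.2 and later
```
rolling(): apply rolling-window calculations
```
df.rolling(window=3).mean()
```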
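
The three time-series calls are usually chained, so here is a minimal end-to-end sketch on a six-day series. The dates and values are made up, and a 2-day frequency is used instead of a monthly one to keep the output short.

```python
import pandas as pd

# Parse date strings into a DatetimeIndex.
dates = pd.to_datetime([
    "2024-01-01", "2024-01-02", "2024-01-03",
    "2024-01-04", "2024-01-05", "2024-01-06",
])
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# Resample into 2-day bins and average each bin.
two_day = s.resample("2D").mean()
print(two_day.tolist())        # [1.5, 3.5, 5.5]

# 3-point rolling mean; the first two entries have no full window, so they are NaN.
rolling = s.rolling(window=3).mean()
print(rolling.tolist())        # [nan, nan, 2.0, 3.0, 4.0, 5.0]
```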

Aggregation and statistics

sum(): sum of a numeric column
```
df['column'].sum()
```
mean(): mean of a numeric column
```
df['column'].mean()
```
median(): median of a numeric column
```
df['column'].median()
```
min(): minimum of a numeric column
```
df['column'].min()
```
max(): maximum of a numeric column
```
df['column'].max()
```
std(): standard deviation of a numeric column
```
df['column'].std()
```
var(): variance of a numeric column
```
df['column'].var()
```
count(): number of non-null values
```
df['column'].count()
```

agg(): aggregate data with one or more operations at once

df.agg({'column1': ['sum', 'min'], 'column2': ['max', 'mean']})

Data reshaping

melt(): convert a DataFrame from wide format to long format

pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])

pivot(): convert data from long format to wide format

df.pivot(index='date', columns='variable', values='value')
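
`melt()` and `pivot()` are inverses of each other in the simple case, which the round trip below illustrates. The frame `wide` and its values are made up for the example.

```python
import pandas as pd

wide = pd.DataFrame({"A": ["a1", "a2"], "B": [1, 2], "C": [3, 4]})

# Wide -> long: one row per (A, variable) pair.
long_df = pd.melt(wide, id_vars=["A"], value_vars=["B", "C"])
print(list(long_df.columns))   # ['A', 'variable', 'value']
print(len(long_df))            # 4

# Long -> wide: the distinct values of 'variable' become columns again.
back = long_df.pivot(index="A", columns="variable", values="value")
print(back)
```

Note that `pivot()` raises an error when an (index, columns) pair occurs more than once; `pivot_table()` aggregates such duplicates instead.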

cut(): split continuous data into discrete intervals of equal width
```
pd.cut(df['column'], bins=3)
```
qcut(): split data into intervals based on sample quantiles
```
pd.qcut(df['column'], q=4)
```
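
The difference between the two shows up clearly on skewed data. A minimal sketch with one outlier (the values are made up):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# cut: two intervals of equal width over the value range, so the outlier sits alone.
equal_width = pd.cut(s, bins=2)

# qcut: two intervals split at the median, so each holds the same number of rows.
equal_count = pd.qcut(s, q=2)

print(equal_width.value_counts())
print(equal_count.value_counts())
```

With `cut()` seven of the eight values land in the lower interval, while `qcut()` places four values in each.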

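String processing

lower(): convert strings to lowercase
```
df['column'].str.lower()
```
upper(): convert strings to uppercase
```
df['column'].str.upper()
```
len(): compute string length
```
df['column'].str.len()
```
strip(): remove leading and trailing whitespace
```
df['column'].str.strip()
```
lstrip(): remove leading whitespace
```
df['column'].str.lstrip()
```
rstrip(): remove trailing whitespace
```
df['column'].str.rstrip()
```
contains(): check whether each string contains a given pattern or substring
```
df['column'].str.contains('pattern')
```
startswith(): check whether each string starts with a given substring
```
df['column'].str.startswith('abc')
```
endswith(): check whether each string ends with a given substring
```
df['column'].str.endswith('abc')
```
match(): match each string against a regular expression
```
df['column'].str.match(r'^pattern$')
```
replace(): replace parts of each string
```
df['column'].str.replace('old', 'new')
```
split(): split each string into parts on a separator
```
df['column'].str.split(',')
```
join(): join the elements of each list-like entry into a single string
```
df['column'].str.join('-')
```
cat(): concatenate strings (with no other argument, joins all values of the Series into one string)
```
df['column'].str.cat(sep=', ')
```

Regular expressions

extract(): extract part of each string with a regular expression
```
df['column'].str.extract(r'(pattern)')
```
findall(): find all matches of a regular expression in each string
```
df['column'].str.findall(r'pattern')
```

replace(): replace parts of each string using a regular expression

df['column'].str.replace(r'pattern', 'replacement', regex=True)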

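Checks and extraction helpers

isnumeric(): check whether each string contains only numeric characters
```
df['column'].str.isnumeric()
```
isdecimal(): check whether each string contains only decimal digits
```
df['column'].str.isdecimal()
```
isalpha(): check whether each string contains only letters
```
df['column'].str.isalpha()
```
isdigit(): check whether each string contains only digits
```
df['column'].str.isdigit()
```
islower(): check whether all cased characters in each string are lowercase
```
df['column'].str.islower()
```
isupper(): check whether all cased characters in each string are uppercase
```
df['column'].str.isupper()
```
istitle(): check whether each string is title-cased (every word starts with a capital letter)
```
df['column'].str.istitle()
```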

Data visualization

plot(): draws a line plot by default
```
df.plot()
```

plot.bar(), plot.barh(): draw a bar chart

df.plot.bar() # vertical bar chart
df.plot.barh() # horizontal bar chart

plot.hist(): draw a histogram
```
df.plot.hist()
```
plot.box(): draw a box plot
```
df.plot.box()
```
plot.area(): draw an area plot
```
df.plot.area()
```
plot.pie(): draw a pie chart
```
df.plot.pie(subplots=True)
```

plot.scatter(): draw a scatter plot

df.plot.scatter(x='column1', y='column2')

Handling duplicate data

duplicated(): flag duplicate rows
```
df.duplicated()
```
drop_duplicates(): drop duplicate rows
```
df.drop_duplicates()
```

Combining data

concat(): stack multiple objects together along an axis
```
pd.concat([df1, df2])
```
merge(): join the rows of different DataFrames on one or more keys
```
pd.merge(df1, df2, on='key')
```
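
The sketch below contrasts `merge()` with `concat()` on two small frames that share a `key` column. The frames and their values are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# merge aligns rows by key; the default how="inner" keeps keys present in both.
inner = pd.merge(df1, df2, on="key")
print(inner["key"].tolist())          # ['b', 'c']

# how="outer" keeps every key and fills the gaps with NaN.
outer = pd.merge(df1, df2, on="key", how="outer")
print(sorted(outer["key"].tolist()))  # ['a', 'b', 'c', 'd']

# concat simply stacks the frames; columns missing from one frame become NaN.
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked.shape)                  # (6, 3)
```
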
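join(): join on the index (inner, outer, left, or right join)
```
df.join(df1, how='right')
```
append(): append one or more rows to a DataFrame, a special case of concat(); DataFrame.append() was removed in pandas 2.0, so use concat() instead
```
pd.concat([df, df1])
```

Binning data

cut(): split numeric values into discrete intervals
```
pd.cut(df['column'], bins=3)
```
qcut(): split data into intervals based on sample quantiles
```
pd.qcut(df['column'], q=4)
```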

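Index operations

set_index(): set one or more columns of a DataFrame as the index
```
df.set_index('column')
```
reset_index(): reset the index of a DataFrame to the default integer index
```
df.reset_index()
```
swaplevel(): swap two levels of a MultiIndex
```
df.swaplevel()
```
stack(): pivot columns into rows
```
df.stack()
```
unstack(): pivot rows into columns
```
df.unstack()
```

Performance

eval(): evaluate DataFrame operations from a string expression, which can improve performance
```
pd.eval('df1 + df2')
```
query(): query a DataFrame with a string expression
```
df.query('column > 10')
```

Miscellaneous

unique(): find the unique values in a column
```
df['column'].unique()
```
value_counts(): count how often each value occurs in a column
```
df['column'].value_counts()
```
sort_values(): sort by the values of one or more columns
```
df.sort_values(by='column')
```
isna(): detect missing values
```
df.isna()
```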