50个最有用的Matplotlib数据分析与可视化图

本文介绍了数据分析与可视化中最有用的50个数据分析图,共分为7大类:Correlation、Deviation、RankIng、Distribution、Composition、Change、Groups

原文链接:https://www.machinelearningplus.com/plots/top-50-matplotlib-visualizations-the-master-plots-python/

目录

设置

Correlation

1.Scatter plot(散点图)

2.Bubble plot with Encircling(包围的气泡图)

3. Scatter plot with linear regression line of best fit(散点图与最佳线性拟合回归线)

4.Jittering with stripplot(带条纹的抖动)

5. Counts Plot(计数图)

6. Marginal Histogram(边缘直方图)

7. Marginal Boxplot(边缘箱线图)

8. Correllogram(相关图)

9. Pairwise Plot(成对图)

Deviation

10. Diverging Bars(发散条形图)

11. Diverging Texts(发散文本)

12. Diverging Dot Plot(散点图)

13. Diverging Lollipop Chart with Markers(带标记的发散型棒棒糖图)

14. Area Chart(面积图)

Ranking

15. Ordered Bar Chart(有序条形图)

16. Lollipop Chart(棒棒糖图)

17. Dot Plot(点图)

18. Slope Chart(坡度图)

19. Dumbbell Plot(哑铃图)

Distribution

20. Histogram for Continuous Variable(连续变量的直方图)

21. Histogram for Categorical Variable(类型变量的直方图)

22. Density Plot(密度图)

23. Density Curves with Histogram(直方密度图)

24. Joy Plot

25. Distributed Dot Plot(分布式点图)

26. Box Plot(箱形图)

27. Dot + Box Plot(点+箱型图)

28. Violin Plot(小提琴图)

29. Population Pyramid(人口金字塔)

30. Categorical Plots(分类图)

Composition

31. Waffle Chart(华夫饼表)

32. Pie Chart(饼状图)

33. Treemap(树状图)

34. Bar Chart(条形图)

Change

35. Time Series Plot(时间序列图)

36. Time Series with Peaks and Troughs Annotated(带波峰波谷标记的时序图)

37. Autocorrelation (ACF) and Partial Autocorrelation (PACF) Plot(自相关和部分自相关图)

38. Cross Correlation plot(交叉相关图)

39. Time Series Decomposition Plot(时间序列分解图)

40. Multiple Time Series(多时间序列)

41. Plotting with different scales using secondary Y axis(使用辅助Y轴来绘制不同范围的图形)

42. Time Series with Error Bands(带有误差带的时间序列)

43. Stacked Area Chart(堆积面积图)

44. Area Chart UnStacked(未堆积的面积图)

45. Calendar Heat Map(日历热力图)

46. Seasonal Plot(季度图)

Groups

47. Dendrogram(树状图)

48. Cluster Plot(簇状图)

49. Andrews Curve(安德鲁斯曲线)

50. Parallel Coordinates(平行坐标)



设置

在运行具体画图代码前,先运行以下代码,导入画图所需的库,以及进行一些必要的参数设置。

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings; warnings.filterwarnings(action='once')large = 22; med = 16; small = 12
params = {'axes.titlesize': large,'legend.fontsize': med,'figure.figsize': (16, 10),'axes.labelsize': med,'axes.titlesize': med,'xtick.labelsize': med,'ytick.labelsize': med,'figure.titlesize': large}
plt.rcParams.update(params)
plt.style.use('seaborn-whitegrid')
sns.set_style("white")
%matplotlib inline# Version
print(mpl.__version__)  #> 3.0.0
print(sns.__version__)  #> 0.9.0

Correlation

用于可视化两个或多个变量之间的相互关系,当一个变量发生变化时,另一个变量与之如何变化。

1.Scatter plot(散点图)

散点图是研究两个变量之间最基本和经典的关系图。如果数据中有多个不同的组,则可能需要以不同的颜色显示每个组。在matplotlib中,可以使用plt.scatterplot()

# Import dataset 
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")# Prepare Data 
# Create as many colors as there are unique midwest['category']
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]# Draw Plot for Each Category
plt.figure(figsize=(16, 10), dpi= 80, facecolor='w', edgecolor='k')for i, category in enumerate(categories):plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :], s=20, c=colors[i], label=str(category))# Decorations
plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),xlabel='Area', ylabel='Population')plt.xticks(fontsize=12); plt.yticks(fontsize=12)
plt.title("Scatterplot of Midwest Area vs Population", fontsize=22)
plt.legend(fontsize=12)    
plt.show()    


2.Bubble plot with Encircling(包围的气泡图)

有时你想在一个边界内显示一组点来强调它们的重要性。

from matplotlib import patches
from scipy.spatial import ConvexHull
import warnings; warnings.simplefilter('ignore')
sns.set_style("white")# Step 1: Prepare Data
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")# As many colors as there are unique midwest['category']
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]# Step 2: Draw Scatterplot with unique color for each category
fig = plt.figure(figsize=(16, 10), dpi= 80, facecolor='w', edgecolor='k')    for i, category in enumerate(categories):plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :], \ s='dot_size', c=colors[i], label=str(category), edgecolors='black', linewidths=.5)# Step 3: Encircling
# https://stackoverflow.com/questions/44575681/how-do-i-encircle-different-data-sets-in-scatter-plot
def encircle(x,y, ax=None, **kw):if not ax: ax=plt.gca()p = np.c_[x,y]hull = ConvexHull(p)poly = plt.Polygon(p[hull.vertices,:], **kw)ax.add_patch(poly)# Select data to be encircled
midwest_encircle_data = midwest.loc[midwest.state=='IN', :]                         # Draw polygon surrounding vertices    
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="k", fc="gold", alpha=0.1)
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="firebrick", fc="none", linewidth=1.5)# Step 4: Decorations
plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),xlabel='Area', ylabel='Population')plt.xticks(fontsize=12); plt.yticks(fontsize=12)
plt.title("Bubble Plot with Encircling", fontsize=22)
plt.legend(fontsize=12)    
plt.show()    


3. Scatter plot with linear regression line of best fit(散点图与最佳线性拟合回归线)

如果想了解两个变量之间是如何变化的,那么最好的方法就是绘制一条拟合线。下图显示了数据中不同组之间的最佳拟合线的差异。

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4,8]), :]# Plot
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select, height=7, aspect=1.6, robust=True, palette='tab10', scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))# Decorations
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20)
plt.show()


4.Jittering with stripplot(带条纹的抖动)

通常,多个数据点具有完全相同的X和Y值。因此绘制这些点时会相互覆盖,为避免这种情况,可以对其稍微抖动,以便可以直观地看到它们。使用seaborn.stripplot()函数很方便。

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")# Draw Stripplot
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)    
sns.stripplot(df.cty, df.hwy, jitter=0.25, size=8, ax=ax, linewidth=.5)# Decorations
plt.title('Use jittered plots to avoid overlapping of points', fontsize=22)
plt.show()


5. Counts Plot(计数图)

另一个避免数据点相互重叠的方法是改变数据点的大小,这取决于该图中有多少个点。

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_counts = df.groupby(['hwy', 'cty']).size().reset_index(name='counts')# Draw Stripplot
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)    
sns.stripplot(df_counts.cty, df_counts.hwy, size=df_counts.counts*2, ax=ax)# Decorations
plt.title('Counts Plot - Size of circle is bigger as more points overlap', fontsize=22)
plt.show()


6. Marginal Histogram(边缘直方图)

边缘直方图是一个有着沿X和Y轴变量的直方图。这用于可视化X和Y之间的关系,同时也显示出X和Y各自的分布情况。

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")# Create Fig and gridspec
fig = plt.figure(figsize=(16, 10), dpi= 80)
grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)# Define the axes
ax_main = fig.add_subplot(grid[:-1, :-1])
ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])# Scatterplot on main ax
ax_main.scatter('displ', 'hwy', s=df.cty*4, c=df.manufacturer.astype('category').cat.codes, alpha=.9, data=df, \ 
cmap="tab10", edgecolors='gray', linewidths=.5)# histogram on the right
ax_bottom.hist(df.displ, 40, histtype='stepfilled', orientation='vertical', color='deeppink')
ax_bottom.invert_yaxis()# histogram in the bottom
ax_right.hist(df.hwy, 40, histtype='stepfilled', orientation='horizontal', color='deeppink')# Decorations
ax_main.set(title='Scatterplot with Histograms \n displ vs hwy', xlabel='displ', ylabel='hwy')
ax_main.title.set_fontsize(20)
for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):item.set_fontsize(14)xlabels = ax_main.get_xticks().tolist()
ax_main.set_xticklabels(xlabels)
plt.show()


7. Marginal Boxplot(边缘箱线图)

边缘箱图与边缘直方图具有相似的用途。然而,箱线图有助于精确定位X和Y的中位数,25和75百分位数。

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")# Create Fig and gridspec
fig = plt.figure(figsize=(16, 10), dpi= 80)
grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)# Define the axes
ax_main = fig.add_subplot(grid[:-1, :-1])
ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])# Scatterplot on main ax
ax_main.scatter('displ', 'hwy', s=df.cty*5,                                                                                    c=df.manufacturer.astype('category').cat.codes, alpha=.9, data=df, cmap="Set1", edgecolors='black', linewidths=.5)# Add a graph in each part
sns.boxplot(df.hwy, ax=ax_right, orient="v")
sns.boxplot(df.displ, ax=ax_bottom, orient="h")# Decorations ------------------
# Remove x axis name for the boxplot
ax_bottom.set(xlabel='')
ax_right.set(ylabel='')# Main Title, Xlabel and YLabel
ax_main.set(title='Scatterplot with Histograms \n displ vs hwy', xlabel='displ', ylabel='hwy')# Set font size of different components
ax_main.title.set_fontsize(20)
for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):item.set_fontsize(14)plt.show()


8. Correllogram(相关图)

Correlogram用于直观地查看给定数据帧(或2D数组)中所有可能的数值变量对之间的相关度量。

# Import Dataset
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")# Plot
plt.figure(figsize=(12,10), dpi= 80)
sns.heatmap(df.corr(), xticklabels=df.corr().columns, yticklabels=df.corr().columns,cmap='RdYlGn', center=0, annot=True)# Decorations
plt.title('Correlogram of mtcars', fontsize=22)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()


9. Pairwise Plot(成对图)

成对图用以理解所有可能的数字变量对之间的关系,它是双变量分析的必备工具。

# Load Dataset
df = sns.load_dataset('iris')# Plot
plt.figure(figsize=(10,8), dpi= 80)
sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5))
plt.show()

# Load Dataset
df = sns.load_dataset('iris')# Plot
plt.figure(figsize=(10,8), dpi= 80)
sns.pairplot(df, kind="reg", hue="species")
plt.show()


Deviation

10. Diverging Bars(发散条形图)

如果想根据单个指标查看条目的变化情况,并可视化此差异的顺序和数量,那么发散条形图是一个很好的工具。

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df.loc[:, ['mpg']]
df['mpg_z'] = (x - x.mean())/x.std()
df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)# Draw plot
plt.figure(figsize=(14,10), dpi= 80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=5)# Decorations
plt.gca().set(ylabel='$Model$', xlabel='$Mileage$')
plt.yticks(df.index, df.cars, fontsize=12)
plt.title('Diverging Bars of Car Mileage', fontdict={'size':20})
plt.grid(linestyle='--', alpha=0.5)
plt.show()


11. Diverging Texts(发散文本)

发散文本类似于发散条形图,如果想以一种漂亮和可呈现的方式显示图表中每个条目的数值。

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df.loc[:, ['mpg']]
df['mpg_z'] = (x - x.mean())/x.std()
df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)# Draw plot
plt.figure(figsize=(14,14), dpi= 80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z)
for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):t = plt.text(x, y, round(tex, 2), horizontalalignment='right' if x < 0 else 'left', verticalalignment='center', fontdict={'color':'red' if x < 0 else 'green', 'size':14})# Decorations    
plt.yticks(df.index, df.cars, fontsize=12)
plt.title('Diverging Text Bars of Car Mileage', fontdict={'size':20})
plt.grid(linestyle='--', alpha=0.5)
plt.xlim(-2.5, 2.5)
plt.show()


12. Diverging Dot Plot(散点图)

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df.loc[:, ['mpg']]
df['mpg_z'] = (x - x.mean())/x.std()
df['colors'] = ['red' if x < 0 else 'darkgreen' for x in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)# Draw plot
plt.figure(figsize=(14,16), dpi= 80)
plt.scatter(df.mpg_z, df.index, s=450, alpha=.6, color=df.colors)
for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z):t = plt.text(x, y, round(tex, 1), horizontalalignment='center', verticalalignment='center', fontdict={'color':'white'})# Decorations
# Lighten borders
plt.gca().spines["top"].set_alpha(.3)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.3)
plt.gca().spines["left"].set_alpha(.3)plt.yticks(df.index, df.cars)
plt.title('Diverging Dotplot of Car Mileage', fontdict={'size':20})
plt.xlabel('$Mileage$')
plt.grid(linestyle='--', alpha=0.5)
plt.xlim(-2.5, 2.5)
plt.show()


13. Diverging Lollipop Chart with Markers(带标记的发散型棒棒糖图

带标记的棒棒糖可以帮助强调想要引起注意的任何重要数据点。

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
x = df.loc[:, ['mpg']]
df['mpg_z'] = (x - x.mean())/x.std()
df['colors'] = 'black'# color fiat differently
df.loc[df.cars == 'Fiat X1-9', 'colors'] = 'darkorange'
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)# Draw plot
import matplotlib.patches as patchesplt.figure(figsize=(14,16), dpi= 80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=1)
plt.scatter(df.mpg_z, df.index, color=df.colors, s=[600 if x == 'Fiat X1-9' else 300 for x in df.cars], alpha=0.6)
plt.yticks(df.index, df.cars)
plt.xticks(fontsize=12)# Annotate
plt.annotate('Mercedes Models', xy=(0.0, 11.0), xytext=(1.0, 11), xycoords='data', fontsize=15, ha='center', va='center',bbox=dict(boxstyle='square', fc='firebrick'),arrowprops=dict(arrowstyle='-[, widthB=2.0, lengthB=1.5', lw=2.0, color='steelblue'), color='white')# Add Patches
p1 = patches.Rectangle((-2.0, -1), width=.3, height=3, alpha=.2, facecolor='red')
p2 = patches.Rectangle((1.5, 27), width=.8, height=5, alpha=.2, facecolor='green')
plt.gca().add_patch(p1)
plt.gca().add_patch(p2)# Decorate
plt.title('Diverging Bars of Car Mileage', fontdict={'size':20})
plt.grid(linestyle='--', alpha=0.5)
plt.show()


14. Area Chart(面积图)

通过对坐标轴和曲线之间的区域进行着色,区域图不仅强调波峰值和低波,而且还强调波峰和波谷的持续时间。波峰持续时间越长,面积越大。

import numpy as np
import pandas as pd# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv", parse_dates=['date']).head(100)
x = np.arange(df.shape[0])
y_returns = (df.psavert.diff().fillna(0)/df.psavert.shift(1)).fillna(0) * 100# Plot
plt.figure(figsize=(16,10), dpi= 80)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] >= 0, facecolor='green', interpolate=True, alpha=0.7)
plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] <= 0, facecolor='red', interpolate=True, alpha=0.7)# Annotate
plt.annotate('Peak \n1975', xy=(94.0, 21.0), xytext=(88.0, 28),bbox=dict(boxstyle='square', fc='firebrick'),arrowprops=dict(facecolor='steelblue', shrink=0.05), fontsize=15, color='white')# Decorations
xtickvals = [str(m)[:3].upper()+"-"+str(y) for y,m in zip(df.date.dt.year, df.date.dt.month_name())]
plt.gca().set_xticks(x[::6])
plt.gca().set_xticklabels(xtickvals[::6], rotation=90, fontdict={'horizontalalignment':               'center', 'verticalalignment': 'center_baseline'})
plt.ylim(-35,35)
plt.xlim(1,100)
plt.title("Month Economics Return %", fontsize=22)
plt.ylabel('Monthly returns %')
plt.grid(alpha=0.5)
plt.show()


Ranking

15. Ordered Bar Chart(有序条形图)

有序条形图有效地传达了条目的排名顺序。但是,在图表上方添加度量标准的值,用户可以从图表本身获取精确信息。

# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean())
df.sort_values('cty', inplace=True)
df.reset_index(inplace=True)# Draw plot
import matplotlib.patches as patchesfig, ax = plt.subplots(figsize=(16,10), facecolor='white', dpi= 80)
ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=20)# Annotate Text
for i, cty in enumerate(df.cty):ax.text(i, cty+0.5, round(cty, 1), horizontalalignment='center')# Title, Label, Ticks and Ylim
ax.set_title('Bar Chart for Highway Mileage', fontdict={'size':22})
ax.set(ylabel='Miles Per Gallon', ylim=(0, 30))
plt.xticks(df.index, df.manufacturer.str.upper(), rotation=60, horizontalalignment='right', fontsize=12)# Add patches to color the X axis labels
p1 = patches.Rectangle((.57, -0.005), width=.33, height=.13, alpha=.1, facecolor='green', transform=fig.transFigure)
p2 = patches.Rectangle((.124, -0.005), width=.446, height=.13, alpha=.1, facecolor='red', transform=fig.transFigure)
fig.add_artist(p1)
fig.add_artist(p2)
plt.show()


16. Lollipop Chart(棒棒糖图)

棒棒糖图表以一种视觉上令人愉悦的方式提供与有序条形图类似的目的。

# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean())
df.sort_values('cty', inplace=True)
df.reset_index(inplace=True)# Draw plot
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=2)
ax.scatter(x=df.index, y=df.cty, s=75, color='firebrick', alpha=0.7)# Title, Label, Ticks and Ylim
ax.set_title('Lollipop Chart for Highway Mileage', fontdict={'size':22})
ax.set_ylabel('Miles Per Gallon')
ax.set_xticks(df.index)
ax.set_xticklabels(df.manufacturer.str.upper(), rotation=60, fontdict={'horizontalalignment': 'right', 'size':12})
ax.set_ylim(0, 30)# Annotate
for row in df.itertuples():ax.text(row.Index, row.cty+.5, s=round(row.cty, 2), horizontalalignment= 'center', verticalalignment='bottom', fontsize=14)plt.show()


17. Dot Plot(点图)

点图传达了条目的排名顺序。由于它沿水平轴对齐,因此可以更容易地看到点彼此之间的距离。

# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean())
df.sort_values('cty', inplace=True)
df.reset_index(inplace=True)# Draw plot
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
ax.hlines(y=df.index, xmin=11, xmax=26, color='gray', alpha=0.7, linewidth=1, linestyles='dashdot')
ax.scatter(y=df.index, x=df.cty, s=75, color='firebrick', alpha=0.7)# Title, Label, Ticks and Ylim
ax.set_title('Dot Plot for Highway Mileage', fontdict={'size':22})
ax.set_xlabel('Miles Per Gallon')
ax.set_yticks(df.index)
ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'})
ax.set_xlim(10, 27)
plt.show()


18. Slope Chart(坡度图)

坡度图最适合比较给定项目的“在此之前”和“在此之后”的位置。

import matplotlib.lines as mlines
# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv")left_label = [str(c) + ', '+ str(round(y)) for c, y in zip(df.continent, df['1952'])]
right_label = [str(c) + ', '+ str(round(y)) for c, y in zip(df.continent, df['1957'])]
klass = ['red' if (y1-y2) < 0 else 'green' for y1, y2 in zip(df['1952'], df['1957'])]# draw line
# https://stackoverflow.com/questions/36470343/how-to-draw-a-line-with-matplotlib/36479941
def newline(p1, p2, color='black'):ax = plt.gca()l = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color='red' if p1[1]-p2[1] > 0 else 'green', marker='o', markersize=6)ax.add_line(l)return lfig, ax = plt.subplots(1,1,figsize=(14,14), dpi= 80)# Vertical Lines
ax.vlines(x=1, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')
ax.vlines(x=3, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted')# Points
ax.scatter(y=df['1952'], x=np.repeat(1, df.shape[0]), s=10, color='black', alpha=0.7)
ax.scatter(y=df['1957'], x=np.repeat(3, df.shape[0]), s=10, color='black', alpha=0.7)# Line Segmentsand Annotation
for p1, p2, c in zip(df['1952'], df['1957'], df['continent']):newline([1,p1], [3,p2])ax.text(1-0.05, p1, c + ', ' + str(round(p1)), horizontalalignment='right', verticalalignment='center', fontdict={'size':14})ax.text(3+0.05, p2, c + ', ' + str(round(p2)), horizontalalignment='left', verticalalignment='center', fontdict={'size':14})# 'Before' and 'After' Annotations
ax.text(1-0.05, 13000, 'BEFORE', horizontalalignment='right', verticalalignment='center', fontdict={'size':18, 'weight':700})
ax.text(3+0.05, 13000, 'AFTER', horizontalalignment='left', verticalalignment='center', fontdict={'size':18, 'weight':700})# Decoration
ax.set_title("Slopechart: Comparing GDP Per Capita between 1952 vs 1957", fontdict={'size':22})
ax.set(xlim=(0,4), ylim=(0,14000), ylabel='Mean GDP Per Capita')
ax.set_xticks([1,3])
ax.set_xticklabels(["1952", "1957"])
plt.yticks(np.arange(500, 13000, 2000), fontsize=12)# Lighten borders
plt.gca().spines["top"].set_alpha(.0)
plt.gca().spines["bottom"].set_alpha(.0)
plt.gca().spines["right"].set_alpha(.0)
plt.gca().spines["left"].set_alpha(.0)
plt.show()


19. Dumbbell Plot(哑铃图

哑铃图传达各种项目的“前”和“后”位置以及项目的排序。

import matplotlib.lines as mlines# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/health.csv")
df.sort_values('pct_2014', inplace=True)
df.reset_index(inplace=True)# Func to draw line segment
def newline(p1, p2, color='black'):ax = plt.gca()l = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color='skyblue')ax.add_line(l)return l# Figure and Axes
fig, ax = plt.subplots(1,1,figsize=(14,14), facecolor='#f7f7f7', dpi= 80)# Vertical Lines
ax.vlines(x=.05, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.10, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.15, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')
ax.vlines(x=.20, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted')# Points
ax.scatter(y=df['index'], x=df['pct_2013'], s=50, color='#0e668b', alpha=0.7)
ax.scatter(y=df['index'], x=df['pct_2014'], s=50, color='#a3c4dc', alpha=0.7)# Line Segments
for i, p1, p2 in zip(df['index'], df['pct_2013'], df['pct_2014']):newline([p1, i], [p2, i])# Decoration
ax.set_facecolor('#f7f7f7')
ax.set_title("Dumbell Chart: Pct Change - 2013 vs 2014", fontdict={'size':22})
ax.set(xlim=(0,.25), ylim=(-1, 27), ylabel='Mean GDP Per Capita')
ax.set_xticks([.05, .1, .15, .20])
ax.set_xticklabels(['5%', '15%', '20%', '25%'])
ax.set_xticklabels(['5%', '15%', '20%', '25%'])    
plt.show()


Distribution

20. Histogram for Continuous Variable(连续变量的直方图

直方图显示给定变量的频率分布。下面表示基于分类变量对频率条进行分组,从而更好地了解连续变量和串联变量。

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare data
x_var = 'displ'
groupby_var = 'class'
df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
vals = [df[x_var].values.tolist() for i, df in df_agg]# Draw
plt.figure(figsize=(16,9), dpi= 80)
colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
n, bins, patches = plt.hist(vals, 30, stacked=True, density=False, color=colors[:len(vals)])# Decoration
plt.legend({group:col for group, col in zip(np.unique(df[groupby_var]).tolist(), colors[:len(vals)])})
plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
plt.xlabel(x_var)
plt.ylabel("Frequency")
plt.ylim(0, 25)
plt.xticks(ticks=bins[::3], labels=[round(b,1) for b in bins[::3]])
plt.show()


21. Histogram for Categorical Variable(类型变量的直方图

分类变量的直方图显示该变量的频率分布。通过对条形图进行着色,您可以将分布与表示颜色的另一个分类变量相关联。

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare data
x_var = 'manufacturer'
groupby_var = 'class'
df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
vals = [df[x_var].values.tolist() for i, df in df_agg]# Draw
plt.figure(figsize=(16,9), dpi= 80)
colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
n, bins, patches = plt.hist(vals, df[x_var].unique().__len__(), stacked=True, density=False, color=colors[:len(vals)])# Decoration
plt.legend({group:col for group, col in zip(np.unique(df[groupby_var]).tolist(), colors[:len(vals)])})
plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
plt.xlabel(x_var)
plt.ylabel("Frequency")
plt.ylim(0, 40)
plt.xticks(ticks=bins, labels=np.unique(df[x_var]).tolist(), rotation=90, horizontalalignment='left')
plt.show()


22. Density Plot(密度图)

密度图是一种常用工具,可视化连续变量的分布。

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plot
plt.figure(figsize=(16,10), dpi= 80)
sns.kdeplot(df.loc[df['cyl'] == 4, "cty"], shade=True, color="g", label="Cyl=4", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 5, "cty"], shade=True, color="deeppink", label="Cyl=5", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 6, "cty"], shade=True, color="dodgerblue", label="Cyl=6", alpha=.7)
sns.kdeplot(df.loc[df['cyl'] == 8, "cty"], shade=True, color="orange", label="Cyl=8", alpha=.7)# Decoration
plt.title('Density Plot of City Mileage by n_Cylinders', fontsize=22)
plt.legend()
plt.show()


23. Density Curves with Histogram(直方密度图

带有直方图的密度曲线将两个图表传达的集体信息汇集在一起,这样您就可以将它们放在一个图形而不是两个图形中。

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plot
plt.figure(figsize=(13,10), dpi= 80)
sns.distplot(df.loc[df['class'] == 'compact', "cty"], color="dodgerblue", label="Compact",                         hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
sns.distplot(df.loc[df['class'] == 'suv', "cty"], color="orange", label="SUV", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
sns.distplot(df.loc[df['class'] == 'minivan', "cty"], color="g", label="minivan", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})plt.ylim(0, 0.35)# Decoration
plt.title('Density Plot of City Mileage by Vehicle Type', fontsize=22)
plt.legend()
plt.show()


24. Joy Plot

Joy Plot允许不同组的密度曲线重叠,这是一种可视化相对于彼此的大量组的分布的好方法。它看起来很悦目,并清楚地传达了正确的信息。它可以使用joypy包来轻松构建。

# !pip install joypy
# Import Data
mpg = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plot
plt.figure(figsize=(16,10), dpi= 80)
fig, axes = joypy.joyplot(mpg, column=['hwy', 'cty'], by="class", ylim='own', figsize=(14,10))# Decoration
plt.title('Joy Plot of City and Highway Mileage by Class', fontsize=22)
plt.show()


25. Distributed Dot Plot(分布式点图)

分布点图显示按组分割的点的单变量分布。

import matplotlib.patches as mpatches# Prepare Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
cyl_colors = {4:'tab:red', 5:'tab:green', 6:'tab:blue', 8:'tab:orange'}
df_raw['cyl_color'] = df_raw.cyl.map(cyl_colors)# Mean and Median city mileage by make
df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean())
df.sort_values('cty', ascending=False, inplace=True)
df.reset_index(inplace=True)
df_median = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.median())# Draw horizontal lines
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
ax.hlines(y=df.index, xmin=0, xmax=40, color='gray', alpha=0.5, linewidth=.5, linestyles='dashdot')# Draw the Dots
for i, make in enumerate(df.manufacturer):df_make = df_raw.loc[df_raw.manufacturer==make, :]ax.scatter(y=np.repeat(i, df_make.shape[0]), x='cty', data=df_make, s=75, edgecolors='gray', c='w', alpha=0.5)ax.scatter(y=i, x='cty', data=df_median.loc[df_median.index==make, :], s=75, c='firebrick')# Annotate    
ax.text(33, 13, "$red \; dots \; are \; the \: median$", fontdict={'size':12}, color='firebrick')# Decorations
red_patch = plt.plot([],[], marker="o", ms=10, ls="", mec=None, color='firebrick', label="Median")
plt.legend(handles=red_patch)
ax.set_title('Distribution of City Mileage by Make', fontdict={'size':22})
ax.set_xlabel('Miles Per Gallon (City)', alpha=0.7)
ax.set_yticks(df.index)
ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'}, alpha=0.7)
ax.set_xlim(1, 40)
plt.xticks(alpha=0.7)
plt.gca().spines["top"].set_visible(False)    
plt.gca().spines["bottom"].set_visible(False)    
plt.gca().spines["right"].set_visible(False)    
plt.gca().spines["left"].set_visible(False)   
plt.grid(axis='both', alpha=.4, linewidth=.1)
plt.show()


26. Box Plot(箱形图)

箱形图是一种可视化分布的好方法,记住中位数,第25个第75个四分位数和异常值。

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plot
plt.figure(figsize=(13,10), dpi= 80)
sns.boxplot(x='class', y='hwy', data=df, notch=False)# Add N Obs inside boxplot (optional)
def add_n_obs(df,group_col,y):medians_dict = {grp[0]:grp[1][y].median() for grp in df.groupby(group_col)}xticklabels = [x.get_text() for x in plt.gca().get_xticklabels()]n_obs = df.groupby(group_col)[y].size().valuesfor (x, xticklabel), n_ob in zip(enumerate(xticklabels), n_obs):plt.text(x, medians_dict[xticklabel]*1.01, "#obs : "+str(n_ob),         horizontalalignment='center', fontdict={'size':14}, color='white')add_n_obs(df,group_col='class',y='hwy')    # Decoration
plt.title('Box Plot of Highway Mileage by Vehicle Class', fontsize=22)
plt.ylim(10, 40)
plt.show()


27. Dot + Box Plot(点+箱型图)

Dot + Box plot传送类似于分组的boxplot信息。此外,这些点给出了每组中有多少数据点。

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plot
plt.figure(figsize=(13,10), dpi= 80)
sns.boxplot(x='class', y='hwy', data=df, hue='cyl')
sns.stripplot(x='class', y='hwy', data=df, color='black', size=3, jitter=1)for i in range(len(df['class'].unique())-1):plt.vlines(i+.5, 10, 45, linestyles='solid', colors='gray', alpha=0.2)# Decoration
plt.title('Box Plot of Highway Mileage by Vehicle Class', fontsize=22)
plt.legend(title='Cylinders')
plt.show()


28. Violin Plot(小提琴图)

小提琴图是箱形图的视觉上令人愉悦的替代品。小提琴的形状或面积取决于它所持有的点数。然而,小提琴图可能更难以理解,并且在专业设置中不常用。

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Draw Plot
plt.figure(figsize=(13,10), dpi= 80)
sns.violinplot(x='class', y='hwy', data=df, scale='width', inner='quartile')# Decoration
plt.title('Violin Plot of Highway Mileage by Vehicle Class', fontsize=22)
plt.show()


29. Population Pyramid(人口金字塔)

人口金字塔可用于显示由volumne排序的组的分布。

# Read data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/email_campaign_funnel.csv")# Draw Plot
plt.figure(figsize=(13,10), dpi= 80)
group_col = 'Gender'
order_of_bars = df.Stage.unique()[::-1]
colors = [plt.cm.Spectral(i/float(len(df[group_col].unique())-1)) for i in range(len(df[group_col].unique()))]for c, group in zip(colors, df[group_col].unique()):sns.barplot(x='Users', y='Stage', data=df.loc[df[group_col]==group, :], order=order_of_bars, color=c, label=group)# Decorations    
plt.xlabel("$Users$")
plt.ylabel("Stage of Purchase")
plt.yticks(fontsize=12)
plt.title("Population Pyramid of the Marketing Funnel", fontsize=22)
plt.legend()
plt.show()


30. Categorical Plots(分类图)

由Seaborn库提供的分类图可用于可视化彼此相关的2个或更多分类变量的计数分布。

# Load Dataset
titanic = sns.load_dataset("titanic")# Plot
g = sns.catplot("alive", col="deck", col_wrap=4,data=titanic[titanic.deck.notnull()],kind="count", height=3.5, aspect=.8, palette='tab20')fig.suptitle('sf')
plt.show()

# Load Dataset
titanic = sns.load_dataset("titanic")# Plot
sns.catplot(x="age", y="embark_town",hue="sex", col="class",data=titanic[titanic.embark_town.notnull()],orient="h", height=5, aspect=1, palette="tab10",kind="violin", dodge=True, cut=0, bw=.2)


Composition

31. Waffle Chart(华夫饼表)

Waffle表可使用pywaffle包来创建。

#! pip install pywaffle
# Reference: https://stackoverflow.com/questions/41400136/how-to-do-waffle-charts-in-python-square-piechart
from pywaffle import Waffle# Import
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare Data
df = df_raw.groupby('class').size().reset_index(name='counts')
n_categories = df.shape[0]
colors = [plt.cm.inferno_r(i/float(n_categories)) for i in range(n_categories)]# Draw Plot and Decorate
fig = plt.figure(FigureClass=Waffle,plots={'111': {'values': df['counts'],'labels': ["{0} ({1})".format(n[0], n[1]) for n in df[['class', 'counts']].itertuples()],'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12},'title': {'label': '# Vehicles by Class', 'loc': 'center', 'fontsize':18}},},rows=7,colors=colors,figsize=(16, 9)
)

#! pip install pywaffle
from pywaffle import Waffle# Import
# df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare Data
# By Class Data
df_class = df_raw.groupby('class').size().reset_index(name='counts_class')
n_categories = df_class.shape[0]
colors_class = [plt.cm.Set3(i/float(n_categories)) for i in range(n_categories)]# By Cylinders Data
df_cyl = df_raw.groupby('cyl').size().reset_index(name='counts_cyl')
n_categories = df_cyl.shape[0]
colors_cyl = [plt.cm.Spectral(i/float(n_categories)) for i in range(n_categories)]# By Make Data
df_make = df_raw.groupby('manufacturer').size().reset_index(name='counts_make')
n_categories = df_make.shape[0]
colors_make = [plt.cm.tab20b(i/float(n_categories)) for i in range(n_categories)]# Draw Plot and Decorate
fig = plt.figure(FigureClass=Waffle,plots={'311': {'values': df_class['counts_class'],'labels': ["{1}".format(n[0], n[1]) for n in df_class[['class', 'counts_class']].itertuples()],'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12, 'title':'Class'},'title': {'label': '# Vehicles by Class', 'loc': 'center', 'fontsize':18},'colors': colors_class},'312': {'values': df_cyl['counts_cyl'],'labels': ["{1}".format(n[0], n[1]) for n in df_cyl[['cyl', 'counts_cyl']].itertuples()],'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12, 'title':'Cyl'},'title': {'label': '# Vehicles by Cyl', 'loc': 'center', 'fontsize':18},'colors': colors_cyl},'313': {'values': df_make['counts_make'],'labels': ["{1}".format(n[0], n[1]) for n in df_make[['manufacturer', 'counts_make']].itertuples()],'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12, 'title':'Manufacturer'},'title': {'label': '# Vehicles by Make', 'loc': 'center', 'fontsize':18},'colors': colors_make}},rows=9,figsize=(16, 14)
)


32. Pie Chart(饼状图)

饼状图大家应该很熟悉,这里只有一个小建议:明确标记饼状图每个部分的百分比或数字。

# Import
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare Data
df = df_raw.groupby('class').size()# Make the plot with pandas
df.plot(kind='pie', subplots=True, figsize=(8, 8), dpi= 80)
plt.title("Pie Chart of Vehicle Class - Bad")
plt.ylabel("")
plt.show()

# Import
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare Data
df = df_raw.groupby('class').size().reset_index(name='counts')# Draw Plot
fig, ax = plt.subplots(figsize=(12, 7), subplot_kw=dict(aspect="equal"), dpi= 80)data = df['counts']
categories = df['class']
explode = [0,0,0,0,0,0.1,0]def func(pct, allvals):absolute = int(pct/100.*np.sum(allvals))return "{:.1f}% ({:d} )".format(pct, absolute)wedges, texts, autotexts = ax.pie(data, autopct=lambda pct: func(pct, data),textprops=dict(color="w"), colors=plt.cm.Dark2.colors,startangle=140,explode=explode)# Decoration
ax.legend(wedges, categories, title="Vehicle Class", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
plt.setp(autotexts, size=10, weight=700)
ax.set_title("Class of Vehicles: Pie Chart")
plt.show()


33. Treemap(树状图)

# pip install squarify
import squarify # Import Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare Data
df = df_raw.groupby('class').size().reset_index(name='counts')
labels = df.apply(lambda x: str(x[0]) + "\n (" + str(x[1]) + ")", axis=1)
sizes = df['counts'].values.tolist()
colors = [plt.cm.Spectral(i/float(len(labels))) for i in range(len(labels))]# Draw Plot
plt.figure(figsize=(12,8), dpi= 80)
squarify.plot(sizes=sizes, label=labels, color=colors, alpha=.8)# Decorate
plt.title('Treemap of Vechile Class')
plt.axis('off')
plt.show()


34. Bar Chart(条形图)

条形图是根据计数或任何给定指标而可视化条目。在下面的图表中,为每个条目使用了不同的颜色,但通常可能希望为所有条目选择一种颜色,除非按组对它们进行着色。颜色名称存储在all_colors下面的代码中。可以通过设置color参数来更改条形的颜色。

import random# Import Data
df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")# Prepare Data
df = df_raw.groupby('manufacturer').size().reset_index(name='counts')
n = df['manufacturer'].unique().__len__()+1
all_colors = list(plt.cm.colors.cnames.keys())
random.seed(100)
c = random.choices(all_colors, k=n)# Plot Bars
plt.figure(figsize=(16,10), dpi= 80)
plt.bar(df['manufacturer'], df['counts'], color=c, width=.5)
for i, val in enumerate(df['counts'].values):plt.text(i, val, float(val), horizontalalignment='center',verticalalignment='bottom', fontdict={'fontweight':500, 'size':12})# Decoration
plt.gca().set_xticklabels(df['manufacturer'], rotation=60, horizontalalignment= 'right')
plt.title("Number of Vehicles by Manaufacturers", fontsize=22)
plt.ylabel('# Vehicles')
plt.ylim(0, 45)
plt.show()


Change

35. Time Series Plot(时间序列图)

时间序列图用于显示给定度量随时间变化的方式。在这里,可以看到1949年至1969年间航空客运量的变化情况。

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')# Draw Plot
plt.figure(figsize=(16,10), dpi= 80)
plt.plot('date', 'traffic', data=df, color='tab:red')# Decoration
plt.ylim(50, 750)
xtick_location = df.index.tolist()[::12]
xtick_labels = [x[-4:] for x in df.date.tolist()[::12]]
plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=0, fontsize=12, horizontalalignment='center', alpha=.7)
plt.yticks(fontsize=12, alpha=.7)
plt.title("Air Passengers Traffic (1949 - 1969)", fontsize=22)
plt.grid(axis='both', alpha=.3)# Remove borders
plt.gca().spines["top"].set_alpha(0.0)    
plt.gca().spines["bottom"].set_alpha(0.3)
plt.gca().spines["right"].set_alpha(0.0)    
plt.gca().spines["left"].set_alpha(0.3)   
plt.show()


36. Time Series with Peaks and Troughs Annotated(带波峰波谷标记的时序图

下面的时间序列绘制了所有的波峰和波谷,并注释了所选特殊事件的发生。

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')# Get the Peaks and Troughs
data = df['traffic'].values
doublediff = np.diff(np.sign(np.diff(data)))
peak_locations = np.where(doublediff == -2)[0] + 1doublediff2 = np.diff(np.sign(np.diff(-1*data)))
trough_locations = np.where(doublediff2 == -2)[0] + 1# Draw Plot
plt.figure(figsize=(16,10), dpi= 80)
plt.plot('date', 'traffic', data=df, color='tab:blue', label='Air Traffic')
plt.scatter(df.date[peak_locations], df.traffic[peak_locations], marker=mpl.markers.CARETUPBASE, color='tab:green', s=100, label='Peaks')
plt.scatter(df.date[trough_locations], df.traffic[trough_locations], marker=mpl.markers.CARETDOWNBASE, color='tab:red', s=100, label='Troughs')# Annotate
for t, p in zip(trough_locations[1::5], peak_locations[::3]):plt.text(df.date[p], df.traffic[p]+15, df.date[p], horizontalalignment='center', color='darkgreen')plt.text(df.date[t], df.traffic[t]-35, df.date[t], horizontalalignment='center', color='darkred')# Decoration
plt.ylim(50,750)
xtick_location = df.index.tolist()[::6]
xtick_labels = df.date.tolist()[::6]
plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=90, fontsize=12, alpha=.7)
plt.title("Peak and Troughs of Air Passengers Traffic (1949 - 1969)", fontsize=22)
plt.yticks(fontsize=12, alpha=.7)# Lighten borders
plt.gca().spines["top"].set_alpha(.0)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.0)
plt.gca().spines["left"].set_alpha(.3)plt.legend(loc='upper left')
plt.grid(axis='y', alpha=.3)
plt.show()


37. Autocorrelation (ACF) and Partial Autocorrelation (PACF) Plot(自相关和部分自相关图

ACF图显示时间序列与其自身滞后的相关性。

PACF在另一方面显示了任何给定滞后(时间序列)与当前序列的自相关,但是删除了滞后的贡献。

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')# Draw Plot
fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(16,6), dpi= 80)
plot_acf(df.traffic.tolist(), ax=ax1, lags=50)
plot_pacf(df.traffic.tolist(), ax=ax2, lags=20)# Decorate
# lighten the borders
ax1.spines["top"].set_alpha(.3); ax2.spines["top"].set_alpha(.3)
ax1.spines["bottom"].set_alpha(.3); ax2.spines["bottom"].set_alpha(.3)
ax1.spines["right"].set_alpha(.3); ax2.spines["right"].set_alpha(.3)
ax1.spines["left"].set_alpha(.3); ax2.spines["left"].set_alpha(.3)# font size of tick labels
ax1.tick_params(axis='both', labelsize=12)
ax2.tick_params(axis='both', labelsize=12)
plt.show()


38. Cross Correlation plot(交叉相关图)

互相关图显示了两个时间序列相互之间的滞后。

import statsmodels.tsa.stattools as stattools# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mortality.csv')
x = df['mdeaths']
y = df['fdeaths']# Compute Cross Correlations
ccs = stattools.ccf(x, y)[:100]
nlags = len(ccs)# Compute the Significance level
# ref: https://stats.stackexchange.com/questions/3115/cross-correlation-significance-in-r/3128#3128
conf_level = 2 / np.sqrt(nlags)# Draw Plot
plt.figure(figsize=(12,7), dpi= 80)plt.hlines(0, xmin=0, xmax=100, color='gray')  # 0 axis
plt.hlines(conf_level, xmin=0, xmax=100, color='gray')
plt.hlines(-conf_level, xmin=0, xmax=100, color='gray')plt.bar(x=np.arange(len(ccs)), height=ccs, width=.3)# Decoration
plt.title('$Cross\; Correlation\; Plot:\; mdeaths\; vs\; fdeaths$', fontsize=22)
plt.xlim(0,len(ccs))
plt.show()


39. Time Series Decomposition Plot(时间序列分解图)

from statsmodels.tsa.seasonal import seasonal_decompose
from dateutil.parser import parse# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')
dates = pd.DatetimeIndex([parse(d).strftime('%Y-%m-01') for d in df['date']])
df.set_index(dates, inplace=True)# Decompose 
result = seasonal_decompose(df['traffic'], model='multiplicative')# Plot
plt.rcParams.update({'figure.figsize': (10,10)})
result.plot().suptitle('Time Series Decomposition of Air Passengers')
plt.show()


40. Multiple Time Series(多时间序列)

可以绘制多个时间序列,如下所示。

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mortality.csv')# Define the upper limit, lower limit, interval of Y axis and colors
y_LL = 100
y_UL = int(df.iloc[:, 1:].max().max()*1.1)
y_interval = 400
mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange']    # Draw Plot and Annotate
fig, ax = plt.subplots(1,1,figsize=(16, 9), dpi= 80)    columns = df.columns[1:]  
for i, column in enumerate(columns):    plt.plot(df.date.values, df[column].values, lw=1.5, color=mycolors[i])    plt.text(df.shape[0]+1, df[column].values[-1], column, fontsize=14, color=mycolors[i])# Draw Tick lines  
for y in range(y_LL, y_UL, y_interval):    plt.hlines(y, xmin=0, xmax=71, colors='black', alpha=0.3, linestyles="--", lw=0.5)# Decorations    
plt.tick_params(axis="both", which="both", bottom=False, top=False,    labelbottom=True, left=False, right=False, labelleft=True)        # Lighten borders
plt.gca().spines["top"].set_alpha(.3)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.3)
plt.gca().spines["left"].set_alpha(.3)plt.title('Number of Deaths from Lung Diseases in the UK (1974-1979)', fontsize=22)
plt.yticks(range(y_LL, y_UL, y_interval), [str(y) for y in range(y_LL, y_UL, y_interval)], fontsize=12)    
plt.xticks(range(0, df.shape[0], 12), df.date.values[::12], horizontalalignment='left', fontsize=12)    
plt.ylim(y_LL, y_UL)    
plt.xlim(-2, 80)    
plt.show()


41. Plotting with different scales using secondary Y axis(使用辅助Y轴来绘制不同范围的图形

如果要显示在同一时间点测量两个不同数量的两个时间序列,则可以在右侧的辅助Y轴上再绘制第二个系列。

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv")x = df['date']
y1 = df['psavert']
y2 = df['unemploy']# Plot Line1 (Left Y Axis)
fig, ax1 = plt.subplots(1,1,figsize=(16,9), dpi= 80)
ax1.plot(x, y1, color='tab:red')# Plot Line2 (Right Y Axis)
ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis
ax2.plot(x, y2, color='tab:blue')# Decorations
# ax1 (left Y axis)
ax1.set_xlabel('Year', fontsize=20)
ax1.tick_params(axis='x', rotation=0, labelsize=12)
ax1.set_ylabel('Personal Savings Rate', color='tab:red', fontsize=20)
ax1.tick_params(axis='y', rotation=0, labelcolor='tab:red' )
ax1.grid(alpha=.4)# ax2 (right Y axis)
ax2.set_ylabel("# Unemployed (1000's)", color='tab:blue', fontsize=20)
ax2.tick_params(axis='y', labelcolor='tab:blue')
ax2.set_xticks(np.arange(0, len(x), 60))
ax2.set_xticklabels(x[::60], rotation=90, fontdict={'fontsize':10})
ax2.set_title("Personal Savings Rate vs Unemployed: Plotting in Secondary Y Axis", fontsize=22)
fig.tight_layout()
plt.show()


42. Time Series with Error Bands(带有误差带的时间序列)

from scipy.stats import sem# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/user_orders_hourofday.csv")
df_mean = df.groupby('order_hour_of_day').quantity.mean()
df_se = df.groupby('order_hour_of_day').quantity.apply(sem).mul(1.96)# Plot
plt.figure(figsize=(16,10), dpi= 80)
plt.ylabel("# Orders", fontsize=16)  
x = df_mean.index
plt.plot(x, df_mean, color="white", lw=2) 
plt.fill_between(x, df_mean - df_se, df_mean + df_se, color="#3F5D7D")  # Decorations
# Lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(1)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(1)
plt.xticks(x[::2], [str(d) for d in x[::2]] , fontsize=12)
plt.title("User Orders by Hour of Day (95% confidence)", fontsize=22)
plt.xlabel("Hour of Day")s, e = plt.gca().get_xlim()
plt.xlim(s, e)# Draw Horizontal Tick lines  
for y in range(8, 20, 2):    plt.hlines(y, xmin=s, xmax=e, colors='black', alpha=0.5, linestyles="--", lw=0.5)plt.show()

"Data Source: https://www.kaggle.com/olistbr/brazilian-ecommerce#olist_orders_dataset.csv"
from dateutil.parser import parse
from scipy.stats import sem# Import Data
df_raw = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/orders_45d.csv', parse_dates=['purchase_time', 'purchase_date'])# Prepare Data: Daily Mean and SE Bands
df_mean = df_raw.groupby('purchase_date').quantity.mean()
df_se = df_raw.groupby('purchase_date').quantity.apply(sem).mul(1.96)# Plot
plt.figure(figsize=(16,10), dpi= 80)
plt.ylabel("# Daily Orders", fontsize=16)  
x = [d.date().strftime('%Y-%m-%d') for d in df_mean.index]
plt.plot(x, df_mean, color="white", lw=2) 
plt.fill_between(x, df_mean - df_se, df_mean + df_se, color="#3F5D7D")  # Decorations
# Lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(1)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(1)
plt.xticks(x[::6], [str(d) for d in x[::6]] , fontsize=12)
plt.title("Daily Order Quantity of Brazilian Retail with Error Bands (95% confidence)", fontsize=20)# Axis limits
s, e = plt.gca().get_xlim()
plt.xlim(s, e-2)
plt.ylim(4, 10)# Draw Horizontal Tick lines  
for y in range(5, 10, 1):    plt.hlines(y, xmin=s, xmax=e, colors='black', alpha=0.5, linestyles="--", lw=0.5)plt.show()


43. Stacked Area Chart(堆积面积图

堆积面积图可以直观地显示多个时间序列的贡献程度,因此可以轻松地相互比较。

# Import Data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/nightvisitors.csv')# Decide Colors 
mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:grey', 'tab:pink', 'tab:olive']      # Draw Plot and Annotate
fig, ax = plt.subplots(1,1,figsize=(16, 9), dpi= 80)
columns = df.columns[1:]
labs = columns.values.tolist()# Prepare data
x  = df['yearmon'].values.tolist()
y0 = df[columns[0]].values.tolist()
y1 = df[columns[1]].values.tolist()
y2 = df[columns[2]].values.tolist()
y3 = df[columns[3]].values.tolist()
y4 = df[columns[4]].values.tolist()
y5 = df[columns[5]].values.tolist()
y6 = df[columns[6]].values.tolist()
y7 = df[columns[7]].values.tolist()
y = np.vstack([y0, y2, y4, y6, y7, y5, y1, y3])# Plot for each column
labs = columns.values.tolist()
ax = plt.gca()
ax.stackplot(x, y, labels=labs, colors=mycolors, alpha=0.8)# Decorations
ax.set_title('Night Visitors in Australian Regions', fontsize=18)
ax.set(ylim=[0, 100000])
ax.legend(fontsize=10, ncol=4)
plt.xticks(x[::5], fontsize=10, horizontalalignment='center')
plt.yticks(np.arange(10000, 100000, 20000), fontsize=10)
plt.xlim(x[0], x[-1])# Lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(.3)plt.show()


44. Area Chart UnStacked(未堆积的面积图

未堆积区域图用于可视化两个或更多个系列相对于彼此的进度(起伏)。在下面的图表中,您可以清楚地看到随着失业中位数持续时间的增加,个人储蓄率会下降。未堆积的区域图表很好地揭示了这种现象。

# Import Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv")# Prepare Data
x = df['date'].values.tolist()
y1 = df['psavert'].values.tolist()
y2 = df['uempmed'].values.tolist()
mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:grey', 'tab:pink', 'tab:olive']      
columns = ['psavert', 'uempmed']# Draw Plot 
fig, ax = plt.subplots(1, 1, figsize=(16,9), dpi= 80)
ax.fill_between(x, y1=y1, y2=0, label=columns[1], alpha=0.5, color=mycolors[1], linewidth=2)
ax.fill_between(x, y1=y2, y2=0, label=columns[0], alpha=0.5, color=mycolors[0], linewidth=2)# Decorations
ax.set_title('Personal Savings Rate vs Median Duration of Unemployment', fontsize=18)
ax.set(ylim=[0, 30])
ax.legend(loc='best', fontsize=12)
plt.xticks(x[::50], fontsize=10, horizontalalignment='center')
plt.yticks(np.arange(2.5, 30.0, 2.5), fontsize=10)
plt.xlim(-10, x[-1])# Draw Tick lines  
for y in np.arange(2.5, 30.0, 2.5):    plt.hlines(y, xmin=0, xmax=len(x), colors='black', alpha=0.3, linestyles="--", lw=0.5)# Lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(.3)
plt.show()


45. Calendar Heat Map(日历热力图

与时间序列图相比,日历热力图是基于时间的数据可视化的备选项。虽然可以在视觉上吸引人,但数值并不十分明显。然而,它可以很好地描绘极端值和假日效果。

import matplotlib as mpl
import calmap# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/yahoo.csv", parse_dates=['date'])
df.set_index('date', inplace=True)# Plot
plt.figure(figsize=(16,10), dpi= 80)
calmap.calendarplot(df['2014']['VIX.Close'], fig_kws={'figsize': (16,10)}, yearlabel_kws={'color':'black', 'fontsize':14}, subplot_kws={'title':'Yahoo Stock Prices'})
plt.show()


46. Seasonal Plot(季度图)

from dateutil.parser import parse # Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')# Prepare data
df['year'] = [parse(d).year for d in df.date]
df['month'] = [parse(d).strftime('%b') for d in df.date]
years = df['year'].unique()# Draw Plot
mycolors = ['tab:red', 'tab:blue', 'tab:green', 'tab:orange', 'tab:brown', 'tab:grey', 'tab:pink', 'tab:olive', 'deeppink', 'steelblue', 'firebrick', 'mediumseagreen']      
plt.figure(figsize=(16,10), dpi= 80)for i, y in enumerate(years):plt.plot('month', 'traffic', data=df.loc[df.year==y, :], color=mycolors[i], label=y)plt.text(df.loc[df.year==y, :].shape[0]-.9,df.loc[df.year==y, 'traffic'][-1:].values[0], y, fontsize=12, color=mycolors[i])# Decoration
plt.ylim(50,750)
plt.xlim(-0.3, 11)
plt.ylabel('$Air Traffic$')
plt.yticks(fontsize=12, alpha=.7)
plt.title("Monthly Seasonal Plot: Air Passengers Traffic (1949 - 1969)", fontsize=22)
plt.grid(axis='y', alpha=.3)# Remove borders
plt.gca().spines["top"].set_alpha(0.0)    
plt.gca().spines["bottom"].set_alpha(0.5)
plt.gca().spines["right"].set_alpha(0.0)    
plt.gca().spines["left"].set_alpha(0.5)   
# plt.legend(loc='upper right', ncol=2, fontsize=12)
plt.show()


Groups

47. Dendrogram(树状图)

树形图基于给定的距离度量将相似的点组合在一起,并基于点的相似性将它们组织在树状链接中。

import scipy.cluster.hierarchy as shc# Import Data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/USArrests.csv')# Plot
plt.figure(figsize=(16, 10), dpi= 80)  
plt.title("USArrests Dendograms", fontsize=22)  
dend = shc.dendrogram(shc.linkage(df[['Murder', 'Assault', 'UrbanPop', 'Rape']], method='ward'), labels=df.State.values, color_threshold=100)  
plt.xticks(fontsize=12)
plt.show()


48. Cluster Plot(簇状图)

Cluster Plot可用于划分属于同一群集的点。下面是根据USArrests数据集将美国各州分为5组的代表性示例。该集群图使用“谋杀”和“攻击”列作为X和Y轴。或者,您可以将第一个到主要组件用作X轴和Y轴。

from sklearn.cluster import AgglomerativeClustering
from scipy.spatial import ConvexHull# Import Data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/USArrests.csv')# Agglomerative Clustering
cluster = AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward')  
cluster.fit_predict(df[['Murder', 'Assault', 'UrbanPop', 'Rape']])  # Plot
plt.figure(figsize=(14, 10), dpi= 80)  
plt.scatter(df.iloc[:,0], df.iloc[:,1], c=cluster.labels_, cmap='tab10')  # Encircle
def encircle(x,y, ax=None, **kw):if not ax: ax=plt.gca()p = np.c_[x,y]hull = ConvexHull(p)poly = plt.Polygon(p[hull.vertices,:], **kw)ax.add_patch(poly)# Draw polygon surrounding vertices    
encircle(df.loc[cluster.labels_ == 0, 'Murder'], df.loc[cluster.labels_ == 0, 'Assault'], ec="k",fc="gold", alpha=0.2, linewidth=0)
encircle(df.loc[cluster.labels_ == 1, 'Murder'], df.loc[cluster.labels_ == 1, 'Assault'], ec="k", fc="tab:blue", alpha=0.2, linewidth=0)
encircle(df.loc[cluster.labels_ == 2, 'Murder'], df.loc[cluster.labels_ == 2, 'Assault'], ec="k", fc="tab:red", alpha=0.2, linewidth=0)
encircle(df.loc[cluster.labels_ == 3, 'Murder'], df.loc[cluster.labels_ == 3, 'Assault'], ec="k", fc="tab:green", alpha=0.2, linewidth=0)
encircle(df.loc[cluster.labels_ == 4, 'Murder'], df.loc[cluster.labels_ == 4, 'Assault'], ec="k", fc="tab:orange", alpha=0.2, linewidth=0)# Decorations
plt.xlabel('Murder'); plt.xticks(fontsize=12)
plt.ylabel('Assault'); plt.yticks(fontsize=12)
plt.title('Agglomerative Clustering of USArrests (5 Groups)', fontsize=22)
plt.show()


49. Andrews Curve(安德鲁斯曲线

安德鲁斯曲线有助于可视化是否存在基于给定分组的数字特征的固有分组。如果功能(数据集中的列)无法区分组(cyl)那么线条将不会很好地隔离,如下所示。

from pandas.plotting import andrews_curves# Import
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
df.drop(['cars', 'carname'], axis=1, inplace=True)# Plot
plt.figure(figsize=(12,9), dpi= 80)
andrews_curves(df, 'cyl', colormap='Set1')# Lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(.3)plt.title('Andrews Curves of mtcars', fontsize=22)
plt.xlim(-3,3)
plt.grid(alpha=0.3)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()


50. Parallel Coordinates(平行坐标

平行坐标有助于可视化特征是否有助于有效地隔离组。如果实现隔离,则该特征可能在预测该组时非常有用。

from pandas.plotting import parallel_coordinates# Import Data
df_final = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/diamonds_filter.csv")# Plot
plt.figure(figsize=(12,9), dpi= 80)
parallel_coordinates(df_final, 'cut', colormap='Dark2')# Lighten borders
plt.gca().spines["top"].set_alpha(0)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(0)
plt.gca().spines["left"].set_alpha(.3)plt.title('Parallel Coordinated of Diamonds', fontsize=22)
plt.grid(alpha=0.3)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

 

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/439714.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

视觉SLAM十四讲(1):预备知识

最近在学习高翔博士的《视觉SLAM十四讲》&#xff08;第二版&#xff09;&#xff0c;算是初学本书&#xff0c;配套资源还算蛮丰富的&#xff0c;有代码&#xff08;第一版和第二版都有&#xff09;&#xff0c;B站上也有高翔博士对第一版录制的讲解视频&#xff0c;真的是很贴…

一步步编写操作系统 43 汇编语言和c语言的理解

也许有的同学喜欢用汇编语言来实现操作系统&#xff0c;觉得用汇编来写程序似乎更简单直接&#xff0c;可控性比较强&#xff0c;有种“一切尽在掌握”的赶脚。而用c语言实现操作系统这件事&#xff0c;虽然轻松很多&#xff0c;但似乎隐约感觉到有些慌张。因为虽然c语言相对来…

视觉SLAM十四讲(2):初识SLAM

这一讲主要介绍视觉SLAM的结构&#xff0c;并完成第一个SLAM程序&#xff1a;HelloSLAM。 目录 2.1 小萝卜的例子 单目相机 双目相机 深度相机 2.2 经典视觉SLAM框架 2.3 SLAM问题的数学表述 2.4 编程实践 Hello SLAM 使用cmake 使用库 【高翔】视觉SLAM十四讲2.1 小…

一步步编写操作系统 44 用c语言编写内核1

先来个简单的&#xff0c;欢迎我们神秘嘉宾——main.c。这是我们第一个c语言代码。 1 int main(void) { 2 while(1); 3 return 0; 4 }它没法再简单啦&#xff0c;简单的程序似乎能帮助咱们更容易的理解所学的知识&#xff0c;哈哈&#xff0c;我说的是似乎&#xff0c;其实&…

从零实现一个3D目标检测算法(1):3D目标检测概述

本文是根据github上的开源项目&#xff1a;https://github.com/open-mmlab/OpenPCDet整理而来&#xff0c;在此表示感谢&#xff0c;强烈推荐大家去关注。使用的预训练模型也为此项目中提供的模型&#xff0c;不过此项目已更新为v0.2版&#xff0c;与本文中代码略有不同。 本文…

一步步编写操作系统 45 用c语言编写内核2

在linux下用于链接的程序是ld&#xff0c;链接有一个好处&#xff0c;可以指定最终生成的可执行文件的起始虚拟地址。它是用-Ttext参数来指定的&#xff0c;所以咱们可以执行以下命令完成链接&#xff1a; ld kernel/main.o -Ttext 0xc0001500 -e main -o kernel/kernel.bin …

使用OpenCV库快速求解相机内参

本文主要介绍如何使用OpenCV库函数求解相机内参。具体可查阅官网&#xff1a;https://docs.opencv.org/master/dc/dbb/tutorial_py_calibration.html。 关于相机内参的求解还有很多其它的工具&#xff0c;如使用MATLAB求解会更方便&#xff0c;直接调用MATLAB中的APP即可。 1.背…

一步步编写操作系统 46 用c语言编写内核3

再把上节代码贴出来&#xff0c; 1 //int main(void) { 2 int _start(void) { 3 while(1); 4 return 0; 5 }有没有同学想过&#xff0c;这里写一个_start函数&#xff0c;让其调用main函数如何&#xff1f;其实这是可以的&#xff0c;main函数并不是第一个函数&#xff0c;它实…

一步步编写操作系统 47 48 二进制程序运行方式

操作系统并不是在功能上给予用户的支持&#xff0c;这种支持是体现在机制上。也就是说&#xff0c;单纯的操作系统&#xff0c;用户拿它什么都做不了&#xff0c;用户需要的是某种功能。而操作系统仅仅是个提供支持的平台。 虽然我们是模仿linux来写一个黑屏白字的系统&#x…

百度顶会论文复现(1):课程概述

最近百度推出了一款重磅课程《全球顶会论文作者&#xff0c;28天免费手把手带你复现顶会论文》。这个课程真的是很硬核的课程&#xff0c;这里简单记录下自己的学习过程。 文章目录1. 课程设计思路和安排2. 课程大纲1. 课程设计思路和安排 课程设计思路如下&#xff0c;共分为…

百度顶会论文复现(2):GAN综述

本节课主要是对GAN的发展进行了介绍&#xff0c;包括基本原理&#xff0c;训练方法&#xff0c;存在问题&#xff0c;改进以及应用场景等。实践作业则为手写数字生成。课程地址为&#xff1a;https://aistudio.baidu.com/aistudio/education/preview/493290。 文章目录1.什么是…

一步步编写操作系统 48 二进制程序的加载方式

接上节&#xff0c;程序头可以自定义&#xff0c;只要我们按照自己定义的格式去解析就行。也许我光这么一说&#xff0c;很多同学还是不能彻底明白如何自定义文件头&#xff0c;因为大多数同学都是用高级语言来写程序&#xff0c;即使用了偏底层的c语言&#xff0c;不同平台的c…

百度顶会论文复现(3):视频分类综述

本节课主要是对视频分类的发展进行了介绍&#xff0c;包括任务与背景&#xff0c;分类方法&#xff0c;前沿进展等。课程地址为&#xff1a;https://aistudio.baidu.com/aistudio/course/introduce/1340?directly1&shared1。 文章目录1. 任务与背景2. 视频分类方法2.1 双流…

一步步编写操作系统 46 linux的elf可执行文件格式1

ELF文件格式依然是分为文件头和文件体两部分&#xff0c;只是该文件头相对稍显复杂&#xff0c;类似层次化结构&#xff0c;先用个ELF header从“全局上”给出程序文件的组织结构&#xff0c;概要出程序中其它头表的位置大小等信息&#xff0c;如程序头表的大小及位置、节头表的…

百度顶会论文复现(4):飞桨API详解

本节课主要是对飞桨常用API进行了介绍&#xff0c;课程地址为&#xff1a;https://aistudio.baidu.com/aistudio/education/group/info/1340。 文章目录1.飞桨API官网2. API使用介绍3. 飞桨模型操作1.飞桨API官网 官网地址为&#xff1a;https://www.paddlepaddle.org.cn/docu…

一步步编写操作系统 47 elf格式文件分析实验

在上一节中&#xff0c;我们讲述了elf格式的部分理论知识&#xff0c;为什么是部分呢&#xff1f;因为我们本着“够用”的原则&#xff0c;只把我们需要了解的部分说完啦。不过&#xff0c;我相信大部分同学仅仅凭上一节中的理论知识还是领悟不到elf本质&#xff0c;咱们在本节…

百度飞桨顶会论文复现(5):视频分类论文之《Representation Flow for Action Recognition》篇

这次老师在课上总共领读了4篇分类论文&#xff0c;我这里分享其中的一篇论文&#xff0c;是关于使用神经网络对光流进行学习。 课程地址是&#xff1a;https://aistudio.baidu.com/aistudio/education/group/info/1340。 论文地址是&#xff1a;https://arxiv.org/abs/1810.014…

智能算法(GA、DBO等)求解零等待流水车间调度问题(NWFSP)

先做一个声明&#xff1a;文章是由我的个人公众号中的推送直接复制粘贴而来&#xff0c;因此对智能优化算法感兴趣的朋友&#xff0c;可关注我的个人公众号&#xff1a;启发式算法讨论。我会不定期在公众号里分享不同的智能优化算法&#xff0c;经典的&#xff0c;或者是近几年…

一步步编写操作系统 50 加载内核3

接上节&#xff0c;在这里&#xff0c;我们把参数放到了栈中保存&#xff0c;大家注意到了&#xff0c;参数入栈的顺序是先从最右边的开始&#xff0c;最后压入的参数最左边的&#xff0c;其实这是某种约定&#xff0c;要不&#xff0c;为什么不先把中间的参数src入栈呢。既然主…

动手学无人驾驶(5):多传感器数据融合

本系列的前4篇文章主要介绍了深度学习技术在无人驾驶环境感知中的应用&#xff0c;包括交通标志识别&#xff0c;图像与点云3D目标检测。关于环境感知部分的介绍本系列暂且告一段落&#xff0c;后续如有需要再进行补充。 现在我们开启新的篇章&#xff0c;在本文中将会介绍无人…