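Data Analysis Case Study 1: Analyzing Red and White Wine Data with Python

Source code and dataset link

Taking red wine as an example

There are two datasets:
winequality-red.csv: red wine samples
winequality-white.csv: white wine samples
Each sample has a quality score from 1 to 10, along with the results of several physicochemical tests.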

# Physicochemical property field names
1. fixed acidity
2. volatile acidity
3. citric acid
4. residual sugar
5. chlorides
6. free sulfur dioxide
7. total sulfur dioxide
8. density
9. pH
10. sulphates
11. alcohol
12. quality

Importing the data and library dependencies

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# import seaborn as sns
%matplotlib inline
plt.style.use('ggplot')
# sep defaults to a comma; these files are semicolon-separated
red_df = pd.read_csv('winequality-red.csv', sep=';')
white_df = pd.read_csv('winequality-white.csv', sep=';')
# Inspect the column headers and the first rows
red_df.head()
   fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur-dioxide  density    pH  sulphates  alcohol  quality
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8        5
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8        5
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8        6
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
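
To see why the sep argument matters, here is a minimal sketch that parses a tiny semicolon-separated sample from memory. The two data rows and the reduced set of columns are made up for illustration; only the delimiter behaviour is the point.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same ';'-separated layout as the wine files.
raw = "fixed_acidity;pH;quality\n7.4;3.51;5\n7.8;3.20;5\n"

# With the default sep=',' every line is read as a single wide column.
wrong = pd.read_csv(io.StringIO(raw))
# With sep=';' the three fields are split correctly.
right = pd.read_csv(io.StringIO(raw), sep=";")

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 3)
```

If a freshly loaded DataFrame has only one column, a wrong delimiter is the first thing worth checking.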

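Renaming a column

The column name total_sulfur-dioxide does not follow the naming convention of the other columns (it mixes a hyphen with underscores), so rename it:

red_df.rename(columns={"total_sulfur-dioxide":"total_sulfur_dioxide"}, inplace=True)
# Confirm that the rename succeeded
red_df.head(5)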
   fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide  density    pH  sulphates  alcohol  quality
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8        5
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8        5
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8        6
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
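
When several column names are irregular, renaming them one by one gets tedious. A possible alternative, sketched here on an empty DataFrame with made-up headers, is to normalize every name at once with the string methods on the column index. The regular expression is an assumption about what counts as "irregular" (whitespace and hyphens).

```python
import pandas as pd

# Hypothetical headers mixing spaces, hyphens, and underscores.
df = pd.DataFrame(columns=["fixed acidity", "total_sulfur-dioxide", "pH"])

# Replace any run of whitespace or hyphens with a single underscore.
df.columns = df.columns.str.strip().str.replace(r"[\s-]+", "_", regex=True)

print(list(df.columns))  # ['fixed_acidity', 'total_sulfur_dioxide', 'pH']
```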

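Answering the following questions

  • Number of samples in each dataset
  • Number of columns in each dataset
  • Features with missing values
  • Duplicate rows in the red wine dataset
  • Number of unique quality levels in the dataset
  • Mean density of the red wine dataset
# View basic information
red_df.info()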
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
fixed_acidity           1599 non-null float64
volatile_acidity        1599 non-null float64
citric_acid             1599 non-null float64
residual_sugar          1599 non-null float64
chlorides               1599 non-null float64
free_sulfur_dioxide     1599 non-null float64
total_sulfur_dioxide    1599 non-null float64
density                 1599 non-null float64
pH                      1599 non-null float64
sulphates               1599 non-null float64
alcohol                 1599 non-null float64
quality                 1599 non-null int64
dtypes: float64(11), int64(1)
memory usage: 150.0 KB
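# Number of samples
len(red_df)
1599
# Number of columns in the dataset
len(red_df.columns)
12
# Number of duplicate rows in the red wine dataset
sum(red_df.duplicated())
240
# Number of unique quality values
len(red_df['quality'].unique())
6
# Mean density of the red wine dataset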
red_df['density'].mean()
0.9967466791744833
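
The same questions can also be answered with dedicated pandas methods. The sketch below runs them on a small made-up DataFrame (the values are not from the wine data) so the expected results are easy to verify by eye.

```python
import numpy as np
import pandas as pd

# Made-up frame: row 2 repeats row 0, and one density value is missing.
toy = pd.DataFrame({
    "density": [0.9978, 0.9968, 0.9978, np.nan],
    "quality": [5, 5, 5, 6],
})

missing_per_column = toy.isnull().sum()        # missing values per feature
n_duplicates = toy.duplicated().sum()          # rows identical to an earlier row
n_quality_levels = toy["quality"].nunique()    # distinct quality values

print(missing_per_column.to_dict())  # {'density': 1, 'quality': 0}
print(n_duplicates)                  # 1
print(n_quality_levels)              # 2
```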

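Merging the two datasets

# Combine the red and white wine data
# Create a color array for the red wine DataFrame (one label per row)
color_red = np.repeat("red", red_df.shape[0])
# Create a color array for the white wine DataFrame
color_white = np.repeat("white", white_df.shape[0])
len(color_red)
1599
red_df['color'] = color_red
# Inspect the new column to confirm that it was added
red_df.head()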
   fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide  density    pH  sulphates  alcohol  quality  color
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5    red
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8        5    red
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8        5    red
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8        6    red
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5    red
white_df["color"] = color_white
white_df.head()
   fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide  density    pH  sulphates  alcohol  quality  color
0            7.0              0.27         0.36            20.7      0.045                 45.0                 170.0   1.0010  3.00       0.45      8.8        6  white
1            6.3              0.30         0.34             1.6      0.049                 14.0                 132.0   0.9940  3.30       0.49      9.5        6  white
2            8.1              0.28         0.40             6.9      0.050                 30.0                  97.0   0.9951  3.26       0.44     10.1        6  white
3            7.2              0.23         0.32             8.5      0.058                 47.0                 186.0   0.9956  3.19       0.40      9.9        6  white
4            7.2              0.23         0.32             8.5      0.058                 47.0                 186.0   0.9956  3.19       0.40      9.9        6  white
print(len(red_df))
print(len(white_df))
1599
4898
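# Append the DataFrames (DataFrame.append was removed in pandas 2.0, so use pd.concat)
wine_df = pd.concat([red_df, white_df])
# Inspect the DataFrame to check that the merge succeeded
wine_df.head()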
   fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide  density    pH  sulphates  alcohol  quality  color
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5    red
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8        5    red
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8        5    red
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8        6    red
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5    red
wine_df.shape
(6497, 13)
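
As a compact variant of the labelling and merging steps, the sketch below tags each frame with assign and stacks them with pd.concat in one expression. The two small frames are made up; ignore_index=True is an optional choice here that gives the combined frame a fresh 0..n-1 index.

```python
import pandas as pd

# Made-up stand-ins for the red and white DataFrames.
red = pd.DataFrame({"pH": [3.51, 3.20], "quality": [5, 5]})
white = pd.DataFrame({"pH": [3.00, 3.30, 3.26], "quality": [6, 6, 6]})

# assign returns a copy with the extra column, so the inputs stay untouched.
combined = pd.concat(
    [red.assign(color="red"), white.assign(color="white")],
    ignore_index=True,
)

print(combined.shape)                             # (5, 3)
print(combined["color"].value_counts().to_dict())
```

Checking that the row count equals the sum of the two inputs is a cheap way to confirm that nothing was dropped.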

Saving the merged dataset

# Save the merged dataset
wine_df.to_csv("winequality_edited.csv", index=False)
# Set the seaborn style
# sns.set_style("ticks")
wine_df = pd.read_csv("winequality_edited.csv")
wine_df.shape
(6497, 13)
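
The index=False argument matters when the file is read back. The following sketch, using a made-up one-column frame and in-memory CSV text instead of a file, shows the extra column that appears when the index is written out.

```python
import io
import pandas as pd

# Made-up frame; the repeated index value mimics two stacked datasets.
df = pd.DataFrame({"pH": [3.51, 3.00]}, index=[0, 0])

# to_csv() without a path returns the CSV text as a string.
with_index = pd.read_csv(io.StringIO(df.to_csv()))
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(with_index.columns))     # ['Unnamed: 0', 'pH']
print(list(without_index.columns))  # ['pH']
```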

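Visual exploration

  • Based on histograms of the columns in this dataset, which of the following feature variables appear right-skewed? Fixed acidity, total sulfur dioxide, pH, alcohol

Detailed guide to the hist method
Understanding the return value of subplot
Detailed guide to plotting with subplot

Plotting histograms

fig, axs = plt.subplots(2, 2, figsize=(8, 8))
# _ is a throwaway name for return values we do not need
_ = wine_df.fixed_acidity.plot.hist(ax=axs[0][0], rwidth=0.9)
_ = wine_df.total_sulfur_dioxide.plot.hist(ax=axs[0][1], rwidth=0.9)
_ = wine_df.pH.plot.hist(ax=axs[1][0], rwidth=0.9)
_ = wine_df.alcohol.plot.hist(ax=axs[1][1], rwidth=0.9)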

[Figure: histograms of fixed_acidity, total_sulfur_dioxide, pH, and alcohol]
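
To make concrete what a histogram counts, here is a small sketch using np.histogram, which, like the pandas plot.hist call, defaults to 10 equal-width bins. The sample values are made up and deliberately have a long right tail.

```python
import numpy as np

# Made-up, right-skewed sample (not from the wine dataset).
values = np.array([1.0, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5, 3.0, 4.0, 9.0])

# counts[i] is the number of values falling between edges[i] and edges[i + 1].
counts, edges = np.histogram(values)

print(len(counts), len(edges))  # 10 11
print(counts.sum())             # 10
```

Most of the counts land in the leftmost bins while a single value sits far to the right, which is the shape a right-skewed histogram shows.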

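Determining skewness

The figure below shows, from left to right, a left-skewed, a normal, and a right-skewed distribution.

[Figure: left-skewed, normal, and right-skewed distributions]

wine_df.skew(axis=0, numeric_only=True)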
fixed_acidity           1.723290
volatile_acidity        1.495097
citric_acid             0.471731
residual_sugar          1.435404
chlorides               5.399828
free_sulfur_dioxide     1.220066
total_sulfur_dioxide   -0.001177
density                 0.503602
pH                      0.386839
sulphates               1.797270
alcohol                 0.565718
quality                 0.189623
dtype: float64

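A positive skewness value indicates a right-skewed distribution, so fixed_acidity, pH, and alcohol are all right-skewed: fixed_acidity strongly (1.72), pH (0.39) and alcohol (0.57) only mildly. The skewness of total_sulfur_dioxide is close to zero (-0.001), so that column is approximately symmetric.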
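
For readers who want to see where the number comes from, the sketch below recomputes the skewness by hand on a made-up sample and compares it with Series.skew. It assumes the bias-adjusted (Fisher-Pearson) estimator, which is what pandas reports.

```python
import numpy as np
import pandas as pd

# Made-up sample with a long right tail.
s = pd.Series([1, 1, 2, 2, 3, 10], dtype=float)

n = len(s)
dev = s - s.mean()
m2 = (dev ** 2).mean()   # second central moment
m3 = (dev ** 3).mean()   # third central moment
g1 = m3 / m2 ** 1.5      # biased sample skewness
G1 = np.sqrt(n * (n - 1)) / (n - 2) * g1  # bias-adjusted skewness

print(np.isclose(G1, s.skew()))       # True
print(s.skew() > 0, (-s).skew() < 0)  # True True
```

Mirroring the sample flips the sign, which matches the rule of thumb used above: positive means right-skewed, negative means left-skewed.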

  • Based on scatter plots of quality against different feature variables, which of the following is most likely to have a positive effect on quality? Volatile acidity, residual sugar, pH, alcohol
x = wine_df[["fixed_acidity", "total_sulfur_dioxide", "pH", "alcohol", "quality"]]
fig, axs = plt.subplots(2, 2, figsize=(12, 8))
_ = x.plot.scatter(y='fixed_acidity', x='quality', ax=axs[0][0], linewidths=0.001, marker='o')
_ = x.plot.scatter(y='total_sulfur_dioxide', x='quality', ax=axs[0][1], linewidths=0.001, marker='o')
_ = x.plot.scatter(y='pH', x='quality', ax=axs[1][0], linewidths=0.001, marker='o')
_ = x.plot.scatter(y='alcohol', x='quality', ax=axs[1][1], linewidths=0.001, marker='o')
# sns.despine()

[Figure: scatter plots of fixed_acidity, total_sulfur_dioxide, pH, and alcohol against quality]

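The plots do not show an obvious pattern, so we quantify the relationship instead by computing the correlation coefficient between each variable and quality. The larger the positive coefficient, the stronger the positive linear association with quality.

Correlation coefficients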

sub_df = wine_df.iloc[:,np.r_[0,6,8,10,11]]
sub_df.corr()['quality']
fixed_acidity          -0.076743
total_sulfur_dioxide   -0.041385
pH                      0.019506
alcohol                 0.444319
quality                 1.000000
Name: quality, dtype: float64

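alcohol has the largest correlation coefficient with quality (0.44), so among these variables it shows the strongest positive association.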
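
As a reminder of what the coefficient measures, this sketch computes the Pearson correlation by hand on a made-up five-row sample and checks it against Series.corr, whose default method is Pearson.

```python
import numpy as np
import pandas as pd

# Made-up sample in which quality tends to rise with alcohol.
toy = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0],
    "quality": [5, 5, 6, 6, 7],
})

# Pearson r: covariance of the centered values divided by the product of their norms.
x = toy["alcohol"] - toy["alcohol"].mean()
y = toy["quality"] - toy["quality"].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

r_pandas = toy["alcohol"].corr(toy["quality"])

print(np.isclose(r_manual, r_pandas))  # True
```

A coefficient describes linear association only; it does not by itself show that raising alcohol content causes a higher rating.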

Viewing the means

wine_df.mean(numeric_only=True)
fixed_acidity             7.215307
volatile_acidity          0.339666
citric_acid               0.318633
residual_sugar            5.443235
chlorides                 0.056034
free_sulfur_dioxide      30.525319
total_sulfur_dioxide    115.744574
density                   0.994697
pH                        3.218501
sulphates                 0.531268
alcohol                  10.491801
quality                   5.818378
dtype: float64

Grouping by attribute

# Group by quality and view the mean of each group
wine_df.groupby('quality').mean(numeric_only=True)
         fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide   density        pH  sulphates    alcohol
quality
3             7.853333          0.517000     0.281000        5.140000   0.077033            39.216667            122.033333  0.995744  3.257667   0.506333  10.215000
4             7.288889          0.457963     0.272315        4.153704   0.060056            20.636574            103.432870  0.994833  3.231620   0.505648  10.180093
5             7.326801          0.389614     0.307722        5.804116   0.064666            30.237371            120.839102  0.995849  3.212189   0.526403   9.837783
6             7.177257          0.313863     0.323583        5.549753   0.054157            31.165021            115.410790  0.994558  3.217726   0.532549  10.587553
7             7.128962          0.288800     0.334764        4.731696   0.045272            30.422150            108.498610  0.993126  3.228072   0.547025  11.386006
8             6.835233          0.291010     0.332539        5.382902   0.041124            34.533679            117.518135  0.992514  3.223212   0.512487  11.678756
9             7.420000          0.298000     0.386000        4.120000   0.027400            33.400000            116.000000  0.991460  3.308000   0.466000  12.180000
# Group by quality and color as a two-level index and view the means
wine_df.groupby(['quality','color']).mean()
               fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide   density        pH  sulphates    alcohol
quality color
3       red         8.360000          0.884500     0.171000        2.635000   0.122500            11.000000             24.900000  0.997464  3.398000   0.570000   9.955000
        white       7.600000          0.333250     0.336000        6.392500   0.054300            53.325000            170.600000  0.994884  3.187500   0.474500  10.345000
4       red         7.779245          0.693962     0.174151        2.694340   0.090679            12.264151             36.245283  0.996542  3.381509   0.596415  10.265094
        white       7.129448          0.381227     0.304233        4.628221   0.050098            23.358896            125.279141  0.994277  3.182883   0.476135  10.152454
5       red         8.167254          0.577041     0.243686        2.528855   0.092736            16.983847             56.513950  0.997104  3.304949   0.620969   9.899706
        white       6.933974          0.302011     0.337653        7.334969   0.051546            36.432052            150.904598  0.995263  3.168833   0.482203   9.808840
6       red         8.347179          0.497484     0.273824        2.477194   0.084956            15.711599             40.869906  0.996615  3.318072   0.675329  10.629519
        white       6.837671          0.260564     0.338025        6.441606   0.045217            35.650591            137.047316  0.993961  3.188599   0.491106  10.575372
7       red         8.872362          0.403920     0.375176        2.720603   0.076588            14.045226             35.020101  0.996104  3.290754   0.741256  11.465913
        white       6.734716          0.262767     0.325625        5.186477   0.038191            34.125568            125.114773  0.992452  3.213898   0.503102  11.367936
8       red         8.566667          0.423333     0.391111        2.577778   0.068444            13.277778             33.444444  0.995212  3.267222   0.767778  12.094444
        white       6.657143          0.277400     0.326514        5.671429   0.038314            36.720000            126.165714  0.992236  3.218686   0.486229  11.636000
9       white       7.420000          0.298000     0.386000        4.120000   0.027400            33.400000            116.000000  0.991460  3.308000   0.466000  12.180000
# Keep the grouping attributes as regular columns instead of the index
wine_df.groupby(['quality','color'], as_index=False).mean()
    quality  color  fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide   density        pH  sulphates    alcohol
 0        3    red       8.360000          0.884500     0.171000        2.635000   0.122500            11.000000             24.900000  0.997464  3.398000   0.570000   9.955000
 1        3  white       7.600000          0.333250     0.336000        6.392500   0.054300            53.325000            170.600000  0.994884  3.187500   0.474500  10.345000
 2        4    red       7.779245          0.693962     0.174151        2.694340   0.090679            12.264151             36.245283  0.996542  3.381509   0.596415  10.265094
 3        4  white       7.129448          0.381227     0.304233        4.628221   0.050098            23.358896            125.279141  0.994277  3.182883   0.476135  10.152454
 4        5    red       8.167254          0.577041     0.243686        2.528855   0.092736            16.983847             56.513950  0.997104  3.304949   0.620969   9.899706
 5        5  white       6.933974          0.302011     0.337653        7.334969   0.051546            36.432052            150.904598  0.995263  3.168833   0.482203   9.808840
 6        6    red       8.347179          0.497484     0.273824        2.477194   0.084956            15.711599             40.869906  0.996615  3.318072   0.675329  10.629519
 7        6  white       6.837671          0.260564     0.338025        6.441606   0.045217            35.650591            137.047316  0.993961  3.188599   0.491106  10.575372
 8        7    red       8.872362          0.403920     0.375176        2.720603   0.076588            14.045226             35.020101  0.996104  3.290754   0.741256  11.465913
 9        7  white       6.734716          0.262767     0.325625        5.186477   0.038191            34.125568            125.114773  0.992452  3.213898   0.503102  11.367936
10        8    red       8.566667          0.423333     0.391111        2.577778   0.068444            13.277778             33.444444  0.995212  3.267222   0.767778  12.094444
11        8  white       6.657143          0.277400     0.326514        5.671429   0.038314            36.720000            126.165714  0.992236  3.218686   0.486229  11.636000
12        9  white       7.420000          0.298000     0.386000        4.120000   0.027400            33.400000            116.000000  0.991460  3.308000   0.466000  12.180000
# View only the pH column after grouping
wine_df.groupby(['quality','color'], as_index=False)['pH'].mean()
    quality  color        pH
 0        3    red  3.398000
 1        3  white  3.187500
 2        4    red  3.381509
 3        4  white  3.182883
 4        5    red  3.304949
 5        5  white  3.168833
 6        6    red  3.318072
 7        6  white  3.188599
 8        7    red  3.290754
 9        7  white  3.213898
10        8    red  3.267222
11        8  white  3.218686
12        9  white  3.308000
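
Group means are easier to interpret next to the group sizes, because some groups (quality 9, for instance) contain very few wines. One way to get both at once is named aggregation, sketched here on a made-up frame; the output column names mean_pH and n are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up sample with unequal group sizes.
toy = pd.DataFrame({
    "quality": [5, 5, 5, 6, 6, 6],
    "color": ["red", "red", "white", "red", "white", "white"],
    "pH": [3.30, 3.40, 3.10, 3.50, 3.20, 3.00],
})

# Each keyword names an output column as (input column, aggregation).
summary = toy.groupby(["quality", "color"], as_index=False).agg(
    mean_pH=("pH", "mean"),
    n=("pH", "count"),
)

print(list(summary.columns))  # ['quality', 'color', 'mean_pH', 'n']
print(summary["n"].tolist())  # [2, 1, 1, 2]
```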

Question 1: Is one type of wine (red or white) associated with higher quality?

# Use groupby to compute the mean quality for each wine type (red and white)
wine_df.groupby("color")["quality"].mean()
color
red      5.636023
white    5.877909
Name: quality, dtype: float64

On average, white wine is rated slightly higher than red wine (5.88 versus 5.64).

Which acidity level has the highest mean rating?

# Use the pandas describe function to view the minimum, 25%, 50%, 75%, and maximum pH values
wine_df.pH.describe()
count    6497.000000
mean        3.218501
std         0.160787
min         2.720000
25%         3.110000
50%         3.210000
75%         3.320000
max         4.010000
Name: pH, dtype: float64
# Bin edges used to "cut" the data into groups
bin_edges = [2.72, 3.11, 3.21, 3.32, 4.01]  # the five values computed above
# Labels for the four acidity-level groups (a lower pH means higher acidity)
bin_names = ["high", "medium_high", "medium", "low"]  # name each acidity-level category
help(pd.cut)
Help on function cut in module pandas.core.reshape.tile:

cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise')
    Bin values into discrete intervals.

    Use `cut` when you need to segment and sort data values into bins. This
    function is also useful for going from a continuous variable to a
    categorical variable. For example, `cut` could convert ages to groups of
    age ranges. Supports binning into an equal number of bins, or a
    pre-specified array of bins.

    Parameters
    ----------
    x : array-like
        The input array to be binned. Must be 1-dimensional.
    bins : int, sequence of scalars, or pandas.IntervalIndex
        The criteria to bin by.

        * int : Defines the number of equal-width bins in the range of `x`. The
          range of `x` is extended by .1% on each side to include the minimum
          and maximum values of `x`.
        * sequence of scalars : Defines the bin edges allowing for non-uniform
          width. No extension of the range of `x` is done.
        * IntervalIndex : Defines the exact bins to be used.

    right : bool, default True
        Indicates whether `bins` includes the rightmost edge or not. If
        ``right == True`` (the default), then the `bins` ``[1, 2, 3, 4]``
        indicate (1,2], (2,3], (3,4]. This argument is ignored when
        `bins` is an IntervalIndex.
    labels : array or bool, optional
        Specifies the labels for the returned bins. Must be the same length as
        the resulting bins. If False, returns only integer indicators of the
        bins. This affects the type of the output container (see below).
        This argument is ignored when `bins` is an IntervalIndex.
    retbins : bool, default False
        Whether to return the bins or not. Useful when bins is provided
        as a scalar.
    precision : int, default 3
        The precision at which to store and display the bins labels.
    include_lowest : bool, default False
        Whether the first interval should be left-inclusive or not.
    duplicates : {default 'raise', 'drop'}, optional
        If bin edges are not unique, raise ValueError or drop non-uniques.

        .. versionadded:: 0.23.0

    Returns
    -------
    out : pandas.Categorical, Series, or ndarray
        An array-like object representing the respective bin for each value
        of `x`. The type depends on the value of `labels`.

        * True (default) : returns a Series for Series `x` or a
          pandas.Categorical for all other inputs. The values stored within
          are Interval dtype.
        * sequence of scalars : returns a Series for Series `x` or a
          pandas.Categorical for all other inputs. The values stored within
          are whatever the type in the sequence is.
        * False : returns an ndarray of integers.

    bins : numpy.ndarray or IntervalIndex.
        The computed or specified bins. Only returned when `retbins=True`.
        For scalar or sequence `bins`, this is an ndarray with the computed
        bins. If set `duplicates=drop`, `bins` will drop non-unique bin. For
        an IntervalIndex `bins`, this is equal to `bins`.

    See Also
    --------
    qcut : Discretize variable into equal-sized buckets based on rank
        or based on sample quantiles.
    pandas.Categorical : Array type for storing data that come from a
        fixed set of values.
    Series : One-dimensional array with axis labels (including time series).
    pandas.IntervalIndex : Immutable Index implementing an ordered,
        sliceable set.

    Notes
    -----
    Any NA values will be NA in the result. Out of bounds values will be NA in
    the resulting Series or pandas.Categorical object.

    Examples
    --------
    Discretize into three equal-sized bins.

    >>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3)
    ... # doctest: +ELLIPSIS
    [(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], ...
    Categories (3, interval[float64]): [(0.994, 3.0] < (3.0, 5.0] ...

    >>> pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, retbins=True)
    ... # doctest: +ELLIPSIS
    ([(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], ...
    Categories (3, interval[float64]): [(0.994, 3.0] < (3.0, 5.0] ...
    array([0.994, 3.   , 5.   , 7.   ]))

    Discovers the same bins, but assign them specific labels. Notice that
    the returned Categorical's categories are `labels` and is ordered.

    >>> pd.cut(np.array([1, 7, 5, 4, 6, 3]),
    ...        3, labels=["bad", "medium", "good"])
    [bad, good, medium, medium, good, bad]
    Categories (3, object): [bad < medium < good]

    ``labels=False`` implies you just want the bins back.

    >>> pd.cut([0, 1, 1, 2], bins=4, labels=False)
    array([0, 1, 1, 3])

    Passing a Series as an input returns a Series with categorical dtype:

    >>> s = pd.Series(np.array([2, 4, 6, 8, 10]),
    ...               index=['a', 'b', 'c', 'd', 'e'])
    >>> pd.cut(s, 3)
    ... # doctest: +ELLIPSIS
    a    (1.992, 4.667]
    b    (1.992, 4.667]
    c    (4.667, 7.333]
    d     (7.333, 10.0]
    e     (7.333, 10.0]
    dtype: category
    Categories (3, interval[float64]): [(1.992, 4.667] < (4.667, ...

    Passing a Series as an input returns a Series with mapping value.
    It is used to map numerically to intervals based on bins.

    >>> s = pd.Series(np.array([2, 4, 6, 8, 10]),
    ...               index=['a', 'b', 'c', 'd', 'e'])
    >>> pd.cut(s, [0, 2, 4, 6, 8, 10], labels=False, retbins=True, right=False)
    ... # doctest: +ELLIPSIS
    (a    0.0
     b    1.0
     c    2.0
     d    3.0
     e    4.0
     dtype: float64, array([0, 2, 4, 6, 8]))

    Use `drop` optional when bins is not unique

    >>> pd.cut(s, [0, 2, 4, 6, 10, 10], labels=False, retbins=True,
    ...    right=False, duplicates='drop')
    ... # doctest: +ELLIPSIS
    (a    0.0
     b    1.0
     c    2.0
     d    3.0
     e    3.0
     dtype: float64, array([0, 2, 4, 6, 8]))

    Passing an IntervalIndex for `bins` results in those categories exactly.
    Notice that values not covered by the IntervalIndex are set to NaN. 0
    is to the left of the first bin (which is closed on the right), and 1.5
    falls between two bins.

    >>> bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
    >>> pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)
    [NaN, (0, 1], NaN, (2, 3], (4, 5]]
    Categories (3, interval[int64]): [(0, 1] < (2, 3] < (4, 5]]

# Create the acidity_levels column
wine_df['acidity_levels'] = pd.cut(wine_df['pH'], bin_edges, labels=bin_names)
# Check that the column was created
wine_df.head()
   fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide  density    pH  sulphates  alcohol  quality  color  acidity_levels
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5    red             low
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8        5    red     medium_high
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8        5    red          medium
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8        6    red     medium_high
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5    red             low
# Use groupby to compute the mean quality for each acidity level
wine_df.groupby("acidity_levels")['quality'].mean()
acidity_levels
high           5.783343
medium_high    5.784540
medium         5.850832
low            5.859593
Name: quality, dtype: float64

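Mean quality rises slightly as acidity decreases (from 5.78 in the high-acidity group to 5.86 in the low-acidity group), although the differences between groups are small.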
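
One detail of pd.cut is worth checking when the first bin edge equals the minimum of the data: with the default right=True and include_lowest=False, the first interval is open on the left, so the minimum itself falls outside every bin. The sketch below shows this on a made-up pH series that reuses the bin edges from above; the label names are illustrative.

```python
import pandas as pd

# Made-up pH values; the first one equals the lowest bin edge.
ph = pd.Series([2.72, 3.00, 3.11, 3.25, 4.01])
edges = [2.72, 3.11, 3.21, 3.32, 4.01]
labels = ["high", "medium_high", "medium", "low"]

default = pd.cut(ph, edges, labels=labels)                          # bins are (a, b]
inclusive = pd.cut(ph, edges, labels=labels, include_lowest=True)   # first bin is [a, b]

print(default.isna().sum())    # 1
print(inclusive.isna().sum())  # 0
print(inclusive.tolist())      # ['high', 'high', 'high', 'medium', 'low']
```

Applied to the wine data, passing include_lowest=True would keep the sample or samples with the minimum pH in the "high" acidity group instead of leaving them unlabelled, which could shift that group's mean slightly.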

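# Save the changes for use in the next section
wine_df.to_csv('winequality_edited_al.csv', index=False)

Do wines with a higher alcohol content receive higher ratings?

# Get the median alcohol content
alcohol_median = wine_df.alcohol.median()
# Select samples with alcohol content below the median
low_alcohol = wine_df.query("alcohol < @alcohol_median")
# Select samples with alcohol content greater than or equal to the median
high_alcohol = wine_df.query("alcohol >= @alcohol_median")
# Get the mean quality rating of the low- and high-alcohol groups
print("Low alcohol:", low_alcohol.quality.mean())
print("High alcohol:", high_alcohol.quality.mean())
Low alcohol: 5.475920679886686
High alcohol: 6.146084337349397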

Wines with an alcohol content at or above the median have a higher mean quality rating (6.15 versus 5.48).
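
The median split can be tried out on a small made-up frame. In a query string, the @ prefix refers to a Python variable in the surrounding scope; an equivalent boolean mask is shown alongside for comparison.

```python
import pandas as pd

# Made-up sample, not taken from the wine data.
toy = pd.DataFrame({
    "alcohol": [9.0, 9.5, 10.0, 11.0, 12.0],
    "quality": [5, 5, 6, 6, 7],
})

alcohol_median = toy["alcohol"].median()  # 10.0

low = toy.query("alcohol < @alcohol_median")
high = toy.query("alcohol >= @alcohol_median")

# The same split written as a boolean mask.
high_mask = toy[toy["alcohol"] >= alcohol_median]

print(len(low) + len(high) == len(toy))  # True
print(low["quality"].mean())             # 5.0
```

Using < for one group and >= for the other guarantees that every row lands in exactly one group, including rows equal to the median.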

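Do sweeter wines receive higher ratings?

# Get the median residual sugar
sugar_median = wine_df["residual_sugar"].median()
# Select samples with residual sugar below the median
low_sugar = wine_df.query("residual_sugar < @sugar_median")
# Select samples with residual sugar greater than or equal to the median
high_sugar = wine_df.query("residual_sugar >= @sugar_median")
# Make sure each sample appears in exactly one of these queries
num_samples = wine_df.shape[0]
num_samples == low_sugar['quality'].count() + high_sugar['quality'].count() # should be True
True
# Get the mean quality rating of the low- and high-sugar groups
print("High sugar mean quality:", high_sugar.quality.mean())
print("Low sugar mean quality:", low_sugar.quality.mean())
High sugar mean quality: 5.82782874617737
Low sugar mean quality: 5.808800743724822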

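Wines with more residual sugar have a marginally higher mean quality rating (5.83 versus 5.81); the difference is very small.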

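Plotting wine type and quality

Seaborn plotting examples
Pandas visualization documentation

First, look at the mean quality of the two wine types.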

colors = ['red','white']
color_means = wine_df.groupby('color')['quality'].mean()
color_means.plot(kind='bar', title='Average Wine Quality by Color', color=colors, alpha=.8)
plt.xlabel('colors', fontsize=18);
plt.ylabel('Quality', fontsize=18);

[Figure: bar chart of average wine quality by color]

Next, group by quality and color for a closer look.

counts = wine_df.groupby(['quality', 'color']).count()['pH']
counts.plot(kind='bar', title='Counts by Wine Color and quality', color=counts.index.get_level_values(1), alpha=.7)
plt.xlabel('Quality and Color', fontsize=18)
plt.ylabel('Count', fontsize=18)
Text(0, 0.5, 'Count')

[Figure: bar chart of counts by wine color and quality]

However, the red and white datasets differ greatly in size, so comparing proportions is more accurate.

totals = wine_df.groupby('color').count()['pH']
counts = wine_df.groupby(['quality', 'color']).count()['pH']
proportions = counts / totals
proportions.plot(kind='bar', title='Proportions by Wine Color and Quality', color=counts.index.get_level_values(1), alpha=.7)
plt.xlabel('Quality and Color', fontsize=18)
plt.ylabel('Proportions', fontsize=18)
Text(0, 0.5, 'Proportions')

[Figure: bar chart of proportions by wine color and quality]
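
The division of per-group counts by per-color totals relies on pandas aligning the two Series on the shared color level. The sketch below makes that alignment explicit with Series.div and its level argument, on a made-up sample small enough to check by hand.

```python
import numpy as np
import pandas as pd

# Made-up sample: 2 red wines and 4 white wines.
toy = pd.DataFrame({
    "color": ["red", "red", "white", "white", "white", "white"],
    "quality": [5, 6, 5, 6, 6, 6],
})

counts = toy.groupby(["quality", "color"]).size()  # indexed by (quality, color)
totals = toy.groupby("color").size()               # indexed by color

# Divide each (quality, color) count by the total for its color.
proportions = counts.div(totals, level="color")

print(proportions.loc[(6, "white")])  # 0.75
print(proportions.loc[(5, "red")])    # 0.5
```

Within each color the proportions sum to 1, which makes the two wine types directly comparable despite their different sample sizes.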

Creating bar charts with Matplotlib

pyplot's bar function takes two required arguments: the x coordinates of the bars and their heights.

plt.bar([1, 2, 3], [224, 620, 425], color='blue');

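[Figure: bar chart with three blue bars]

You can specify x-axis tick labels either with pyplot's xticks function or by passing another argument to bar. The two cells below produce the same result.

# Draw the bars
plt.bar([1, 2, 3], [224, 620, 425])
# Set the x-axis tick positions and their labels
plt.xticks([1, 2, 3], ['a', 'b', 'c']);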

[Figure: bar chart with tick labels a, b, c set through xticks]

# Draw the bars and set the x-axis tick labels in one call
plt.bar([1, 2, 3], [224, 620, 425], tick_label=['a', 'b', 'c']);

[Figure: bar chart with tick labels a, b, c set through tick_label]

Set the title and axis labels as follows.

plt.bar([1, 2, 3], [224, 620, 425], tick_label=['a', 'b', 'c'])
plt.title('Some Title')
plt.xlabel('Some X Label')
plt.ylabel('Some Y Label');


[Figure: bar chart with a title and axis labels]
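
The same chart can also be built through Matplotlib's object-oriented interface, which keeps an explicit handle on the axes and is handy once a figure holds several subplots. This is only a sketch of that alternative; the Agg backend is selected here as an assumption so the code also runs outside a notebook, without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# ax.bar returns a container holding one rectangle per bar.
bars = ax.bar([1, 2, 3], [224, 620, 425], tick_label=["a", "b", "c"])

ax.set_title("Some Title")
ax.set_xlabel("Some X Label")
ax.set_ylabel("Some Y Label")

print(len(bars))                         # 3
print([b.get_height() for b in bars])
```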

# example
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 10)
number = 5
cmap = plt.get_cmap('gnuplot')
# Sample `number` evenly spaced colors from the colormap
colors = [cmap(i) for i in np.linspace(0, 1, number)]

# Draw one line per color, labelled with its equation
for i, color in enumerate(colors, start=1):
    plt.plot(x, i * x + i, color=color, label='$y = {i}x + {i}$'.format(i=i))
plt.legend(loc='best')
plt.show()

