Insightful and aesthetic visualizations don't have to be a pain to create. This article presents 5+ simple one-liners you can add to your code to improve its style and informational value.
Line plot into area chart
Consider the following standard line plot, created with seaborn's lineplot, with the husl palette and whitegrid style. The data is a sine wave with normally distributed noise added, shifted up so that it sits above the x-axis.
With a few styling choices, the plot looks presentable. However, there is one issue: by default, Seaborn does not start the y-axis at a zero baseline, so the numerical magnitude of the y-values is lost. Assuming that the x and y variables are named as such, adding plt.fill_between(x,y,alpha=0.4) turns the plot into an area chart that begins at the baseline and emphasizes the y-axis.
Note that this line is added in conjunction with the original lineplot call, sns.lineplot(x=x, y=y), which provides the bold line at the top (recent seaborn versions require x and y to be passed as keyword arguments). The alpha parameter, which appears in many seaborn plots as well, controls the opacity of the area (the lower the value, the lighter the fill). plt is the conventional alias for matplotlib.pyplot. In some cases, an area chart may not be suitable.
When multiple area plots are drawn together, they can emphasize where the lines overlap and intersect, although, again, this may not be appropriate for every visualization context.
Line plot to stacked area plot
Sometimes, the relationship between lines requires that the area plots be stacked on top of each other. This is easy to do with matplotlib's stackplot: plt.stackplot(x,y,alpha=0.4). In this case, colors were manually specified through the colors=[] parameter, which takes a list of color names or hex codes.
Note that y is a list of y1 and y2, which represent the noisy sine and cosine waves. These are stacked on top of each other in the area representation, which can make the relative distance between the two series easier to see.
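The stacked version might look like the sketch below. The two series are synthetic noisy sine and cosine waves, and the hex colors are arbitrary placeholders, not the ones used in the original figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (an assumption): noisy sine and cosine waves, both kept positive.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y1 = np.sin(x) + rng.normal(scale=0.1, size=x.size) + 2
y2 = np.cos(x) + rng.normal(scale=0.1, size=x.size) + 2
y = [y1, y2]  # stackplot accepts a list of series

fig, ax = plt.subplots(figsize=(8, 4))
# Placeholder colors; any list of color names or hex codes works.
layers = plt.stackplot(x, y, alpha=0.4, colors=["#e76f51", "#2a9d8f"])
fig.savefig("stacked_area.png")
```

Each series is drawn on top of the running total of the ones before it, so the upper edge of the second layer shows y1 + y2 rather than y2 alone.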
Remove pesky legends
Seaborn often adds a legend by default when the hue parameter is used to draw several instances of the same plot, one for each value of the column specified as the hue. These legends, while sometimes helpful, often cover up important parts of the plot and contain information that could be better expressed elsewhere (perhaps in a caption).
For example, consider the following medical dataset, which contains signals from various subjects. In this case, we want to use multiple line plots to visualize the general trend and range across different patients by setting the subject column as the hue (yes, putting this many lines on one chart is known as a 'spaghetti chart' and is generally not advised). One can see how the default legend labels are a) not ordered, b) so long that they obstruct part of the chart, and c) not the point of the visualization.
The legend can be removed by assigning the plot to a variable (commonly g), like so: g=sns.lineplot(x=…, y=…, hue=…). Then, by accessing the plot object's legend attribute, we can remove it: g.legend_.remove(). If you are working with a grid object like PairGrid or FacetGrid, use g._legend.remove().
Manual x and y axis baselines
Seaborn does not draw the x and y axis lines by default, but the axes are important for understanding not only the shape of the data but also where it stands in relation to the coordinate system.
Matplotlib provides a simple way to draw the x-axis: add g.axhline(0), where g is the plot object (the matplotlib Axes that seaborn's axes-level functions return) and 0 is the y-value at which the horizontal line is placed. Additionally, one can specify color (in this case color='black') and alpha (transparency, in this case alpha=0.5). linestyle is a parameter that creates dashed lines when set to '--'.
Similarly, vertical lines can be added through g.axvline(0).
You can also use axhline to display averages or benchmarks for, say, bar plots. For example, say that we want to show which plants were able to meet the 0.98 petal_width benchmark based on sepal_width.
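The baseline and benchmark lines can be sketched as follows. To keep the example self-contained it uses a plain matplotlib bar chart with made-up values rather than the iris data; the same axhline and axvline calls work on the Axes that a seaborn bar plot returns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it for interactive use
import matplotlib.pyplot as plt

# Made-up values for illustration only (not taken from the iris dataset).
labels = ["A", "B", "C", "D"]
petal_width = [0.4, 1.3, 0.9, 2.0]
benchmark = 0.98  # the benchmark value mentioned in the text

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(labels, petal_width)

ax.axhline(0, color="black", alpha=0.5)  # x-axis baseline at y = 0
ax.axvline(0, color="black", alpha=0.5)  # vertical line at x = 0 (the first category)
line = ax.axhline(benchmark, color="black", alpha=0.5, linestyle="--")  # dashed benchmark
fig.savefig("benchmark.png")
```

Bars that rise above the dashed line meet the benchmark, which is easier to read than comparing each bar against the tick labels.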
Logarithmic Scales
Logarithmic scales are useful because equal distances on the axis correspond to equal percent changes. In many scenarios, this is exactly what is needed: after all, an increase of $1000 for a business that normally earns $300 is not the same as an increase of $1000 for a megacorporation that earns billions. Instead of calculating percentages in the data yourself, you can have matplotlib convert the scales to logarithmic.
As with many matplotlib features, logarithmic scales operate on the ax of a standard figure created with fig, ax = plt.subplots(figsize=(x,y)). Then, a logarithmic x-scale is as simple as ax.set_xscale('log').
A logarithmic y-scale, which is more commonly used, can be set with ax.set_yscale('log').
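To illustrate the difference, the sketch below plots the same series on a linear and on a logarithmic y-scale. The revenue figures are hypothetical: a starting value of 300 growing by an assumed 30% per period.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: revenue starting at 300 and growing 30% per period.
periods = np.arange(0, 20)
revenue = 300 * 1.3 ** periods

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

ax_lin.plot(periods, revenue)
ax_lin.set_title("Linear y-scale")

ax_log.plot(periods, revenue)
ax_log.set_yscale("log")  # constant percent growth becomes a straight line
ax_log.set_title("Logarithmic y-scale")

fig.savefig("log_scale.png")
```

On the linear panel the early periods look flat, while on the logarithmic panel the steady growth rate is visible across the whole range.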
Honorable mentions
- Invest in a good default palette. Color is one of the most important aspects of a visualization: it ties the visualization together and expresses a theme. You can choose and set one of Seaborn's many great palettes with sns.set_palette(name). Seaborn's documentation on choosing color palettes has demonstrations and tips.
- You can add grids and change the background color with sns.set_style(name), where name can be white, whitegrid, dark, darkgrid, or ticks.
- Did you know that matplotlib and seaborn can process LaTeX, the beautiful mathematical formatting language? You can use it in your x/y axis labels, titles, legends, and more by enclosing LaTeX expressions within dollar signs: $expression$.
- Explore different linestyles, annotation sizes, and fonts. Matplotlib is full of them, if only you have the will to explore its documentation pages.
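A short sketch combining the first three tips: the style and palette names are two of the built-in options, and the plotted function and label text are arbitrary choices for illustration. It relies on matplotlib's built-in math rendering, which handles a subset of LaTeX without a separate TeX installation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Set the style and palette before creating the figure so they take effect.
sns.set_style("darkgrid")
sns.set_palette("husl")

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots(figsize=(6, 4))
# Raw strings keep the backslashes in the LaTeX expressions intact.
ax.plot(x, np.sin(x) ** 2, label=r"$\sin^2(x)$")
ax.set_xlabel(r"$x$ (radians)")
ax.set_ylabel(r"$\sin^2(x)$")
ax.set_title(r"Math in titles: $\alpha + \beta = \gamma$")
ax.legend()
fig.savefig("latex_labels.png")
```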
Most plots have additional parameters, such as error bars for bar plots, and thickness, dotted lines, and transparency for line plots. Taking the time to visit the documentation pages and look through the available parameters may take only a minute, but it has the potential to bring your visualization to top-notch aesthetic and informational value.
For example, adding the parameter inner='quartile' in a violinplot draws the first, second, and third quartiles of a distribution in dotted lines. Two words for immense informational gain: I'd say that's a good deal!
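As a quick sketch of that parameter in use, with synthetic normally distributed values standing in for real data (the mean, spread, and sample size are arbitrary assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic sample (an assumption): 300 draws from a normal distribution.
rng = np.random.default_rng(3)
values = rng.normal(loc=5, scale=1.5, size=300)

fig, ax = plt.subplots(figsize=(4, 5))
# inner="quartile" replaces the default inner box with lines at the quartiles.
sns.violinplot(y=values, inner="quartile", ax=ax)
fig.savefig("violin_quartiles.png")
```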
Translated from: https://towardsdatascience.com/5-simple-one-liners-youve-been-looking-for-to-level-up-your-python-visualization-42ebc1deafbc