Why Pie Charts Are Problematic

Introduction

It seems as if people are split on pie charts: either you passionately hate them, or you are indifferent. In this article, I am going to explain why pie charts are problematic and, if you fall into the latter category, what you can do when creating pie charts to avoid upsetting those in the former.

Why are pie charts problematic?

They use size to convey information

A pie chart uses the size of a portion (slice) of a circle (pie) to display a numerical variable. This factor is not an issue in and of itself, as many chart types use size to convey information, including bubble charts and bar charts; however, while bubble charts and bar charts use diameter and height, respectively, to convey information, pie charts rely on the angle describing a slice, and the human eye is not very good at recognizing differences in angles.

Suppose I took a survey on people’s favorite kinds of pie. In the chart below, it is difficult to see how the categories relate to each other; individually, Cherry and Rhubarb seem to comprise a smaller portion of the pie than either Apple or Pumpkin, but it may not be obvious (without looking at the data) which is the smaller slice.

# Adjusting plot size and margins
options(repr.plot.width=8, repr.plot.height=4)
par(mfrow=c(1,1), mai = c(0.5, 0, 0.75, 0))

# Data for pie chart
x = c(18, 15, 13, 25, 29)
labels = c("Key Lime", "Cherry", "Rhubarb", "Pumpkin", "Apple")
cols = c("greenyellow", "red", "mediumvioletred", "darkorange",
         "cornsilk")

# Build the pie chart
pie(x, labels, radius = 1, col=cols)
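To make the angle problem concrete, the sketch below converts the survey counts above into the central angle of each slice. It is only an illustration; the variable names `angles` and `cherry_vs_rhubarb` are mine and do not appear in the original code.

```r
# Survey counts from the favorite-pie example
x <- c(18, 15, 13, 25, 29)
labels <- c("Key Lime", "Cherry", "Rhubarb", "Pumpkin", "Apple")

# A slice's central angle is its share of the total times 360 degrees
angles <- x / sum(x) * 360
names(angles) <- labels
print(round(angles, 1))

# Angular gap between the two slices that are hard to tell apart
cherry_vs_rhubarb <- angles[["Cherry"]] - angles[["Rhubarb"]]
print(round(cherry_vs_rhubarb, 1))
```

Cherry and Rhubarb differ by two percentage points, which works out to only about 7 degrees of arc, so it is not surprising that the eye struggles to rank them.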

They cannot display many categories well

This issue of conveying size via angle is even more pronounced when many categories are shown in a single pie chart. Furthermore, unlike some charts that are used to display several categories at once, such as bar charts, pie charts depend on differences in color to denote category; therefore, a large palette of colors is necessary, and without proper selection of the palette, the results could be either garish or ambiguous.

# Adjusting plot size and margins
options(repr.plot.width=8, repr.plot.height=4)
par(mfrow=c(1,1), mai = c(0.55, 0, 0.8, 0))

# Data for pie chart
x = c(2, 4, 5, 10, 13, 15, 15, 17, 19)
labels = c("Key Lime", "Pecan", "Cherry", "Blueberry", "Rhubarb",
           "Lemon Meringue", "Blackberry", "Pumpkin", "Apple")
cols = c("greenyellow", "tan4", "red", "darkblue",
         "mediumvioletred", "yellow", "black", "darkorange",
         "cornsilk2")

# Build the pie chart
pie(x, labels, radius = 1, col=cols)
Image by Author
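One way to avoid hand-picking nine colors is to draw them from a ready-made qualitative palette. The sketch below uses base R's `hcl.colors()` (available from R 3.6 onward) with the "Dark 3" palette; this is a swapped-in alternative, not the palette used in the article, and whether it suits a given audience is a judgment call.

```r
# Category data from the nine-slice example
x <- c(2, 4, 5, 10, 13, 15, 15, 17, 19)
labels <- c("Key Lime", "Pecan", "Cherry", "Blueberry", "Rhubarb",
            "Lemon Meringue", "Blackberry", "Pumpkin", "Apple")

# Assumption: "Dark 3" is one of base R's qualitative HCL palettes;
# it returns one hex color per category, spaced evenly around the hue wheel
cols <- hcl.colors(length(x), palette = "Dark 3")

# Same pie chart, with the generated palette instead of named colors
pie(x, labels, radius = 1, col = cols)
```

A generated palette loses the mnemonic link between color and flavor (red for Cherry, for example), so the slice labels have to carry more of the load.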

They show parts of a whole

Pie charts represent a whole as its components. Therefore, if your dataset is a subset of a larger dataset (and thus does not represent the whole) or if your dataset consists of independent categories (and thus represents multiple wholes), then a pie chart may not be appropriate.

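A quick sanity check before drawing a pie is to confirm that the categories add up to the whole they are supposed to represent. The sketch below uses a hypothetical helper, `sums_to_whole()`, and a made-up multi-select survey purely for illustration; neither comes from the article.

```r
# Hypothetical helper: TRUE when the category counts add up to the known total
sums_to_whole <- function(counts, total) {
  isTRUE(all.equal(sum(counts), total))
}

# Single-choice survey from the first example: 100 respondents, one vote each
single_choice <- c("Key Lime" = 18, "Cherry" = 15, "Rhubarb" = 13,
                   "Pumpkin" = 25, "Apple" = 29)
print(sums_to_whole(single_choice, total = 100))

# Made-up multi-select survey of 100 respondents: each category is its own
# "whole", so the counts overlap and a pie chart would misrepresent them
multi_choice <- c("Apple" = 60, "Pumpkin" = 55, "Cherry" = 40)
print(sums_to_whole(multi_choice, total = 100))
```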

Pie charts in popular packages

I wouldn’t want to assume anyone’s opinion on as divisive a topic as the pie chart, but perhaps the disdain for this chart type is best exhibited by the lack of built-in functions for creating them in two very popular data visualization packages: ggplot2 (R) and seaborn (Python). With both packages, a pie chart can be created only through trickery.

Trickery

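It is convenient (perhaps a little too convenient) that a pie chart is no more than a single stacked bar displayed in polar coordinates. The code below builds a nine-category pie chart like the one shown above, but using ggplot2; note that its vote counts for Key Lime, Pecan, and Blackberry differ from those in the earlier example.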

# Adjusting plot size and margins
options(repr.plot.width=8, repr.plot.height=4)
par(mfrow=c(1,1), mai = c(0.55, 0, 0.8, 0))

# Data for the pie chart
values = c(9, 2, 5, 10, 13, 15, 10, 17, 19)
labels = c("Key \nLime", "Pecan", "Cherry", "Blueberry", "Rhubarb",
           "Lemon \nMeringue", "Blackberry", "Pumpkin", "Apple")
cols = c("Key \nLime"="greenyellow", "Pecan"="tan4",
         "Cherry"="red", "Blueberry"="darkblue",
         "Rhubarb"="mediumvioletred", "Lemon \nMeringue"="yellow",
         "Blackberry"="black", "Pumpkin"="darkorange",
         "Apple"="cornsilk2")
data = data.frame(labels, values)

# Build the pie chart
ggplot(data, aes(x="", y=values, fill=labels)) +
  geom_bar(width = 1, stat = "identity") +
  scale_fill_manual(values=cols) +
  coord_polar("y", start=0) + # Use polar coordinates
  theme(axis.title=element_blank(),
        axis.text=element_blank(),
        legend.title=element_blank())
Image by Author

What chart types can be used to replace pie charts?

Bar charts

Similar to pie charts, bar charts use size to convey information; however, for bar charts, the height of a rectangle varies, and differences between the heights of bars are easier to recognize than the differences between the angles of portions of a circle. Furthermore, bar charts can be configured to show absolute numbers, percentages, or both!

# Adjusting plot size and margins
options(repr.plot.width=8, repr.plot.height=4)
par(mfrow=c(1,1), mai = c(0.5, 1, 0.2, 1))

# Data for bar chart
values = c(9, 2, 5, 10, 13, 15, 10, 17, 19)
labels = c("Key \nLime", "Pecan", "Cherry", "Blueberry", "Rhubarb",
           "Lemon \nMeringue", "Blackberry", "Pumpkin", "Apple")
data = data.frame(labels, values)
data = data[order(-values),]

# Build the bar chart
barplot(height=data$values,
        names.arg=data$labels,
        ylab="Votes",
        ylim = c(0, 20),
        cex.names=0.7)
Image by Author
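As a sketch of the "both" option, the bar chart above can be annotated with the count and the percentage on each bar. This relies on `barplot()` returning the bar midpoints, which `text()` can then use for placement; the names `pct`, `bar_text`, and `mids` are illustrative, and the y-axis limit is widened slightly (an assumption on my part) to leave room for the labels.

```r
# Same data as the bar chart above, sorted from most to fewest votes
values <- c(9, 2, 5, 10, 13, 15, 10, 17, 19)
labels <- c("Key \nLime", "Pecan", "Cherry", "Blueberry", "Rhubarb",
            "Lemon \nMeringue", "Blackberry", "Pumpkin", "Apple")
data <- data.frame(labels, values)
data <- data[order(-data$values), ]

# Build "count (percent%)" labels from the data itself
pct <- round(100 * data$values / sum(data$values))
bar_text <- paste0(data$values, " (", pct, "%)")

# barplot() returns the x position of each bar's midpoint
mids <- barplot(height = data$values,
                names.arg = data$labels,
                ylab = "Votes",
                ylim = c(0, 22),
                cex.names = 0.7)

# Place each label just above the top of its bar
text(x = mids, y = data$values, labels = bar_text, pos = 3, cex = 0.7)
```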

Waffle charts

Waffle charts, which are growing in popularity, use the number of unit squares rather than size to visualize a numerical dimension. The resulting graph is similar to a stacked bar or tree map; however, because each square is one unit, it is easier than with alternatives that rely solely on size for a person to confirm, without relying on text, whether a perceived difference between categories is real.

# Adjusting plot size and margins
options(repr.plot.width=8, repr.plot.height=4)
par(mfrow=c(1,1), mai = c(0.5, 1, 0.2, 1))

# Create data
pies = c("Pecan"=2, "Cherry"=5, "Key Lime"=9, "Blueberry"=10,
         "Blackberry"=10, "Rhubarb"=13, "Lemon Meringue"=15,
         "Pumpkin"=17, "Apple"=19)

# Build the waffle chart
waffle(pies, rows=5, size=1.5,
       colors=c("tan4", "red", "greenyellow", "darkblue", "black",
                "mediumvioletred", "yellow", "darkorange",
                "cornsilk2"),
       xlab="1 square = 1 vote", legend_pos = "bottom")
Image by Author

But what if I don’t like the alternatives?

Even though there are many alternatives (e.g., bar charts, stacked bars, waffle charts, lollipop charts, tree maps), pie charts are a familiar chart type to most people, and depending on the audience, familiarity may be an important factor that affects interpretability. So if you want to stick with pie charts, consider taking the following advice.

Limit the number of categories via grouping

To avoid visual clutter and to ensure your pie chart is readable, the number of categories should be small. Therefore, it may be useful to group categories that individually comprise a small proportion of the pie into a single category. Note that, when using this approach, it may be helpful to list the items contained in the derived category. Furthermore, it is best to ensure that the new category does not form the majority of the resulting pie.

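The grouping advice can be sketched in a few lines of R. The cutoff of 10 votes (`min_votes`) is an arbitrary assumption chosen for this example, and the names `grouped` and `other_items` are illustrative; the data are the nine-category vote counts used earlier.

```r
# Vote counts from the nine-category example
pies <- c("Pecan" = 2, "Cherry" = 5, "Key Lime" = 9, "Blueberry" = 10,
          "Blackberry" = 10, "Rhubarb" = 13, "Lemon Meringue" = 15,
          "Pumpkin" = 17, "Apple" = 19)

# Assumed cutoff: categories with fewer than 10 votes are merged
min_votes <- 10
small <- pies < min_votes

# Collapse the small categories into a single "Other" slice
grouped <- c(Other = sum(pies[small]), pies[!small])

# Keep track of what "Other" contains so it can be listed for the reader
other_items <- names(pies)[small]

# Guard against the derived category dominating the pie
stopifnot(grouped[["Other"]] / sum(grouped) < 0.5)

# Name the merged items in the slice label itself
slice_labels <- names(grouped)
slice_labels[1] <- paste0("Other (", paste(other_items, collapse = ", "), ")")
pie(grouped, slice_labels, radius = 1)
```

With this cutoff, Pecan, Cherry, and Key Lime are merged, leaving seven slices instead of nine.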

Show percentages or absolute numbers (or both) as text

Text can be used to prevent misunderstandings due to ambiguity. By including text information, a person can see if there are differences among the categories. However, if it is necessary to include text, then one can argue that the visualization itself is ineffective (so be prepared to defend your choice of chart type).

# Adjusting plot size and margins
options(repr.plot.width=8, repr.plot.height=4)
par(mfrow=c(1,1), mai = c(0.55, 0, 0.8, 0))

# Data for pie chart
x = c(15, 20, 35, 30)
labels = c("Other (15%)", "Cherry (20%)", "Pumpkin (35%)",
           "Apple (30%)")
cols = c("black", "red", "darkorange", "cornsilk2")

# Build the pie chart
pie(x, labels, radius = 1, col=cols)
Image by Author
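Hard-coded label strings can drift out of sync with the data. As a minimal sketch, the same labels can be derived from the values, assuming the values are stored as a named vector (a small change from the code above).

```r
# Same four slices as above, stored as a named vector
x <- c("Other" = 15, "Cherry" = 20, "Pumpkin" = 35, "Apple" = 30)
cols <- c("black", "red", "darkorange", "cornsilk2")

# Derive "Name (NN%)" labels from the data instead of typing them by hand
pct <- round(100 * x / sum(x))
labels <- paste0(names(x), " (", pct, "%)")

pie(x, labels, radius = 1, col = cols)
```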

Conclusions

I hope you found this discussion of pie charts informative. While pie charts can be avoided in most cases, they remain a pithy little chart on which many, many people have little to no opinion. However, to avoid a mass uptake of pitchforks and torches, please remember to employ pie charts responsibly and to use caution when including any controversial chart type in your next presentation.

Required libraries

library(repr)
library(dplyr)
library(plotly)
library(waffle)
library(ggplot2)
library(RColorBrewer)

Translated from: https://medium.com/analytics-vidhya/why-pie-charts-are-problematic-1075ce7d0270
