Surfacing Visualization Mirages
When reading a visualization, is what we see really what we get?
This post summarizes and accompanies our paper “Surfacing Visualization Mirages” that was presented at CHI 2020 with a best paper honorable mention. This post was written collaboratively by Andrew McNutt, Gordon Kindlmann, and Michael Correll.
TL;DR
When reading a visualization, is what we see really what we get? There are many ways that visualizations can mislead us, appearing to show us something interesting that disappears on closer inspection. Such visualization mirages can lead us to see patterns that aren’t in our data or to draw conclusions the data doesn’t support. We analyze these quarrelsome entities and provide a testing strategy for dispelling them.
Intro
The trained data visualization eye notices red flags that indicate something misleading is going on: dual axes that don’t quite match up, misleading color ramps, dubious sources. Learning how visualizations mislead is every bit as important as learning how they are created, yet even the studious can be deceived!
These dastardly deceptions need not be deviously devised either. While some visualizations are of course created by bad actors, most are not. Even designs crafted with the best of intentions can yield all kinds of confusions and mistakes. A careless analyst might hallucinate meaning where there is none, or jump to a conclusion that the data only hazily supports.
What can we say about the humble bar chart below on the left? It appears that location B has about 50% more sales than location A. Is the store in location A underperforming? Given the magnitude of the difference, I’d bet your knee-jerk answer would be yes.
Many patterns can hide behind aggregated data. For example, a simple average might hide dirty data, irregular population sizes, or a whole host of other problems. Simple aggregations like our humble bar chart are the foundation of many analytics tools, with subsequent analyses often being built on top of these potentially shaky grounds.
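To make the bar chart example concrete, here is a minimal Python sketch using made-up transaction data (the numbers, the `summarize` helper, and the `MIN_COUNT` cut-off are all hypothetical, not taken from the paper). Both bars come out at the heights the chart suggests, but the second bar rests on only two records, which a plain mean hides.

```python
from statistics import mean

# Hypothetical transaction-level sales behind the two bars (not real data).
sales = {
    "A": [90.0, 110.0] * 25,  # 50 transactions averaging 100
    "B": [100.0, 200.0],      # only 2 transactions averaging 150
}

def summarize(groups):
    """Return {group: (mean, count)}: the bar height plus the count the bar hides."""
    return {name: (mean(vals), len(vals)) for name, vals in groups.items()}

summary = summarize(sales)
print(summary)  # {'A': (100.0, 50), 'B': (150.0, 2)}

# Hypothetical rule of thumb: distrust bars built from fewer than 10 records.
MIN_COUNT = 10
thin = [name for name, (_, n) in summary.items() if n < MIN_COUNT]
print(thin)  # ['B']
```

Reporting the count alongside the mean is only one cheap signal; it does not catch dirty values inside a well-populated group.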
What are we to do about these problems? Should we stop analyzing data visually? Throw out our computers? Perhaps we can form a theory that will help us build a method for automatically surfacing and catching these quarrelsome errors?
Enter Mirages
On the road to making a chart or visualization there are many steps and stages, each of which is liable to let errors in. Consider a simplified model: an analyst decides how to curate data, how to wrangle it into a usable form, how to visually encode it, and finally how to read it. Whenever the analyst makes a decision, they exercise agency and create an opportunity for error, which can cascade along this pipeline and create illusory insights.
Something as innocuous as defining the bins of a histogram can mask underlying data quality issues, which might in turn lead to incorrect inferences about a trend. Arbitrary choices about axis ordering in a radar chart can cause a reader to falsely believe one job candidate is good while another is lacking. Decisions about what type of crime actually counts as a crime can lead to maps that drive radically different impressions about the role of crime in a particular area.
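As an illustration of the histogram case, the sketch below assumes a hypothetical dataset in which missing readings were recorded as the sentinel value 0. With wide bins, the sentinels merge into the lowest bin and read like a plausible low-end bump; with narrow bins they show up as a spike at exactly zero. The `histogram` and `spikes` helpers and the factor-of-5 rule are illustrative assumptions, not a published detector.

```python
from statistics import median

def histogram(values, lo, hi, width):
    """Count integer values into equal-width bins over [lo, hi]; the last bin includes hi."""
    n_bins = -(-(hi - lo) // width)  # ceiling division
    counts = [0] * n_bins
    for v in values:
        idx = min((v - lo) // width, n_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical readings 1..100, plus 20 records where 0 was used as a "missing" sentinel.
readings = list(range(1, 101)) + [0] * 20

coarse = histogram(readings, 0, 100, 25)
fine = histogram(readings, 0, 100, 1)
print(coarse)    # [44, 25, 25, 26]: the sentinels look like a plausible low-end bump
print(fine[:3])  # [20, 1, 1]: a spike at exactly 0 gives the dirty data away

def spikes(counts, factor=5):
    """Indices of bins holding more than `factor` times the median bin count."""
    m = median(counts)
    return [i for i, c in enumerate(counts) if c > factor * m]

print(spikes(coarse), spikes(fine))  # [] [0]
```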
The first step in addressing a problem is often to name it, so we introduce a term for these errors: Visualization Mirages. We define them as
“any visualization where the cursory reading of the visualization would appear to support a particular message arising from the data, but where a closer re-examination would remove or cast significant doubt on this support.”
Mirages arise throughout visual analytics. They occur as the result of choices made about data. They come from design choices. They depend on what you are trying to do with the visualization. What may be misleading in the context of one task may not interfere with another. For instance, a poorly selected aspect ratio could produce a mirage for a viewer who wanted to know about the correlation in a scatterplot, but is unlikely to affect someone who just wants to find the biggest value.
The errors that create mirages have both familiar and unfamiliar names: Drill-down Bias, Forgotten Population or Missing Dataset, Cherry Picking, Modifiable Areal Unit Problem, Non-sequitur Visualizations, and so many more. An annotated and expanded version of this list is included in the paper supplement. There is a sprawling universe of subtle and tricky ways that mirages can arise.
To make matters worse, there are few automated tools to help the reader or chart creator know that they haven’t deceived themselves in pursuit of insight.
Do these things really happen?
Imagine you are curious about the trend of global energy usage over time. A natural way to address this question would be to fire up Tableau and drop in the World Indicators dataset, which consists of vital world statistics from 2000 to 2012. The trend over time (a) shows a sharp decrease in 2012! This would be great news for the environment, were it not illusory, as we see in (b) when checking the set of missing records.
If we try to quash these data problems by switching the aggregation in our line chart from SUM to MEAN, we find that the opposite is true: there was a sharp increase in 2012! Unfortunately, this conclusion is another mirage. The only non-null entries for 2012 are from OECD countries, which have much higher energy usage than other countries across all years (d).
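The SUM/MEAN reversal can be reproduced with a handful of toy records. In the sketch below (hypothetical countries and values, not the World Indicators data), every country's energy use is constant over time, yet SUM drops and MEAN jumps in the final year simply because only the high-usage "OECD" countries reported. Counting non-null records per year, as in panel (b), exposes the problem; the `aggregate_by_year` helper is our own illustration.

```python
from collections import defaultdict

# Hypothetical records: (country, year, energy use). Each country's usage is
# constant over time; the two "OECD" countries use far more than the others.
usage = {"A": 10.0, "B": 10.0, "C": 2.0, "D": 2.0, "E": 2.0}
oecd = {"A", "B"}
records = []
for year in (2010, 2011, 2012):
    for country, value in usage.items():
        # In 2012 only the OECD countries reported a value; the rest are missing.
        reported = year < 2012 or country in oecd
        records.append((country, year, value if reported else None))

def aggregate_by_year(rows):
    """Return {year: (sum, mean, n_non_null)}, skipping nulls as SUM/MEAN typically do."""
    values = defaultdict(list)
    for _, year, value in rows:
        if value is not None:
            values[year].append(value)
    return {y: (sum(v), sum(v) / len(v), len(v)) for y, v in sorted(values.items())}

by_year = aggregate_by_year(records)
print(by_year)
# {2010: (26.0, 5.2, 5), 2011: (26.0, 5.2, 5), 2012: (20.0, 10.0, 2)}

# Flag years whose number of reporting records falls below the maximum.
max_n = max(n for _, _, n in by_year.values())
suspect_years = [y for y, (_, _, n) in by_year.items() if n < max_n]
print(suspect_years)  # [2012]
```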
Given these irregularities, we can try removing 2012 from the data and focus on the gradual upward trend in energy usage in the remaining years. As we can see on the left, energy usage appears to be tightly correlated with average life expectancy; perhaps more power means a happier life for everyone after all. Unfortunately, this too is a mirage. The y-axis of this chart has been altered to make the trends appear similar, obscuring the fact that energy use is flat for most countries.
Now of course, you’re probably saying:
But I’m really smart, I wouldn’t make this type of mistake
That’s great! Congrats on being smart. Unfortunately, even people with high data visualization literacy make mistakes. Visualizations are rhetorical devices that are easy to trust too deeply, and charting systems often lend an air of credibility that isn’t necessarily warranted. It is often easier to trust your initial inferences and move on. Interactive visualizations whose exploratory tools might dispel a mirage are often only glanced at by casual readers. Sometimes you are just tired and miss something “obvious”.
Some visualization problems are easy to detect, such as axes pointed in an unintuitive or unconventional direction, or a pie chart with more than a handful of wedges. But this type of best-practice knowledge isn’t always available. What if you are trying to use a novel type of visualization (a xenographic, perhaps)? There’d be nothing beyond your intuition to guide you.
Other, more terrifying, problems only arise for particular datasets when paired with particular charts. To address these we introduce a testing strategy (derived from Metamorphic Testing) that can identify some of this thorny class of errors, such as the aggregation masking unreliable inputs that we saw earlier with our humble bar chart.
Testing for errors is easy if you know the correct behavior of a system: simply inspect the system and report your findings. In the hinterlands of data and encoding, however, we are left without such a compass. Instead, we try to find guidance by identifying symmetries across data changes.
The order in which you draw the dots in a scatterplot shouldn’t matter, right? Yet, depending on the dataset, it often does! Draw order can erase data classes or cause false inferences. We test for this property by shuffling the order of the input data and then comparing the pixel-wise difference between the two rendered images. If the difference is above a certain threshold, we know that there may be a problem. This is the essence of our technique: for a particular dataset, execute a change that should have a predictable result (here, no change) and compare the results.
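The shuffle-and-compare idea can be sketched in a few lines of Python. This is a toy stand-in, not the project’s implementation: instead of rendering a real chart, a hypothetical `render` function rasterizes points onto a 20×20 grid with later points painting over earlier ones, and the grid size, number of shuffles, and 1% threshold are arbitrary assumptions.

```python
import random

def render(points, width=20, height=20):
    """Rasterize (x, y, color) points with x, y in [0, 1] onto a grid.

    Later points overwrite earlier ones (painter's algorithm), so draw order
    decides which color is visible where marks overlap.
    """
    canvas = [[None] * width for _ in range(height)]
    for x, y, color in points:
        col = min(int(x * width), width - 1)
        row = min(int(y * height), height - 1)
        canvas[row][col] = color
    return canvas

def pixel_difference(a, b):
    """Fraction of pixels whose color differs between two canvases of equal size."""
    total = sum(len(row) for row in a)
    changed = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return changed / total

def order_invariance_check(points, trials=10, threshold=0.01, seed=0):
    """Metamorphic test: reordering the input should not change the image.

    Returns (worst_difference, flagged).
    """
    rng = random.Random(seed)
    baseline = render(points)
    candidates = [points[::-1]]  # full reversal: the most extreme reordering
    for _ in range(trials):
        shuffled = list(points)
        rng.shuffle(shuffled)
        candidates.append(shuffled)
    worst = max(pixel_difference(baseline, render(c)) for c in candidates)
    return worst, worst > threshold

xs = [i / 10 + 0.01 for i in range(10)]
# Two classes at identical positions: red drawn first, blue on top.
overlapping = [(x, 0.5, "red") for x in xs] + [(x, 0.5, "blue") for x in xs]
# Two classes in separate rows: draw order cannot matter.
separate = [(x, 0.2, "red") for x in xs] + [(x, 0.8, "blue") for x in xs]

print(order_invariance_check(overlapping))  # (0.025, True)
print(order_invariance_check(separate))     # (0.0, False)
```

Normalizing by the whole canvas keeps the threshold simple but dilutes small conflicts in large images; normalizing by the number of occupied pixels would be a reasonable alternative.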
While it’s still in early development, we find that this approach can effectively catch a wide variety of visualization errors that arise where an encoding is matched to data. These techniques can help surface errors in over-plotting, aggregation, missing aggregation, and a variety of other contexts. How to compute these tests efficiently (their computation can be burdensome) and how best to describe the resulting errors to the user both remain open challenges.
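For aggregation, one analogous check, offered here as our own illustrative assumption rather than a test confirmed from the paper, is a resampling relation: if “B outsells A” is well supported, resampling each location’s records with replacement should almost never flip which bar is taller. The sketch reuses the hypothetical sales numbers from the bar chart example; the `ordering_stability` name and the 0.95 threshold are made up.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def ordering_stability(a, b, trials=1000, seed=0):
    """Fraction of bootstrap resamples in which mean(b) > mean(a).

    Assumed relation: if "B outsells A" is well supported, resampling each
    group's records with replacement should almost never flip the bars.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        if mean(rng.choices(b, k=len(b))) > mean(rng.choices(a, k=len(a))):
            wins += 1
    return wins / trials

# Hypothetical data: A has 50 transactions averaging 100.
a = [90.0, 110.0] * 25
thin_b = [100.0, 200.0]        # mean 150 from only 2 records
solid_b = [140.0, 160.0] * 25  # mean 150 from 50 records

THRESHOLD = 0.95  # hypothetical cut-off for calling the ordering stable
for label, b in [("thin", thin_b), ("solid", solid_b)]:
    s = ordering_stability(a, b)
    print(label, s, "flag" if s < THRESHOLD else "ok")
# The thin case is flagged (its expected stability is about 0.86); the solid case scores 1.0.
```

Both versions of B draw the same bar height, so a chart alone cannot tell them apart; the resampling relation can.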
Where does that leave us?
Visualizations, and the people who create them, are prone to failure in subtle and difficult ways. We believe that visual analytics systems should do more to protect their users from themselves. One way these systems can do this is to surface visualization mirages to their users as part of the analytics process, which will hopefully guide them towards safer and more effective analyses. Our metamorphic testing approach for visualization is just one tool in the visualization validation toolbox. The right interfaces for accomplishing this goal are still unknown, although applying a metaphor of software linting seems promising. For more details, check out our paper, take a look at the code repo for the project, or watch our CHI talk.
Originally published at: https://medium.com/multiple-views-visualization-research-explained/surfacing-visualization-mirages-8d39e547e38c