Surfacing Visualization Mirages

When reading a visualization, is what we see really what we get?

This post summarizes and accompanies our paper “Surfacing Visualization Mirages” that was presented at CHI 2020 with a best paper honorable mention. This post was written collaboratively by Andrew McNutt, Gordon Kindlmann, and Michael Correll.

TL;DR

When reading a visualization, is what we see really what we get? There are a lot of ways that visualizations can mislead us, such that they appear to show us something interesting that disappears on closer inspection. Such visualization mirages can lead us to see patterns or draw conclusions that don’t exist in our data. We analyze these quarrelsome entities and provide a testing strategy for dispelling them.

Intro

The trained data visualization eye notices red flags that indicate that something misleading is going on. Dual axes that don’t quite match up. Misleading color ramps. Dubious sources. While learning how visualizations mislead is every bit as important as learning how they are created, even the studious can be deceived!

These dastardly deceptions need not be deviously devised either. While some visualizations are of course created by bad actors, most are not. Even designs crafted with the best of intentions yield all kinds of confusions and mistakes. A careless analyst might hallucinate meaning where there isn’t any or jump to a conclusion that is only hazily supported.

What can we say about the humble bar chart below on the left? It appears that location B has about 50% more sales than location A. Is the store in location A underperforming? Given the magnitude of the difference, I’d bet your knee-jerk answer would be yes.

Many patterns can hide behind aggregated data. For example, a simple average might hide dirty data, irregular population sizes, or a whole host of other problems. Simple aggregations like our humble bar chart are the foundation of many analytics tools, with subsequent analyses often being built on top of these potentially shaky grounds.

What are we to do about these problems? Should we stop analyzing data visually? Throw out our computers? Perhaps we can form a theory that will help us build a method for automatically surfacing and catching these quarrelsome errors?

A flow chart describing the visual analytics process.
The chart-making process is full of moments of agency for the chart creator. What counts as data? What is an appropriate way to manipulate that data? How do I show this data? How do I go about understanding it? The answers to all of these questions can affect the reader’s ultimate takeaways.

Enter Mirages

On the road to making a chart or visualization there are many steps and stages, each of which is liable to let errors in. Consider a simplified model: an analyst decides how to curate data, how to wrangle it into a usable form, how to visually encode that data, and finally how to read it. Each time the analyst makes a decision, they exercise agency and create an opportunity for error, and those errors can cascade along this pipeline, creating illusory insights.

Something as innocuous as defining the bins of a histogram can mask underlying data quality issues, which might in turn lead to incorrect inferences about a trend. Arbitrary choices about axis ordering in a radar chart can cause a reader to falsely believe one job candidate is good while another is lacking. Decisions about what type of crime actually counts as a crime can lead to maps that drive radically different impressions about the role of crime in a particular area.
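
The histogram case can be made concrete with a small sketch. Everything here is hypothetical and for illustration only: the data, the placeholder value 50, and the `peak_ratio` helper are assumptions, not taken from the paper. The point is only that the same records look unremarkable under wide bins and suspicious under narrow ones.

```python
def histogram(values, bin_width, lo=0, hi=100):
    """Count integer values into fixed-width bins covering [lo, hi)."""
    n_bins = (hi - lo) // bin_width
    counts = [0] * n_bins
    for v in values:
        index = min((v - lo) // bin_width, n_bins - 1)
        counts[index] += 1
    return counts


def peak_ratio(counts):
    """Height of the tallest bin relative to the average bin height."""
    return max(counts) / (sum(counts) / len(counts))


# Hypothetical data: three readings at every integer from 0 to 99, plus
# 15 records stuck at a placeholder value of 50 (a data quality problem).
values = list(range(100)) * 3 + [50] * 15

coarse = histogram(values, bin_width=25)  # 4 wide bins
fine = histogram(values, bin_width=1)     # 100 narrow bins

coarse_ratio = peak_ratio(coarse)  # close to 1: the spike is smoothed away
fine_ratio = peak_ratio(fine)      # much larger: the spike at 50 stands out
```

With the wide bins the tallest bar is only slightly above average, so a reader has little reason to suspect the placeholder records; with narrow bins the spike is hard to miss.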

While charts tend to feel trustworthy, the harmless-seeming choices that create them can cause all sorts of hallucinations.

The first step in addressing a problem is often to name it, so we introduce a term for these errors: Visualization Mirages. We define them as

any visualization where the cursory reading of the visualization would appear to support a particular message arising from the data, but where a closer re-examination would remove or cast significant doubt on this support.

Mirages arise throughout visual analytics. They occur as the result of choices made about data. They come from design choices. They depend on what you are trying to do with the visualization. What may be misleading in the context of one task may not interfere with another. For instance, a poorly selected aspect ratio could produce a mirage for a viewer who wanted to know about the correlation in a scatterplot, but is unlikely to affect someone who just wants to find the biggest value.

A man crawls across a desert following a sign labeled “VA process” towards a mirage that is labeled “insights”
We all thirst for insight in visual analytics (or anywhere else). This desire can cause us to overlook important details or forget best practices.

The errors that create mirages have both familiar and unfamiliar names: Drill-down Bias, Forgotten Population or Missing Dataset, Cherry Picking, Modifiable Areal Unit Problem, Non-sequitur Visualizations, and so many more. An annotated and expanded version of this list is included in the paper supplement. There is a sprawling universe of subtle and tricky ways that mirages can arise.

To make matters worse, there are few automated tools to help the reader or chart creator know that they haven’t deceived themselves in pursuit of insight.

Do these things really happen?

Imagine you are curious about the trend of global energy usage over time. A natural way to address this question would be to fire up Tableau and drop in the World Indicators dataset, which consists of vital world statistics from 2000 to 2012. The trend over time (a) shows that there was a sharp decrease in 2012! This would be great news for the environment, were it not illusory, as we see in (b) when checking the set of missing records.

A line chart with the caption “Energy down?”, a bar chart with the caption “Count of Nulls”, and a line chart with the caption “Energy up?”

If we try to quash these data problems by switching the aggregation in our line chart from SUM to MEAN, we find that the opposite is true!! There was a sharp increase in 2012. Unfortunately this conclusion is another mirage. The only non-null entries for 2012 are OECD countries. These countries have much higher energy usage than other countries across all years (d).
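
The flip between SUM and MEAN is easy to reproduce on a toy example. The sketch below uses made-up numbers and hypothetical country names (it is not the World Indicators data) in which the low-usage countries simply have no record for the final year.

```python
from statistics import mean

# Hypothetical toy data: (country, year, energy use); None marks a
# missing record. Only the high-usage countries report in 2012.
records = [
    ("HighUse1", 2011, 100.0), ("HighUse2", 2011, 90.0),
    ("LowUse1", 2011, 10.0), ("LowUse2", 2011, 20.0),
    ("HighUse1", 2012, 101.0), ("HighUse2", 2012, 91.0),
    ("LowUse1", 2012, None), ("LowUse2", 2012, None),
]


def summarize(rows, year):
    """Aggregate one year the way a charting tool might, ignoring nulls."""
    values = [v for _, y, v in rows if y == year]
    present = [v for v in values if v is not None]
    return {
        "sum": sum(present),
        "mean": mean(present),
        "nulls": len(values) - len(present),
    }


summary_2011 = summarize(records, 2011)
summary_2012 = summarize(records, 2012)
```

Here the sum falls from 2011 to 2012 while the mean rises, even though no country's usage changed by more than one unit. The count of nulls is the only field that hints at what actually happened.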

Two line charts. The left one shows Energy Usage vs Life Expectancy over time, the right one shows energy use over time

Given these irregularities, we can try removing 2012 from the data and focus on the gradual upward trend in energy usage in the rest of the data. As we can see on the left, it appears that energy usage is tightly correlated with average life expectancy; perhaps more power means a happier life for everyone after all. Unfortunately this too is a mirage. The y-axis of this chart has been altered to make the trends appear similar, which obscures the fact that energy use is flat for most countries.

Now of course, you’re probably saying:

But I’m really smart, I wouldn’t make this type of mistake

That’s great! Congrats on being smart. Unfortunately, even those with high data visualization literacy make mistakes. Visualizations are rhetorical devices that are easy to trust too deeply. Charting systems often give an air of credibility that they don’t necessarily warrant. It is often easier to trust your initial inferences and move on. Interactive visualizations with exploratory tools that might help dispel a mirage are often only glanced at by casual readers. Sometimes you are just tired and miss something “obvious”.

A chart showing the gun deaths in Florida over time
This infamous chart appears at first glance to be saying that ‘Stand Your Ground’ decreased gun deaths, but on closer inspection it shows the opposite! Terrifying! (The author of this chart wasn’t actually trying to confuse anyone; they were just trying to explore a new design language.)

Some visualization problems are easy to detect, such as axes pointed in an unintuitive or unconventional direction or a pie chart with more than a handful of wedges. This type of best-practice knowledge isn’t always available. For instance, what if you are trying to use a novel type of visualization? (A xenographic perhaps?) There’d be nothing beyond your intuition to help guide you.

Other, more terrifying, problems only arise for particular datasets when paired with particular charts. To address these we introduce a testing strategy (derived from Metamorphic Testing) that can identify some of this thorny class of errors, such as the aggregation masking unreliable inputs that we saw earlier with our humble bar chart.

Testing for errors is easy if you know the correct behavior of a system: simply inspect the system and report your findings. For errors in the hinterlands of data and encoding, we are left without such a compass. Instead, we try to find guidance by identifying symmetries across data changes.

The order in which you draw the dots in a scatterplot shouldn’t matter, right? Yet, depending on the dataset, it often can!!! This can erase data classes or cause false inferences. We test for this property by shuffling the order of the input data and then comparing the pixel-wise difference between the two images. If the difference is above a certain threshold we know that there may be a problem. This is the essence of our technique: for a particular dataset, execute a change that should have a predictable result (here no change), and compare the results.
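
A minimal sketch of this shuffle test follows. It is not the project's implementation: the tiny `render` rasterizer stands in for a real charting library, and the function names, grid size, and the 1% threshold are all assumptions made for illustration.

```python
import random


def render(points, width=20, height=20):
    """Rasterize (x, y, label) points onto a grid in draw order.

    A later point overwrites an earlier one at the same pixel, which is
    how draw order can hide a class of data.
    """
    grid = [[None] * width for _ in range(height)]
    for x, y, label in points:
        grid[y][x] = label
    return grid


def pixel_difference(image_a, image_b):
    """Fraction of pixels that differ between two equally sized grids."""
    total = len(image_a) * len(image_a[0])
    differing = sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(image_a, image_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )
    return differing / total


def shuffle_test(points, threshold=0.01, seed=0):
    """Render the data in its given order and in a shuffled order, then compare."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    difference = pixel_difference(render(points), render(shuffled))
    return difference, difference > threshold


# Hypothetical data: two classes drawn on exactly the same pixels, versus
# two classes that never overlap.
cells = [(x, y) for x in range(10) for y in range(10)]
overplotted = [(x, y, "A") for x, y in cells] + [(x, y, "B") for x, y in cells]
disjoint = [(x, y, "A") for x, y in cells[:50]] + [(x, y, "B") for x, y in cells[50:]]

overplotted_diff, overplotted_flagged = shuffle_test(overplotted)
disjoint_diff, disjoint_flagged = shuffle_test(disjoint)
```

The disjoint dataset renders identically under any order, so the test stays quiet. The overplotted dataset is flagged because class A is entirely hidden in the original order and partly revealed after shuffling.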

A series of 3 scatterplots. The first two show the same data but appear different. The third image highlights the differences
A simple scatterplot can hide the distributions it displays through draw order. This problem won’t affect every dataset, but here it hides the prevalence of the Americas in the middle of the distribution.

While it’s still in early development, we find that this approach can effectively catch a wide variety of visualization errors that arise at the intersection of encoding and data. These techniques can help surface errors in over-plotting, aggregation, missing aggregation, and a variety of other contexts. How to compute these errors efficiently (their computation can be burdensome) and how best to describe them to the user remain open challenges.
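
For the aggregation case, one plausible check (an assumption for illustration, not necessarily the exact test used in the paper) is to resample the records behind each bar and ask whether the ordering of the bars survives. The sales figures below are hypothetical, chosen so that location B averages 50% more than location A, as in the bar chart from the intro.

```python
import random
from statistics import mean


def bar_heights(groups):
    """Mean of each group, as a bar chart of averages would show."""
    return {name: mean(values) for name, values in groups.items()}


def ranking(heights):
    """Group names ordered from tallest bar to shortest."""
    return sorted(heights, key=heights.get, reverse=True)


def bootstrap_rank_stability(groups, trials=200, seed=0):
    """Fraction of bootstrap resamples whose bar ordering matches the original."""
    rng = random.Random(seed)
    baseline = ranking(bar_heights(groups))
    stable = 0
    for _ in range(trials):
        resampled = {
            name: [rng.choice(values) for _ in values]
            for name, values in groups.items()
        }
        if ranking(bar_heights(resampled)) == baseline:
            stable += 1
    return stable / trials


# Hypothetical sales records. In both datasets B averages 15 and A averages 10.
solid = {"A": [10, 11, 9, 10, 10, 11, 9, 10], "B": [15, 16, 14, 15, 15]}
fragile = {"A": [10, 11, 9, 10, 10, 11, 9, 10], "B": [1, 1, 1, 1, 71]}

solid_stability = bootstrap_rank_stability(solid)
fragile_stability = bootstrap_rank_stability(fragile)
```

Both datasets produce the same two bars, but the second one owes its lead to a single outlying record, so a noticeable share of resamples reverse the ordering. A low stability score would be a reason to look at the underlying records before trusting the chart.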

Where does that leave us?

Visualizations, and the people who create them, are prone to failure in subtle and difficult ways. We believe that visual analytics systems should do more to protect their users from themselves. One way these systems can do this is to surface visualization mirages to their users as part of the analytics process, which will hopefully guide them towards safer and more effective analyses. Our metamorphic testing approach for visualization is just one tool in the visualization validation toolbox. The right interface for accomplishing this goal is still unknown, although applying a metaphor of software linting seems promising. For more details check out our paper, take a look at the code repo for the project, or watch our CHI talk.

Original article: https://medium.com/multiple-views-visualization-research-explained/surfacing-visualization-mirages-8d39e547e38c
