Why You Can’t Skip Learning Data Visualization

There’s no shortage of posts online about ‘fancy’ data topics like data modelling and data engineering. But I’ve noticed that their cousin, data visualization, rarely gets the same attention. Among data practitioners in my field, there is solid consensus that data viz is an important skill that’s worth taking the time to learn. Yet we somehow spend more time learning complex models than figuring out why pie charts are no-nos.

When I was just starting out in my career, all the hype was around modelling. I considered making graphs and visualizations a chore, and I thought that the depth of my fancy data science knowledge was the greatest determinant of how much value I could bring.

As I developed the data viz aspect of my skill set, I picked up valuable lessons that now influence how I approach everything that I do. These lessons have proven extremely useful for me in my career journey, and I’d like to build the case here for why data viz is a core tool in any data person’s skill set.

Learning 1: If you want people to do the right thing, you have to make it easy

A common frustration I hear from data people is that business stakeholders often don’t seem to make decisions that align with what data analysts have found to be optimal. This leads to a lose-lose scenario where stakeholders don’t get the results they’re looking for, while analysts get frustrated with ‘wasted work.’ I was also once stuck in the ‘if only stakeholders listened’ mentality.

[Image: Darkhorseanalytics]

It was working with User Experience (UX) Product Designers (and surprisingly, not a data seminar) that led me to rethink this mindset. They drew attention to things I would have called ‘trivial’ before, critiquing how websites and applications were built: ‘this banner is too big’, ‘this button should be colored blue’, and so on. And they did so for good reason. They told me, ‘If we don’t do X, the user will have trouble doing what we want them to do.’

Coming back to the context of data, this insight is exactly why I would argue that data visualization is powerful. Well thought-out visuals, rather than ugly charts, most clearly explain to stakeholders why a data-backed recommendation will be most beneficial. After all, the main role of a data professional is not just to churn out models and analysis, but also to inspire data-driven action.

As data professionals (or professionals of any kind, really), maybe we should think twice the next time ‘slides are a chore’ crosses our minds.

Learning 2: A picture is worth a thousand words AND a thousand minutes

[Image: tjpalanca.com]

Full disclosure: I used to get irritated whenever people asked me to edit what seemed like irrelevant details in presentations. ‘Change the y-axis label from revenue to Revenue’, ‘Make this category green instead of red’, and ‘Add another graph’ were common requests in my everyday work. I disliked hearing them because they meant I’d have to re-render a report and spend time on a task I didn’t care about.

This was admittedly ironic, since when doing research, I hated reading papers that were written as if the author’s goal were to sound as complex as possible. I’d have to spend hours dissecting a set of paragraphs and equations in order to comprehend a single insight.

On the other hand, the papers I find most enjoyable and engaging to read are those with extensive visualisations. In those papers, it is clear that the author took the time and effort to create data visuals that help the reader understand the point he or she wanted to get across.

Even when trying to understand complex topics in other mediums, I can’t count how many hours I’ve saved thanks to YouTube channels like 3Blue1Brown or Kurzgesagt, which pair rich explanations with engaging visuals to illustrate their points.

[Video: 3Blue1Brown on Neural Networks]

These materials take time to make, and in analytics, making visuals can sometimes seem optional. However, the 10 minutes you spend making a visual can save 5–10 minutes for EACH person trying to understand your point. This becomes especially important in a fast-moving business, where decisions have to be made on the fly and a simple visual can help push through blockers and get to action.
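The trade-off above can be made concrete with a little arithmetic. The following is a minimal Python sketch, assuming the figures from the paragraph (10 minutes to make a visual, 5 to 10 minutes saved per reader); the function names and the audience sizes are hypothetical and chosen only for illustration.

```python
def net_minutes_saved(audience_size, minutes_to_make=10, minutes_saved_per_reader=5):
    """Total reader time saved minus the time the author spent on the visual."""
    return audience_size * minutes_saved_per_reader - minutes_to_make


def break_even_audience(minutes_to_make=10, minutes_saved_per_reader=5):
    """Smallest audience size at which the visual pays for itself."""
    # Ceiling division using integer arithmetic only.
    return -(-minutes_to_make // minutes_saved_per_reader)


# Hypothetical audience sizes, from a single reader to a large team.
for readers in (1, 2, 10, 50):
    low = net_minutes_saved(readers, minutes_saved_per_reader=5)
    high = net_minutes_saved(readers, minutes_saved_per_reader=10)
    print(f"{readers:>3} readers: {low} to {high} minutes saved overall")
```

Under these assumptions the visual breaks even with one or two readers, and every additional reader is pure gain.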

It’s hard to overstate how much the effort of making good data visualisations pays off in the long run, both by influencing organizations and by making the decision-making process more efficient.

Learning 3: Problem solving does not need to be a solitary undertaking!

I found this tweet by Mat Velloso very funny and relatable:

[Embedded tweet by Mat Velloso]

I assume this happens because companies find it difficult to gauge the complexity of the problems they’re facing, and “AI” sounds like a good catch-all solution. Even as a data practitioner, I’ve been guilty of this too, falling into analysis paralysis and trying every single model in the hope that one will magically solve everything. This is a big waste of the analyst’s (and the stakeholder’s) time.

It’s funny and all, but it goes to show that we can all still work on how we collectively understand complex data problems.

Notice the number of reactions on this post in one of our Slack channels (I blurred some details):

[Image: Slack post screenshot]

On the other hand, check out the reactions (and replies!) on this other post with a data viz attached:

[Image: Slack post screenshot with a data viz attached]

The posts show that people are happy to share thoughts and insights, provided they can relate to the context and easily understand it.

Modelling is indeed useful when we are sure it is the most efficient way to address our problem. The beauty of data viz, however, is in the communication: it lets us turn problem solving into a collaborative experience. After all, there is no shortage of ideas from subject-matter experts whose only barrier is the technical side of the data. If data viz makes the scope of a problem clearer to other people, those who see our visuals can start ideating solutions to the business problem, and the ripple effect amplifies. Suddenly, instead of one person (you) making a model, other people are giving you ideas to further improve your work (and possibly even a solution that doesn’t need a model!).

Learning 4: There is a time and place AND VISUAL for everything

Early in my career, my development goals were bullet points of technical topics that I wanted to learn and apply: recommendation systems, image processing, Bayesian modelling, etc. So of course, I jumped at the first chance to try out these techniques, thinking that they would generate the greatest amount of value in my limited time.

Most of the time, however, a visual offers the best bang for the buck. After all, the model development process always includes an exploration step, which usually involves data viz. In this step, we can already get a lot of insights from different visuals:

  • Dodged Bar Graph: Which brand of milk tea sold the most?
  • Stacked Bar Graph: Which milk tea branch contributes the most to revenue?
  • Line Graph: Did the average cost of milk tea decrease or increase?
  • Scatter Plot: How is age related to the number of milk teas purchased?
  • Density Plot: What is the distribution of the age of milk tea drinkers?
  • Tables: What are the total, mean, and median milk tea purchases per customer across all months?
  • Pie Chart: No.

[Image: visual recreated from The Extreme Presentation Method by Dr. Abela]
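Of the options listed above, the table is the one that needs no plotting library at all. Here is a minimal sketch using only the Python standard library; the customer names and purchase counts are made-up sample data, and `summarize` is a hypothetical helper, not something from the original post.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (customer, month, cups of milk tea purchased).
purchases = [
    ("ana", "2020-01", 4), ("ana", "2020-02", 6), ("ana", "2020-03", 2),
    ("ben", "2020-01", 1), ("ben", "2020-02", 1), ("ben", "2020-03", 7),
]


def summarize(records):
    """Return total, mean, and median monthly purchases for each customer."""
    by_customer = defaultdict(list)
    for customer, _month, cups in records:
        by_customer[customer].append(cups)
    return {
        customer: {"total": sum(cups), "mean": mean(cups), "median": median(cups)}
        for customer, cups in by_customer.items()
    }


summary = summarize(purchases)

# Print a small fixed-width table, one row per customer.
print(f"{'customer':<10}{'total':>6}{'mean':>6}{'median':>8}")
for customer, stats in sorted(summary.items()):
    print(f"{customer:<10}{stats['total']:>6}{stats['mean']:>6.1f}{stats['median']:>8}")
```

In this sample, one customer's mean and median differ noticeably because of a single large month, which is the kind of detail a table surfaces before any chart is drawn.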

Data viz helps us dissect problems into comprehensible pieces, and potentially equips us to answer business questions without the need to apply complex methodologies. Going deep into visualisation will force you to ask “Are we looking at the right things?” and eventually, “Are we being led to the best strategies?”

Data Viz is a practice that cultivates both soft and hard skills at the same time

  • Hard Skills: Data Wrangling (How do we transform data into its pre-graph form?), Grammar of Graphics (How are visualizations constructed?)
  • Soft Skills: Communication (What visual will effectively drive home a point?), Critical Thinking (What visual will best convey a good strategy?)
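To make the two hard skills above a little more tangible, here is a rough Python sketch under stated assumptions. The sales rows, branch names, and the `to_pregraph_form` helper are all hypothetical, and `chart_spec` is a plain dictionary that mimics the Grammar of Graphics idea (data, aesthetic mappings, geometry); it is not the API of any real plotting library.

```python
from collections import defaultdict

# Hypothetical raw transactions: one row per milk tea sold.
sales = [
    {"branch": "Makati", "brand": "A", "price": 120},
    {"branch": "Makati", "brand": "B", "price": 100},
    {"branch": "Makati", "brand": "A", "price": 120},
    {"branch": "Cebu", "brand": "A", "price": 110},
    {"branch": "Cebu", "brand": "B", "price": 95},
]


def to_pregraph_form(rows):
    """Data wrangling: aggregate raw sales into one row per (branch, brand)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["branch"], row["brand"])] += row["price"]
    return [
        {"branch": branch, "brand": brand, "revenue": revenue}
        for (branch, brand), revenue in sorted(totals.items())
    ]


tidy = to_pregraph_form(sales)

# Grammar of Graphics, sketched as data: which column maps to which visual
# property, and which geometry draws it. This describes a stacked bar graph.
chart_spec = {
    "data": tidy,
    "aesthetics": {"x": "branch", "y": "revenue", "fill": "brand"},
    "geometry": "bar",
    "position": "stack",
}

for row in tidy:
    print(row)
```

Changing only `position` to a dodged layout would describe the dodged bar graph instead, which is the appeal of thinking in terms of a grammar: the data stays the same while the mapping changes.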

From these experiences, I believe learning the principles behind data visualisation is beneficial for any data practitioner, and possibly even for people from other professions as well. Data viz is how we can communicate the complexity of data to the visual learners of the world (65% of the population, according to Google). We cannot skip learning this extremely useful and practical skill, and I’ll wait for the day when pie charts only exist in the form of memes.

[Video: Geckoboard on Pie Charts]

This post is mostly a compilation of opinions I formed in my data career. I’m super open to thoughts and comments from fellow practitioners and aspiring ones alike! Please feel free to shoot an email to nfrimando@gmail.com or connect with me via LinkedIn. Shout out to Fernandina Ko who helped me edit this piece, Christian San Jose for inviting me to the UX Fundamentals Class, and TJ Palanca for the minimalist map visual!

Translated from: https://medium.com/swlh/why-you-cant-skip-learning-data-visualization-6314896ccdc0
