What is the “data analytics stack”?

A poor craftsman blames his tools. But if all you have is a hammer, everything looks like a nail.

It’s common for web developers or database administrators to refer to the “stack” of tools they use to do the job, but I’ve never heard this moniker used for data analysts. So it got me thinking: what is the data analytics stack?

Data analysts make use of a wide variety of software for a wide variety of tasks. When a solution comes up short, the focus ought not to be on “blaming” tools for their shortcomings, but on possessing alternatives and choosing a better one (or ones) for the given scenario.

That is, it’s better to think of these tools as “slices” of the same stack to be used concurrently, rather than as misfits to be entirely discarded.

To imagine what the analytics stack might look like, I used the data products Venn diagram below, placing the logos of popular data analytics tools in their respective segments.

[Figure: data products Venn diagram. Source: Data Community DC]

After stepping back from my marked-up Venn diagram, four categories or “slices” of the stack appeared to me. Let’s get to them below; but first, a caveat.

Staying vendor agnostic

Some vendors have packaged their own “stack” of tools for data analysis; for example, Microsoft’s Power Platform or Google Data Studio. I am keeping my overview of the stack vendor-agnostic.

While you may learn that some slices fit better together, it’s better to start with the context of which category of tool to use, and when, rather than which vendor. I will, however, provide a brief industry landscape of these products below, and suggestions for future learning.

Spreadsheets

Reports of the death of spreadsheets are greatly exaggerated. For their ease of use and flexibility, spreadsheets are an excellent choice for back-of-the-envelope calculations and prototyping.

However, spreadsheets do have their limitations. They can lack data integrity, storage and delivery functionalities. These limitations are often what cause pundits to give spreadsheets their last rites. But this misses the point of “the stack” entirely — those tasks aren’t the proper context for spreadsheets in the first place.

The major spreadsheet applications are Microsoft Excel and Google Sheets. I won’t tell you outright my preference, but you may find out if you follow me on social media for long.

Databases

Databases are a relatively ancient technology in the analytics space, but show no signs of slowing. They offer more reliable and extensible methods for data storage and integrity, but the analysis that can easily be done directly inside a database is limited.

Structured query language, or SQL, is the language used to interact with relational database management systems. While many SQL platforms exist, the types of read-only operations necessary for most data analysts won’t change across them.

For data analysts new to SQL, I suggest SQLite or Microsoft Access as lightweight tools for learning SQL.
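To give a feel for the kind of read-only query I have in mind, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. The sales table, its columns and its rows are all made up for illustration; only the SELECT in sales_by_region is the part an analyst would typically write, and it should look much the same on other SQL platforms.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("East", "Widget", 120.0),
            ("East", "Gadget", 80.0),
            ("West", "Widget", 200.0),
            ("West", "Gadget", 50.0),
            ("West", "Widget", 25.0),
        ],
    )
    conn.commit()
    return conn


def sales_by_region(conn):
    """Read-only query: order count and total amount per region, largest first."""
    query = """
        SELECT region, COUNT(*) AS orders, SUM(amount) AS total
        FROM sales
        GROUP BY region
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for region, orders, total in sales_by_region(conn):
        print(region, orders, total)
    conn.close()
```

The setup function exists only so the example is self-contained; in practice the table would already live in a database that someone else maintains.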

Business intelligence & dashboard platforms

This is a broad swathe of tools and it’s likely the most ambiguous slice of the stack, but here I mean enterprise tools that allow users to gather, model and display data.

Data warehousing tools like MicroStrategy and SAP BusinessObjects straddle the line here, since they are tools designed for self-service data gathering and analysis. But these often include only limited visualization and interactive report-building.

That’s where tools like Power BI, Tableau and Looker come in. These tools allow users to build data models, dashboards and reports with minimal coding. Importantly, they make it easy to disseminate and update information across an organization.

However, these tools tend to be inflexible in the way they handle and visualize data. They can also be expensive, with single-user annual licenses running several hundred or even thousands of dollars.

Data programming languages

While many vendor tools are moving to a place where coding is not as essential to the data workflow, I still think it’s a good idea to learn programming. This helps sharpen understanding of how data processing works, and gives users fuller control of their workflow over using a graphical user interface (GUI).

For data analytics, two open-source programming languages are good fits: R and Python. Each includes a dizzying universe of free packages made to help with everything from social media automation to geospatial analysis. Learning these tools also opens the door to advanced analytics and data science.
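As a small taste of what that control looks like, here is a sketch in Python that uses only the standard library to average a column by group. The CSV contents and the mean_by_group helper are hypothetical, invented for this illustration; in real work you would more likely reach for one of the free packages mentioned above.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up data standing in for a file an analyst might export from another tool.
RAW = """region,amount
East,120
East,80
West,200
West,50
West,25
"""


def mean_by_group(csv_text, key, value):
    """Return the average of the `value` column for each distinct `key`."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[key]].append(float(row[value]))
    return {group: mean(values) for group, values in groups.items()}


if __name__ == "__main__":
    for region, average in mean_by_group(RAW, "region", "amount").items():
        print(region, round(average, 2))
```

Every step is spelled out and repeatable: how the data is read, how it is grouped and how it is summarized. That is the kind of visibility a GUI tends to hide.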

However, this slice could have the steepest learning curve in the stack, and many analysts may struggle to see the benefit of learning to code, when they can do most of what they need easily enough from a GUI.

Not better or worse, just different

Seen in the light of a “stack,” it makes little sense to compare any of these slices, or claim one as inferior to the other. They are meant to be complementary.

Data analysts often wonder which tool they should focus on learning or becoming the expert in. I would suggest not becoming the expert in any single one, but learning each slice of the stack well enough to contextualize and choose between them.

Entering the stack

Learning one data tool is daunting. Learning a whole “stack” of them can seem impossible. However, this cross-training can expedite growth, as connections are made across platforms in how to use data effectively.

What data tools do you use? How do you fit them together? Any other thoughts on the idea of an “analytics stack”? Let’s discuss in the comments.

Originally published at https://georgejmount.com on August 8, 2020.

Translated from: https://medium.com/@georgemount/what-is-the-data-analytics-stack-7c87e4d4c2e
