We Don't Need This Many Data Scientists

I have held the title of data scientist in two industries. I’ve interviewed for more than 30 additional data science positions. I’ve been the CTO of a data-centric startup. I’ve done many hours of data science consulting.

With that background, you will hopefully realize that I’m not a data denier. I’m a firm believer in the power of statistics, machine learning, and all the tools in a data scientist’s toolbox. I know that data science is a powerhouse field filled with amazing people who are changing the world.

That being said, many companies don’t need a data scientist.

No, that wasn’t strong enough. Let me try again.

The vast majority of companies that are looking for a data scientist don’t need one.

Of all the companies I’ve worked or interviewed with as a data scientist, I’d say 80% of them were looking for the wrong role.

Some of them just needed a data analyst. Others needed a data engineer or a data architect. The rest didn’t have a data need at all.

What problem are you looking to solve?

I always ask this question when someone is looking to hire me. Originally, I asked what they were looking to do with their data, but I’ve since realized that the answer to that latter question doesn’t matter. The focus needs to be on the problem, not the solution. Companies hire to solve problems.

Good companies don’t hire for a position because it’s trendy to have around. They hire because, for every dollar that employee costs them, they are getting more than a dollar in return. It’s that simple. It’s all about ROI.

All companies understand that when it comes to positions like accounting and sales because they know how ROI works for accounting or sales. They know what problem needs to be solved and they know who can do it.

But data confuses companies. It especially confuses older companies, but startups are not immune. We’ve all been told that there’s gold in them thar data.

And who doesn’t love a good gold rush?

Just like the gold rush of old, most people don’t know where to look for the gold, many of them have fallen for fool’s gold, and no matter how much a vein has been picked clean, people keep coming back looking for scraps.

The underlying issue is that companies have been told their data is valuable. And it might be. But whether packaged for sale or used internally, data is a part of a solution, and every solution’s value is determined by the cost of the problem it is solving.

Without a problem, a solution is just an idea. And, as I’ve mentioned in multiple previous posts, ideas are worthless.

Data rushes happen because companies have a solution — data — and they are looking for a problem to apply it to. It’s a completely backward approach. You don’t decide to use screws because you have a screwdriver handy. You decide to use a screwdriver because you need to tighten a screw.

Data is a resource. So why is data not treated like any other resource?

Data is inherently different than other resources in one important way.

Let’s look at oil, a pretty standard resource. Unless you are The Beverly Hillbillies, you don’t just find oil lying around in your backyard. If you have thousands of tons of oil, you have it because you planned to have it for a specific purpose. And once you use it for that purpose, it’s gone.

But companies have exabytes of data. Maybe they had it for a purpose. Maybe there was a regulatory requirement for them to keep it. Maybe it was just easier to keep than to throw away.

Whatever the reason, they have it now, and they want to use it. They just don’t know what to use it for. And they often assume data scientists are the answer. After all, data is right there in the title, and scientists are smart.

S-c-i-e-n-t-i-s-t is not how you spell engineer

Let me give these companies the benefit of the doubt and say they actually do have problems that their data could solve. That still doesn’t necessarily make hiring a data scientist the correct next step.

Data scientists solve puzzles. They take billions of pieces of data and turn them into a single, cohesive picture. But they can’t do that if you don’t give them all the pieces.

If your data streams into ten different systems that don’t talk to each other, you are setting your data scientist up for failure. You need someone who can bridge those systems, bringing the data into a single place. That’s the job of a data engineer, not a data scientist. Depending on the situation, you may also need data architecture, data modeling, and database administration.

If you really want to, you can find a data scientist that can handle everything from the engineering to the DB admin work. I’ve been that data scientist. But my rate was much higher than what they would have paid to just hire the correct person for the job.

Why did they overpay? Because they didn’t yet understand the current status of their data or what a data scientist actually does.

Why did I take the job? Because I was too naive to know better.

Everyone would have been better off if the company had hired a data engineer, waited 6–12 months, then brought on a data scientist when they were fully prepared.

Ready? Have an aim? Hire!

Has your company identified problems that you need data science to solve?

Is your data in a state that a data scientist can work with?

If you answered both of these with a definitive ‘yes’, then you may need a data scientist. Congratulations, your company is doing things right. Pat yourselves on the back no more than three times, then go do some amazing things.

If you answered either question with a ‘no’ or a general look of confusion, then save your money and a data scientist’s sanity by taking down that job posting you just put up. Maybe replace it with a posting for a data engineer or data analyst. Or maybe just be happy not to have to go through the hiring process.

Not sure what you need? Talk to a data consultant before you waste your money.

Like this advice? Take 0.001% of the money you just saved and buy me a drink someday.

Original article: https://medium.com/swlh/do-we-need-data-scientists-8d8e8062688a
