We Don’t Need This Many Data Scientists

I have held the title of data scientist in two industries. I’ve interviewed for more than 30 additional data science positions. I’ve been the CTO of a data-centric startup. I’ve done many hours of data science consulting.

With that background, you will hopefully realize that I’m not a data denier. I’m a firm believer in the power of statistics, machine learning, and all the tools in a data scientist’s toolbox. I know that data science is a powerhouse field filled with amazing people that are changing the world.

That being said, many companies don’t need a data scientist.

No, that wasn’t strong enough. Let me try again.

The vast majority of companies that are looking for a data scientist don’t need one.

Of all the companies I’ve worked or interviewed with as a data scientist, I’d say 80% of them were looking for the wrong role.

Some of them just needed a data analyst. Others needed a data engineer or a data architect. The rest didn’t have a data need at all.

What problem are you looking to solve?

I always ask this question when someone is looking to hire me. Originally, I asked what they were looking to do with their data, but I’ve since realized that the answer to that latter question doesn’t matter. The focus needs to be on the problem, not the solution. Companies hire to solve problems.

Good companies don’t fill a position because it’s trendy to have around. They hire because, for every dollar that employee costs them, they are getting more than a dollar in return. It’s that simple. It’s all about ROI.

All companies understand that when it comes to positions like accounting and sales because they know how ROI works for accounting or sales. They know what problem needs to be solved and they know who can do it.

But data confuses companies. It especially confuses older companies, but startups are not immune. We’ve all been told that there’s gold in them thar data.

And who doesn’t love a good gold rush?

Just like the gold rush of old, most people don’t know where to look for the gold, many of them have fallen for fool’s gold, and no matter how much a vein has been picked clean, people keep coming back looking for scraps.

The underlying issue is that companies have been told their data is valuable. And it might be. But whether packaged for sale or used internally, data is a part of a solution, and every solution’s value is determined by the cost of the problem it is solving.

Without a problem, a solution is just an idea. And, as I’ve mentioned in multiple previous posts, ideas are worthless.

Data rushes happen because companies have a solution — data — and they are looking for a problem to apply it to. It’s a completely backward approach. You don’t decide to use screws because you have a screwdriver handy. You decide to use a screwdriver because you need to tighten a screw.

Data is a resource. So why is data not treated like any other resource?

Data is inherently different from other resources in one important way.

Let’s look at oil, a pretty standard resource. Unless you are The Beverly Hillbillies, you don’t just find oil lying around in your backyard. If you have thousands of tons of oil, you have it because you planned to have it for a specific purpose. And once you use it for that purpose, it’s gone.

But companies have exabytes of data. Maybe they had it for a purpose. Maybe there was a regulatory requirement for them to keep it. Maybe it was just easier to keep than to throw away.

Whatever the reason, they have it now, and they want to use it. They just don’t know what to use it for. And they often assume data scientists are the answer. After all, data is right there in the title, and scientists are smart.

S-c-i-e-n-t-i-s-t is not how you spell engineer

Let me give these companies the benefit of the doubt and say they actually do have problems that their data could solve. That still doesn’t necessarily make hiring a data scientist the correct next step.

Data scientists solve puzzles. They take billions of pieces of data and turn them into a single, cohesive picture. But they can’t do that if you don’t give them all the pieces.

If your data streams into ten different systems that don’t talk to each other, you are setting your data scientist up for failure. You need someone that can bridge those systems, bringing the data into a single place. That’s the job of a data engineer, not a data scientist. Depending on the situation, you may also need data architecture, data modeling, and database administration.

If you really want to, you can find a data scientist that can handle everything from the engineering to the DB admin work. I’ve been that data scientist. But my rate was much higher than what they would have paid to just hire the correct person for the job.

Why did they overpay? Because they didn’t yet understand the current status of their data or what a data scientist actually does.

Why did I take the job? Because I was too naive to know better.

Everyone would have been better off if the company had hired a data engineer, waited 6–12 months, then brought on a data scientist when they were fully prepared.

Ready? Have an aim? Hire!

Has your company identified problems that you need data science to solve?

Is your data in a state that a data scientist can work with?

If you answered both of these with a definitive ‘yes’, then you may need a data scientist. Congratulations, your company is doing things right. Pat yourselves on the back no more than three times then go do some amazing things.

If you answered either question with a ‘no’ or a general look of confusion, then save your money and a data scientist’s sanity by taking down that job posting you just put up. Maybe replace it with a posting for a data engineer or data analyst. Or maybe just be happy not to have to go through the hiring process.

Not sure what you need? Talk to a data consultant before you waste your money.

Like this advice? Take 0.001% of the money you just saved and buy me a drink someday.

Source: https://medium.com/swlh/do-we-need-data-scientists-8d8e8062688a
