Hacker Unicorns

Preface

Last week, a LinkedIn note on Mathematics and Data Science by my friend and colleague Srivastan Srivsan opened an excellent discussion. Such debates are nothing new; the tech world has seen many, from vim vs. emacs onward. The debate about math and data science has moved into new territory every year since 2013. Above all, the industry notion (or confusion) of the Unicorn Data Scientist remains a catalyst for the debate, and HR is still searching for the ‘Purple Squirrel.’ Why are we debating? That is an interesting question to ask ourselves.

Problem of Definition

A definition can suffer from three types of problems (a classification borrowed from Indian philosophy): it can be defective (too narrow), over-applied (too broad), or impossible (a mismatch). The debate over Mathematics Specialist versus Data Scientist is all about definition. The term ‘Data Scientist’ appears in job descriptions for many different roles. The title is always Data Scientist, yet we search for a person who can do everything.

KDD 2020 featured an exciting session on Training Data Scientists of the Future. Eminent figures in the field, such as Thomas Davenport, Usama, and Keith, led the discussion. One of Davenport’s suggestions was:

“They should circulate a draft list of job types, ask for commentary, and then finalize the list. Then ask those who practice each job what the necessary skills are. Again, send out a draft list, ask for comments, and finalize the skill list too.”

I would say Davenport was spot on. There are thousands of job descriptions (JDs) on the internet, and most of them are trying to find the purple squirrel or the unicorn in Data Science and Machine Learning. The lack of uniformity in JDs, both across the industry and within the same organization, is a significant gap in Data Science, Machine Learning, and AI recruitment. What we need is a rule of thumb for writing a JD based on what we want to achieve. We will discuss this in detail later.

Changing Industry Patterns

Well, what does this have to do with Mathematics and JDs? The role of the Data Scientist has evolved over time. It is almost ten years since the term Data Science started appearing in JDs. From 2010 to date, many technologies have evolved, died, and been resurrected: from sklearn and R to Mahout, Spark, H2O, TensorFlow, and an ocean of other frameworks. There was a time (pre-2010) when NLTK was the only Natural Language Processing framework in Python (yes, we also had MontyLingua, RIP), and Perl was the Swiss Army knife for many NLP tasks. Above all, the theoretical advances, including Deep Learning and Reinforcement Learning (RL), are commendable. In the early 2000s, when we attended Computer Science faculty development programs, we would mention the theoretical aspects of RL. Now students at the same colleges show RL demos with OpenAI Gym! That is how much technology and learning have changed.

What a data scientist does in an enterprise has changed a lot too. The nature of use cases, the volume of data, awareness of the need, and the ROI of Data Science problems have all grown. Project objectives are now sharply focused on the enterprise. AI/ML and Data Science adoption is reaching maturity in most companies, moving beyond the hype cycle.

The missing piece in this game is the categorization of job roles and expectations. The job of a Machine Learning Scientist differs from that of a Data Scientist, which in turn differs from that of a Machine Learning Engineer. Hence a one-size-fits-all JD is no longer relevant. The question is: who is a Machine Learning Scientist, who is a Data Scientist, and who is a Machine Learning Engineer? (There are more titles one could add.)

Who is Who?

A Machine Learning Scientist designs new algorithms (perhaps based on existing ones) to solve a specific problem or a general class of problems. What is expected is the ability to formulate a hypothesis, prove it with a rigorous scientific method, and implement it (expectations may go even further). Sometimes this person will be responsible for implementing the theory and delivering a new framework or system. To picture the role, imagine working on the core TensorFlow, PyTorch, or Watson team. The job is not to mash up APIs from existing libraries. In such a position, knowledge of programming, Mathematics, and Machine Learning is critical. Some companies call this role Algorithm Developer (AI/ML/DL…). When hiring for such a position, the HR concept of the Purple Squirrel may be relevant. Training, skills, and background are essential. For the right candidate, a lack of experience is often not a blocker for such roles.

The Data Scientist’s role in the enterprise is to solve a given problem with existing algorithms. The typical range of responsibility runs from business understanding to handing the model over to production (AIOps). Everybody searches for unicorns in this space because end-to-end Data Science is the expectation. Successful enterprises focus on hiring people who can build models and be creative with the data. For all practical purposes, such a person should be paired with a Data Engineer and a functional specialist. Such a structure adds the burden of an additional role, a Project Manager, but it has long-term benefits. We will discuss the strategy a bit later; for now, let’s get into writing the JD for a Data Scientist.

The rule of thumb is not to expect a Unicorn ;-). If you are looking for short-term staff, think about what your team would like to achieve in the foreseeable future: the problem statements in hand, the type of data, what is expected from the data scientist, the current and expected technology stacks, and the level of experience desired. All of these points will help you draft a clear JD, and such a JD makes a recruiting agent’s life much simpler. For long-term staff (probably a full-time hire), one should do some groundwork.

For a long-term hire, there are many things to consider. First of all, ask what the organization’s AI vision is for the next three years. If the organization does not have one, try to create one for the team or business unit first, then review and finalize it. (Sometimes it is better to hire a consultant to assess and recommend a strategy.) When hiring for individual teams, we may expect the candidate to know the domain and to have experience or knowledge in Data Science. Determine the domain knowledge requirements; if you have functional experts on the team, you may be able to relax them. (There are many domains where specific experience is hard to gain without working in the industry.) Next, ask what problems you need to solve in the next one, two, or three years, and what kinds of algorithms may help solve them. Are you interested in riding the wave of new algorithm frameworks? The answers to these questions will help you narrow down algorithm-level expectations. Now it is time for technology frameworks; here you decide on R, Python, Spark, and so on.

Last but not least, the skills to explore data are essential. One can either generalize or specify technologies in this part. It is better not to expect a candidate to be hands-on in both Natural Language Processing and Time Series at the same time. What we are looking for here is a clear and measurable JD with respect to skill, knowledge, and experience.

In a Data Scientist’s role, getting from problem to solution is what matters. A person who understands statistics and linear algebra and is trained in Machine Learning and Data Science should be a good fit. The ability to approach data systematically and drive the desired business outcome is the primary goal of this role. Bringing in a new algorithm is often not an objective at all. Here the mathematician part gets diluted.

One can argue: shouldn’t we still refer to them as Data Miners or Analytics Professionals? That is an excellent question with debatable answers.

AI/ML and Data Science platforms, APIs, and AutoML are hot topics and trends in the industry. The trend has contributed a new set of roles to the AI/ML/Data Science journey: Machine Learning Engineer, AI (….) Developer, AIOps Engineer, and so on. I will discuss these roles in detail in a separate note. From the hiring manager’s standpoint, the question is: whom are you hiring? Depending on the answer, you may or may not need a strong mathematician. For an aspirant, knowing which role you fit is essential to choosing a learning path. It is time for all AI/ML/Data Science course owners to publish the skills their courses attempt to develop, so that the skills developed by a course can be compared with the skills expected by a prospective role or employer.

To be continued…

#ai #aiml #datascience #machinelearning

Original article: https://medium.com/the-innovation/dsunicorns-8fa01b1de79
