Hacker Unicorn
Preface
Last week, a LinkedIn note on Mathematics and Data Science by my friend and colleague Srivastan Srivsan opened an excellent discussion. The topic is not new; the tech domain has seen many such debates, vim vs. emacs among them. The debate about Math and Data Science has expanded into new areas every year since 2013. Above all, the industry notion of (or confusion about) the Unicorn Data Scientist remains a catalyst for the debate, and HR is in search of the ‘Purple Squirrel.’ Why are we debating? That is an interesting question to ask ourselves.
Problem of Definition
A definition can suffer from three types of problems: it can be defective (too narrow), over-applied, or impossible (a mismatch). The classification is borrowed from Indian philosophy. The debate about the Mathematics Specialist and the Data Scientist is all about definition. The term ‘Data Scientist’ appears in job descriptions for a variety of job roles. Whatever the role, the title is still Data Scientist, and we search for a person who can do everything.
KDD 2020 had an exciting session on Training Data Scientists of the Future. Eminent personalities in the field, such as Thomas Davenport, Usama, and Keith, led the discussion. One of Davenport’s suggestions was:
“They should circulate a draft list of job types, ask for commentary, and then finalize the list. Then ask those who practice each job what the necessary skills are. Again, send out a draft list, ask for comments, and finalize the skill list too.”
I would say Davenport was spot on. There are thousands of recruitment job descriptions (JDs) out on the internet, and most of them are trying to find the purple squirrel or the unicorn in Data Science and Machine Learning. The lack of uniformity in JDs, both across the industry and within the same organization, is a significant gap in Data Science, Machine Learning, and AI recruitment. What we need is a rule of thumb for writing a JD based on what we are going to achieve. Let’s discuss this in detail later.
Changing Industry Patterns
Well, what is the relation between Mathematics and the JD? The role of the Data Scientist has evolved over time. It is almost ten years since the term Data Science started appearing in JDs. From 2010 to date, many technologies have evolved, died, and been resurrected: from sklearn and ‘R’ to Mahout to Spark to H2O and TensorFlow and an ocean of frameworks. There was a time (pre-2010) when NLTK was the only Natural Language Processing framework in Python (yes, we also had MontyLingua, RIP). Perl was the Swiss Army knife for many NLP tasks to start with. Above all, the theoretical advances, including Deep Learning and Reinforcement Learning (RL), are commendable. In the early 2000s, when we used to attend Computer Science faculty development programs, we would mention the theoretical aspects of RL. Now students in the same college show RL demos with OpenAI Gym! That is how much technology and learning have changed.
What a data scientist does in an enterprise has changed a lot too. The nature of use cases has changed, and the volume of data, the awareness of the need, and the ROI of Data Science problems have all increased. Project objectives are now sharply focused on the enterprise. AI/ML and Data Science adoption is reaching maturity in most companies, moving beyond adjusting to the hype cycle.
The missing piece in this game is the categorization of job roles and expectations. The job of a Machine Learning Scientist is different from that of a Data Scientist, which in turn is different from that of a Machine Learning Engineer. Hence a one-size-fits-all JD is no longer relevant. The question is: who is a Machine Learning Scientist, who is a Data Scientist, and who is a Machine Learning Engineer (there are more titles to add)?
Who is Who?
A Machine Learning Scientist is one who designs new algorithms (maybe based on existing algorithms) to solve a specific problem or a set of problems in general. What is expected is the ability to formulate a hypothesis, work to prove it using a rigorous scientific method, and implement it (the expectation may go beyond that). Sometimes this persona will be responsible for implementing the theory and delivering a new framework or system. To understand what this looks like, imagine that you are going to work for the core TensorFlow, PyTorch, or Watson team. The job is not to perform an API mashup of existing libraries. In such a position, knowledge of programming, Mathematics, and Machine Learning is critical. Some companies call this role Algorithm Developer (AI/ML/DL…). When hiring for such a position, the HR concept of the Purple Squirrel may be relevant. Training, skills, and background are essential. Most of the time, lack of experience may not be a blocker for the right candidate in such roles.
The Data Scientist’s role in the enterprise is to solve a given problem with existing algorithms. The typical range of responsibility runs from business understanding to handing the model over to production (AIOps). Everybody searches for unicorns in this space because end-to-end Data Science is the expectation. Successful enterprises focus on hiring people who can build models and be creative with the data. For all practical purposes, such a person should be paired with a Data Engineer and a functional specialist. Such a structure adds the burden of an additional role, a Project Manager. Still, it has long-term benefits. We will discuss the strategy a bit later; let’s get into writing the JD for a Data Scientist.
The rule of thumb is not to expect a Unicorn ;-). If you are looking for short-term staff, think about what your team would like to achieve in the foreseeable horizon: the problem statements in hand, the type of data, what is expected from the data scientist, the current and expected technology stacks, and the level of experience desired. All of these points will help you draft a clear JD. Such a JD makes the life of a recruiting agent much simpler. For long-term staff (probably a full-time hire), one should do some groundwork.
For a long-term hire, there are many things to consider. First of all, ask what the organization’s AI vision is for the next three years. If you don’t see one for the organization, try to create one for the team or business unit first, then review and finalize it. (Sometimes it is better to hire a consultant to assess and recommend a strategy.) When hiring for individual teams, we may expect the candidate to know the domain and to have experience or knowledge in Data Science. Determine the domain knowledge requirements; if you have functional experts in the team, you may be able to relax them. (There are many domains where specific experience is hard to gain without working in the industry.) What problems do we need to solve in the next one, two, or three years? What kinds of algorithms may be helpful in solving them? Are you interested in riding the algorithm and framework wave? Answers to these questions will help you narrow down algorithm-level expectations. Then it is time for technology frameworks; here we decide on R, Python, Spark, and so on.
Last but not least, the skill to explore data is essential. One can generalize or specify technologies in this part. It is better not to expect a candidate to be hands-on in both Natural Language Processing and Time Series at the same time. What we are looking for here is a clear and measurable JD in terms of skill, knowledge, and experience.
In a Data Scientist’s role, getting from problem to solution plays an important part. A person who understands statistics and linear algebra, and who is trained in Machine Learning and Data Science, should work well. The ability to approach data systematically and drive the desired business outcome is the primary goal in this role. Bringing in a new algorithm is often not an objective at all. Here the mathematician part gets diluted.
One can argue: shouldn’t we still refer to them as Data Miners or Analytics Professionals? That is an excellent question with debatable answers.
AI/ML and Data Science platforms, APIs, and AutoML are hot topics and trends in the industry. The trend has contributed a new set of roles to the AI/ML/Data Science journey: Machine Learning Engineer, AI (….) Developer, AIOps Engineer, and so on. I will discuss these roles in detail in a separate note. From the hiring manager’s standpoint, the question is whom you are hiring. Depending on the answer, you may or may not need a strong mathematician. For an aspirant, knowing which role you are fit for is essential to selecting a learning path. It is time for all AI/ML/Data Science course owners to publish the skills their courses attempt to develop, so that the skills developed by a course and the skills expected by a prospective role or employer become comparable.
To be continued.
#ai #aiml #datascience #machinelearning
Translated from: https://medium.com/the-innovation/dsunicorns-8fa01b1de79