Master Data Science Case Studies: A Hiring Manager’s Perspective
I have been in the situation where I received a stack of data science case studies from different companies and had to figure out what the problem was, how to solve it and what to focus on. Conversely, I have also designed case studies for data science and analytics positions, sent them out to candidates and evaluated the submissions I received.
Based on this experience and conversations with others (both candidates and hiring managers) I want to address some typical questions around case studies and explain what they’re good for. I will do this by outlining a set of common expectations from hiring managers. I believe that if you understand the hiring manager’s needs for the case study, you as a candidate will know what to focus on in order to leave a great impression during the process.
Making a case for case studies
Let’s take a look at the purpose of the application process from the hiring company’s perspective: They typically have a problem or a set of problems in their business that they would like someone to solve for them, and they are trying to find out whether you could do it. There is no sure way to find this out before actually hiring you to do the job. So what’s the next best thing they can do? Yes, you’ve guessed it: They ask you to solve their problem in a case study (also often referred to as a technical assessment, take-home assessment, technical homework etc.).
This seems rather sneaky, you might think. Aren’t I just giving away my time for free by working on a problem that their employees get paid for? Even if that’s true, it is still the best thing that can happen to you in an application process.
If the case study indeed reflects the actual work done in the company, this is the single best insight into the kind of problems and the kind of data that you will be working with once hired. Usually, there is also a follow-up interview where you discuss your solution with the company’s data scientists and hear how they reason about the problem and the solution space. This is great because you get the additional benefit of learning about their progress on the given problem.
Don’t forget: An interview can be a pleasant conversation between data scientists about an interesting problem.
Why is this so valuable for you? The application process is also supposed to help you to figure out if you want to work for the company and if their challenges are interesting for you. Especially when you’ve already had some work experience and/or you have already figured out exactly what you want for your next role, the maturity level of the company might be a crucial piece of information for your decision making. Yes, I am saying that you should also evaluate your interviewers and pay attention to what they tell you about how they would approach the problem.
Minimise the pains of hiring
Hiring someone who is good at their job is expensive. But hiring someone who really sucks at their job is even more expensive. Ask anyone who has made a bad hire and then had to take drastic action how difficult and nerve-racking the process was. Not to mention the impact that a bad hire can have on the team. Just think of a bad collaboration you’ve had in the past and now imagine you have to work with this person day in, day out, for the next 2 years. And now imagine several people in the team feel the same as you do. What would that do to team morale?
Therefore, as a hiring manager my goal is to minimise the chance of hiring a bad match for my team.
If I take this as an objective, we can start working our way backwards: What would I need to know (and what can I reasonably find out during an interview process) in order to minimise that risk? Out of the things that I need to find out during the interview process, which are the ones that I can more naturally check with a case study? Are there perhaps some things that a work sample from a candidate can tell me more about than just asking some direct questions in an interview?
Following this reasoning I formulated three questions that a hiring manager is typically trying to answer through a case study. Answering these questions will allow them to build up a mental model of the candidate and their fit to the role. Let these questions guide your attention when you are tackling a case study as a candidate. If you make an impression there, you know it will be noticed.
Q1: How well can you apply your thinking and your tools to our business problem?
While in theory it might make sense to try to hire someone who has already done the exact job that I’m hiring for, in practice this is often difficult: the candidate pool is small, and retaining such an employee might be hard because they can easily get bored. Therefore, hiring managers usually need to expand their candidate pool and assess candidates whose experience might only be remotely related to what the job demands. So a thorough hiring manager will try to create a process that gives them several ways to probe the candidate’s fit for the job’s (skill) requirements. This is what the case study does (among other things).
If you interview for a role in a different industry or domain, don’t underestimate the value that you can add as a ‘newbie’. I dare to infer from the diversity of typical data science teams that most data science skills are transferable across industries. Sure, whenever you make a switch, there is a lot of domain knowledge that you need to pick up from scratch. But the analysis and modelling tools that you have mastered in one job or industry can still be useful in another. In fact, they might even be a bigger asset because nobody in the existing team has ever tried your toolset on the problem. You just need to find out how your tools might apply and what their limitations are in the new setup.
One of the best and most thorough submissions I ever received as a hiring manager came from a social psychology graduate. Let’s call her Jane.
In the case study’s dataset we included a feature that one could consider using as a label in order to formulate a supervised learning task out of the problem statement. But if you looked more closely, you would realise that this feature contains too many missing values and, even worse, that the presence of that feature is biased. So if you were to just train a model to predict this feature, you would train a biased model.
Jane had gained a good enough understanding of our business domain through her own research and realised that bias in the data could be a problem. So she tested for interaction effects of the presence of this feature with some other features in the dataset. By doing this she could confidently conclude that there is in fact a bias. So she made the call to not use it as a label.
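To make this kind of check concrete, here is a minimal sketch of one way to test whether the presence of a candidate label depends on another feature. It is not Jane’s actual analysis, which is not described in detail here. The column names (`plan`, `churned`), the toy data and both helper functions are made up for illustration, and a plain chi-square test of association stands in for whatever interaction tests she ran.

```python
from collections import defaultdict


def missing_rate_by_group(rows, label_key, group_key):
    """Share of rows whose label is missing, per value of group_key."""
    counts = defaultdict(lambda: [0, 0])  # group -> [missing, total]
    for row in rows:
        counts[row[group_key]][1] += 1
        if row.get(label_key) is None:
            counts[row[group_key]][0] += 1
    return {g: missing / total for g, (missing, total) in counts.items()}


def chi_square_missingness(rows, label_key, group_key):
    """Pearson chi-square statistic for 'label missing' vs. group (2 x K table)."""
    observed = defaultdict(lambda: [0, 0])  # group -> [missing, present]
    for row in rows:
        idx = 0 if row.get(label_key) is None else 1
        observed[row[group_key]][idx] += 1
    n = sum(sum(cells) for cells in observed.values())
    col_totals = [sum(cells[i] for cells in observed.values()) for i in (0, 1)]
    stat = 0.0
    for cells in observed.values():
        row_total = sum(cells)
        for i in (0, 1):
            expected = row_total * col_totals[i] / n
            if expected > 0:
                stat += (cells[i] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: the label is mostly missing for one group only.
rows = (
    [{"plan": "free", "churned": None}] * 40
    + [{"plan": "free", "churned": 1}] * 10
    + [{"plan": "paid", "churned": None}] * 5
    + [{"plan": "paid", "churned": 0}] * 45
)

rates = missing_rate_by_group(rows, "churned", "plan")
stat = chi_square_missingness(rows, "churned", "plan")

# With two groups there is 1 degree of freedom; 3.841 is the 5% critical value.
label_presence_is_biased = stat > 3.841
print(rates, round(stat, 2), label_presence_is_biased)
```

If the missing rate differs this strongly between groups, a model trained only on the rows where the label is present would mostly learn from one group, which is the kind of bias described above.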
While Jane didn’t have typical data science experience, she had analysed tons of experimental data during her academic career. She knew what tools she could use to tease out information from an unknown dataset and how to draw conclusions from it. By familiarising herself with our product she could also come up with reasonable hypotheses to test. In the end, Jane implemented a solution that came very close to our own solution at the time, using an unsupervised technique. Needless to say, we extended an offer to her.
Do it like Jane.
Do your homework before you come up with a suitable method for solving the problem. If you have done your homework well, it will also be easy for you to be on top of your results and to explain why you chose your method (see Q3 below).
Q2: How will you act under imperfect information?
It is very rarely (dare I say: never) the case that you will receive a case study that has a full specification of the problem and what needs to be done. And there is, of course, a very good reason for that: In the real world, there is also very rarely (again, probably never) a full specification of the problem at hand. Often it is your job to identify what information is missing, devise a plan of how to actively fill those knowledge gaps, or at least how to manage the uncertainty that comes with knowledge gaps. This usually also requires you to prioritise where to dig deeper and which gaps to skim over (for the time being).
I once got a case study as a candidate where I was told to design ‘a data science solution’ to ‘optimise […] operations’. This was literally the instruction I got. No hint about what ‘data science solution’ means. No hint about what ‘optimise’ entails. Not even any details about how the current process works. Full freedom.
I suggest working your way backwards in these cases: Who are the users of the system and what would success mean for them? How can we formalise this success criterion in one or a few KPIs? Once we know which KPIs we are trying to optimise, how can we get there from where we are now? Oh, we don’t know where we are right now? Let’s do some research and make some educated guesses then. Once we have an idea about success, let’s also think about the risks, or the worst-case scenario. Is it captured by some KPI? What can we do to mitigate it?
Each of these questions might require some research, some thinking and some hypothesising. And that is okay, because on the job this process will look quite similar. You might have to scan a lot of internal documentation, interview product owners/stakeholders, or consult external resources to find the answers.
You will not get every single assumption or conclusion right for the case study, but that is also never the expectation. Rather, your interviewers want to see that you can come up with a plan to tackle the unknowns while making a reasonable prioritisation. Maybe in the follow-up interview they will correct some of your assumptions. Be prepared to be challenged and to adapt your solution to new information. This is also why doing your homework (see previous section, Q1) can help you to think on your feet in this discussion.
Q3: How well will you collaborate with the team?
A question I sometimes hear from data scientists is how much attention they should pay to code quality in their case study submissions. Well, what does good code quality help with?
It’s about whether other people in the team can understand your code, collaborate with you on it, and maintain it when you’re on holiday. So ask yourself: can someone who is seeing your code for the first time understand it easily? Would they be able to work on it without going crazy? If you can’t answer these questions, ask yourself as a proxy: Can I understand this code 3 months or 1 year from now and work on it without hating my past self?
More fundamentally, it is a question of collaboration: Will people in the team enjoy working with you? Will they understand and trust your work? Will they be able to build upon your work (and you on theirs)?
The understanding and the trust will depend on another important collaboration-related skill (if not the most important one) that a hiring manager will check with the case study (and throughout the process): communication.
For most case studies there is no single correct solution, so it’s important that you can explain the approach you took and the reasoning behind it. You can easily imagine situations on the job when a data scientist has to communicate their work to teammates or justify it to stakeholders. Especially with the latter you will probably be more effective if you show some storytelling skills, i.e. your work communicates a clear message in a compelling and easy-to-follow way (e.g. through visualisations, illustrative examples etc.).
So read carefully who the case study presentation is for and tailor your presentation to this audience. What is the right level of abstraction? Are you prepared to give more details of your solution and your data when needed? Can you explain the limitations and the trade-offs of your solution? Does your solution present clear action points and key takeaways? Can you maybe even inspire your audience with a future vision of the solution?
Don’t make the mistake of thinking that it’s enough to just build an unexplainable model that has high accuracy. Don’t underestimate the importance of collaboration skills.
Conclusion
In this article, I have tried to convince you that data science case studies provide an opportunity for candidates to learn a lot about the company and the role. I also shared a common set of expectations that hiring managers have of case studies, and derived some tips for candidates. I hope these tips help you to decide what to focus on in your next case study.
However, there might always be some more specific requirements that only the job you’re applying for has. Try out the exercise of putting yourself into the hiring manager’s shoes! If you feel you are not able to because you lack information about the hiring manager’s expectations, maybe it’s a good time to ask for a chat with the hiring manager in order to understand them better.
Best of luck!
This article was first published on https://mins.space.
Original article: https://towardsdatascience.com/master-data-science-case-studies-a-hiring-managers-perspective-49e508263280