Interviewing for a Data Science Summer Internship
Unfortunately, on this occasion, your application was not successful, and we have appointed an applicant who…
Sounds familiar, right? After all the gruelling hours I spent on interview preparation, rejection came after rejection. Although I was passing the first few interview stages, the face-to-face stages didn't go that well for me. "What a spectacular failure I am," I thought.
I started looking for ways to improve. I identified a few areas that are usually overlooked but can have a huge impact on the interview outcome. Working on them helped me improve and, in the end, get the job I wanted!
Get The Basics Right
Data science (DS) internships are usually quite competitive, and any red flag can get you rejected straightaway. One of these red flags is weak foundations. Data science is a field that requires solid mathematical and programming knowledge.
How can you improve? For data science theory, I recommend getting a good mathematical understanding of the most common algorithms. There are two books I usually recommend: Pattern Recognition and Machine Learning, and A First Course in Machine Learning. Both contain in-depth mathematical explanations of machine learning algorithms that will help you smash DS interview questions to pieces!
Depending on the company, you might also be asked programming questions. They are often not that hard, but given the stress and time constraints, you really need to master them as well. Expect anything from sorting and recursion to data structures. It's good to start practising these questions as early as possible. To learn how to approach coding questions, I recommend working through the book Cracking the Coding Interview. For more hands-on practice, visit HackerRank or LeetCode.
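As an illustration of a question that combines sorting and recursion, here is a sketch of merge sort in Python. The choice of problem is ours, not the author's, and actual questions vary by company; it is shown only because it exercises a base case, a recursive split and a linear-time merge, which interviewers often ask you to explain.

```python
from typing import List


def merge_sort(items: List[int]) -> List[int]:
    """Return a new sorted list using recursive merge sort (O(n log n))."""
    # Base case: a list of zero or one element is already sorted.
    if len(items) <= 1:
        return list(items)

    # Recursive case: split the list in half and sort each half independently.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge step: repeatedly take the smaller head element of the two halves.
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # "<=" keeps equal elements in order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One half is exhausted; append whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


if __name__ == "__main__":
    print(merge_sort([5, 2, 9, 1, 5, 6]))  # → [1, 2, 5, 5, 6, 9]
```

In an interview, being able to state the complexity (O(n log n) time, O(n) extra space) and why the merge is stable is usually worth as much as the code itself.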
Glassdoor is Your Best Friend
You can also get a good feel for a company's culture and atmosphere from Glassdoor reviews. This is a good indication of whether the company is a good fit for you. If, for example, a company seems to have a really toxic atmosphere, it may be better to withdraw your application and spend more time preparing for interviews at other companies. What's the point of interviewing with companies you don't really want to work for?
You can also find really useful information about the interview structure and the types of questions asked. Some companies literally ask the same set of questions every time! I am not sure why they do that, but when they do, you will notice the same questions recurring across Glassdoor reviews. You can use this to your advantage and learn the answers by heart.
Easy Interview Questions are NOT Easy
Imagine the interviewer asks: what is linear regression?
You can answer either:
It is a linear approach that models the relationship in data between dependent and independent variables.
Or:
It is a linear approach that models the relationship in data between dependent and independent variables. The model's parameters can be derived using the ordinary least squares approach, and the general form of the equation works on multi-dimensional data. It is a simple, fast, and interpretable algorithm. However, it has certain caveats, such as …
Do you see what I mean? By asking a simple-looking question, the interviewer can test two things. First, whether you have the basic knowledge (obviously). Second, how deep your understanding is and how inquisitive you are when studying a topic. This ability is crucial in a data scientist's skill set, as you will often have to work with new tools and read research papers. If you don't analyse a subject thoroughly and fail to understand its capabilities and limitations, you are on a direct path to an unsuccessful project.
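To back up the "deeper" answer, it helps to know what ordinary least squares actually computes. Minimising the sum of squared residuals ||y - Xβ||² leads to the normal equations XᵀXβ = Xᵀy, the general equation that works for any number of features. The pure-Python sketch below builds and solves them with Gaussian elimination. The function names `fit_ols` and `solve` are hypothetical, the sketch assumes a small dense dataset with an intercept term and linearly independent features, and in practice you would use a numerical library instead (explicitly forming XᵀX squares the condition number, so QR-based solvers are more stable).

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], so row operations update both sides together.
    m = [list(row) + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the row with the largest |value| in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(features: Matrix, targets: Vector) -> Vector:
    """Return [intercept, coef_1, ..., coef_d] minimising the sum of squared residuals."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    x = [[1.0] + list(row) for row in features]
    d = len(x[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(x, targets)) for i in range(d)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Noise-free data generated from y = 1 + 2*x1 - 3*x2, so OLS should recover it.
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
    y = [1 + 2 * a - 3 * b for a, b in X]
    print([round(c, 6) for c in fit_ols(X, y)])  # → [1.0, 2.0, -3.0]
```

Because the example data has no noise, the fitted coefficients reproduce the generating equation; with real data they only approximate it. The singular-matrix error is one concrete way that perfectly collinear features break plain OLS.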
Showcase Projects: Quality or Quantity?
TL;DR: Quality!
The painful truth is that nobody cares about the endless Jupyter notebooks you created for your 100+ mini-projects. Don't get me wrong: they are still a great way to experiment with new models and data. But, most likely, they won't impress the interviewer.
There is much more to data science than creating dozens of untested machine learning models in a single file. In a real-life scenario, the code needs to be tested, packaged, documented, and deployed on internal servers or cloud services.
My advice? Go for quality and aim to create about three bigger projects that will impress interviewers. Here are some tips you can follow:
- Find a real-world dataset that requires a lot of preprocessing and EDA
- Make your code modular: create separate classes for models, data preprocessing, and end-to-end pipelines
- Use test-driven development (TDD) when developing packaged code
- Work with Git and a continuous integration service such as CircleCI
- Expose the model through an API, e.g. with Flask for Python
- Document the code using Sphinx and follow code style guidelines (e.g. PEP 8 for Python)
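The modularity and TDD tips can be illustrated with a small, dependency-free sketch: one class per responsibility (preprocessing, model, end-to-end pipeline) plus `unittest` tests that pin down their behaviour. The class names `MeanImputer`, `SimpleLinearModel` and `Pipeline` are hypothetical; they loosely follow the fit/transform/predict naming convention familiar from scikit-learn but are not its API. Git, CI, Flask and Sphinx are left out to keep the example short.

```python
import unittest
from typing import List, Optional


class MeanImputer:
    """Preprocessing step (hypothetical): replace missing values (None) with the training mean."""

    def __init__(self) -> None:
        self.mean: Optional[float] = None

    def fit(self, values: List[Optional[float]]) -> "MeanImputer":
        observed = [v for v in values if v is not None]
        if not observed:
            raise ValueError("cannot fit on a column with no observed values")
        self.mean = sum(observed) / len(observed)
        return self

    def transform(self, values: List[Optional[float]]) -> List[float]:
        if self.mean is None:
            raise RuntimeError("call fit() before transform()")
        return [self.mean if v is None else v for v in values]


class SimpleLinearModel:
    """Model step (hypothetical): one-feature least-squares line y = intercept + slope * x."""

    def __init__(self) -> None:
        self.intercept = 0.0
        self.slope = 0.0

    def fit(self, xs: List[float], ys: List[float]) -> "SimpleLinearModel":
        n = len(xs)
        x_bar, y_bar = sum(xs) / n, sum(ys) / n
        sxx = sum((x - x_bar) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("all x values are identical; slope is undefined")
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = y_bar - self.slope * x_bar
        return self

    def predict(self, xs: List[float]) -> List[float]:
        return [self.intercept + self.slope * x for x in xs]


class Pipeline:
    """End-to-end step (hypothetical): fit preprocessing and model together."""

    def __init__(self, imputer: MeanImputer, model: SimpleLinearModel) -> None:
        self.imputer = imputer
        self.model = model

    def fit(self, xs: List[Optional[float]], ys: List[float]) -> "Pipeline":
        self.imputer.fit(xs)
        self.model.fit(self.imputer.transform(xs), ys)
        return self

    def predict(self, xs: List[Optional[float]]) -> List[float]:
        # Reuse the statistics learned in fit(); never refit on new data.
        return self.model.predict(self.imputer.transform(xs))


class PipelineTest(unittest.TestCase):
    """With TDD, tests like these are written first and drive the classes above."""

    def test_imputer_fills_missing_with_training_mean(self) -> None:
        self.assertEqual(MeanImputer().fit([1.0, None, 3.0]).transform([None]), [2.0])

    def test_pipeline_recovers_exact_line(self) -> None:
        pipe = Pipeline(MeanImputer(), SimpleLinearModel()).fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
        self.assertAlmostEqual(pipe.model.slope, 2.0)
        self.assertAlmostEqual(pipe.model.intercept, 1.0)

    def test_predict_before_fit_raises(self) -> None:
        with self.assertRaises(RuntimeError):
            Pipeline(MeanImputer(), SimpleLinearModel()).predict([1.0])
```

Saved as, say, `test_pipeline.py` (a hypothetical file name), the tests run with `python -m unittest test_pipeline`; a CI service such as CircleCI would typically run the same command on every push. Keeping each step in its own class means each can be tested in isolation and swapped without touching the others.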
Data scientists from Babylon Health and Train In Data created a really good course on ML model deployment, available on Udemy.
Bonus: CV Template
I am a big fan of one-page CVs for data science internships. They help me keep things simple and clear, without redundant information. I used to have a Word template, but I was losing a lot of time modifying it: whenever I removed or added some information, the formatting instantly fell apart, making my CV look like Enigma code 😆
Anyway, I found a nice-looking Overleaf CV template that I've been using ever since. It is simple, clear and, most importantly, rendered from modular LaTeX code, which makes formatting a painless task.
About Me
I am an MSc Artificial Intelligence student at the University of Amsterdam. In my spare time, you can find me fiddling with data or debugging my deep learning model (I swear it worked!). I also like hiking :)
Here are my social media profiles if you want to keep up with my latest articles and other useful content:
- LinkedIn
- GitHub
- Personal Website
Original article: https://towardsdatascience.com/interviewing-for-data-science-internship-how-to-prepare-f6b9c2c7fa97