Interviewing for a Data Science Internship: How to Prepare

Unfortunately, on this occasion, your application was not successful, and we have appointed an applicant who…

Sounds familiar, right? After all the gruelling hours I spent on interview preparation, rejection followed rejection. Although I was passing the first few interview stages, the face-to-face stages didn't go well for me. "What a spectacular failure I am," I thought.

I started looking for ways to improve. I identified a few areas that are usually overlooked but can have a huge impact on the interview outcome. Working on them helped me improve and get the job I wanted!

Get the Basics Right

Photo by Clay Banks on Unsplash

DS internships are usually quite competitive, and any red flag can lead the recruiter to reject you straight away. One such red flag is weak foundations. Data science is a field that requires good mathematical and programming knowledge.

How can you improve? For data science theory, I recommend getting a good mathematical understanding of the most common algorithms. There are two books that I usually recommend: Pattern Recognition and Machine Learning, and A First Course in Machine Learning. Both contain in-depth mathematical explanations of machine learning algorithms, which will help you smash DS interview questions to pieces!

Depending on the company, you might also be asked programming questions. They are often not that hard, but given the stress and time constraints, you really need to master them as well. Expect anything from sorting and recursion to data structures. It's good to start practicing these questions as soon as possible. To get a good understanding of how to approach coding questions, I recommend going through the book Cracking the Coding Interview. To get more practical experience, visit HackerRank or LeetCode.
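To give a feel for the kind of question that touches both sorting and recursion, here is a minimal merge sort sketch in Python. It is only an illustration, not a question taken from any particular company or from the resources above, and the function name merge_sort is a hypothetical choice.

```python
def merge_sort(items):
    """Return a new sorted list using recursive merge sort."""
    # Base case: a list of zero or one element is already sorted.
    if len(items) <= 1:
        return list(items)

    # Recursive case: sort each half independently.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # "<=" keeps equal elements in input order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))
```

In an interview, it usually helps to state the base case and the time complexity (O(n log n) for merge sort) out loud while writing code like this.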

Glassdoor is Your Best Friend

You can get a good feel for a company's culture and atmosphere from its Glassdoor reviews. This gives you a good indication of whether the company is a good fit for you. If, for example, a company seems to have a really toxic atmosphere, it may be better to withdraw the application and spend more time preparing for interviews at other companies. What's the point of interviewing with companies that you don't really want to work for?

You can also find some really useful information about the interview structure and the types of questions they ask. Some companies literally ask the same set of questions every time! I am not sure why they do that, but when they do, you will notice the same questions repeated across the Glassdoor reviews. You can use this to your advantage and learn them by heart.

Easy Interview Questions are NOT Easy

Photo by Jules Bss on Unsplash

Imagine a situation where the interviewer asks: what is linear regression?

You can answer either:

It is a linear approach that models the relationship in data between dependent and independent variables.

Or:

It is a linear approach that models the relationship in data between dependent and independent variables. The model's parameters can be derived using the ordinary least squares approach, and the general equation works on multi-dimensional data. It is an algorithm that is simple, fast, and interpretable. However, it has certain caveats such as …

Do you see what I mean? By asking a simple-looking question, the interviewer can test two things. Firstly, whether you have the basic knowledge (obvious). Secondly, how deep your understanding is and how inquisitive you are when studying a topic. This ability is crucial in the data scientist's skillset, as you will often have to work with new tools and read research papers. If you don't analyze a subject thoroughly and fail to understand its limitations and capabilities, you are on a straight path to an unsuccessful project.
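To make the fuller answer a bit more concrete, here is a minimal sketch of ordinary least squares for the single-feature case, written in plain Python. It is an illustration under stated assumptions rather than a reference implementation: the function name fit_simple_ols and the data are made up, and only the one-dimensional closed form is shown. The multi-dimensional case mentioned in the answer generalizes this to the normal equations, where the coefficient vector is (X^T X)^(-1) X^T y.

```python
def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Closed-form solution that minimizes the sum of squared residuals.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_ols(xs, ys)
print(intercept, slope)
```

Being able to write this down, and to explain why the fit fails when all x values are identical, is the kind of depth the second answer signals.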

Showcase Projects. Quality or Quantity?

TL;DR: Quality!


The painful truth is that nobody cares about the endless Jupyter notebooks that you created for your 100+ mini-projects. Don't get me wrong: it's still a great way to experiment with new models and data. But, most likely, it won't impress the interviewer.

There is much more to data science than creating dozens of untested machine learning models in a single file. In a real-life scenario, the code needs to be tested, packaged, documented, and deployed using internal servers or cloud services.

My advice? Go for quality and aim to create ~3 bigger projects that will impress the interviewers. Here are some tips that you can follow:

  • Find a real-world dataset that requires a lot of preprocessing and EDA
  • Make your code modular: create separate classes for models, data preprocessing, and end-to-end pipelines
  • Use test-driven development (TDD) while developing packaged code
  • Work with Git and continuous integration services such as CircleCI
  • Expose the model's API to the user, e.g. with Flask for Python
  • Document the code using Sphinx and adhere to code styling guidelines (e.g. PEP-8 for Python)
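As a small-scale sketch of what the modularity and TDD tips can look like, the snippet below puts one preprocessing step in its own class and pairs it with unit tests. Everything in it is an assumption made for illustration: the class name MeanImputer, its fit/transform methods, and the test cases are hypothetical and not taken from any specific project or library.

```python
import unittest


class MeanImputer:
    """Hypothetical preprocessing step: replace None with the column mean."""

    def fit(self, values):
        # Learn the mean from the observed (non-missing) values only.
        observed = [v for v in values if v is not None]
        if not observed:
            raise ValueError("cannot fit on a column with no observed values")
        self.mean_ = sum(observed) / len(observed)
        return self

    def transform(self, values):
        # Fill missing entries with the mean learned in fit().
        return [self.mean_ if v is None else v for v in values]


class MeanImputerTest(unittest.TestCase):
    """With TDD, tests like these are written before the class itself."""

    def test_fills_missing_with_mean(self):
        imputer = MeanImputer().fit([1.0, None, 3.0])
        self.assertEqual(imputer.transform([None, 5.0]), [2.0, 5.0])

    def test_rejects_all_missing_column(self):
        with self.assertRaises(ValueError):
            MeanImputer().fit([None, None])
```

Saved as a module, the tests can be run with python -m unittest, and the same command is what a continuous integration service would typically execute on every push.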

Data scientists from Babylon Health and Train In Data created a really good course on ML model deployment, which is available on Udemy.

Bonus: CV Template

I am a big fan of 1-page CVs for data science internships. One page helps me keep the CV simple and clear, without redundant information. I used to have a Word template, but I was losing a lot of time modifying it. Whenever I removed or added some information, the formatting instantly fell apart, making my CV look like the Enigma code 😆

Anyway, I found a nice-looking Overleaf CV template that I've been using ever since. It is simple, clear, and most importantly, it is built from modular LaTeX code that makes formatting a painless task. The template is available on Overleaf.

About Me

I am an MSc Artificial Intelligence student at the University of Amsterdam. In my spare time, you can find me fiddling with data or debugging my deep learning model (I swear it worked!). I also like hiking :)

Here are my social media profiles, if you want to stay in touch with my latest articles and other useful content:

  • LinkedIn
  • GitHub
  • Personal Website

Translated from: https://towardsdatascience.com/interviewing-for-data-science-internship-how-to-prepare-f6b9c2c7fa97
