My Data Science Interview with Microsoft

Microsoft was one of the software companies that came to my university to hire interns for summer 2021. This year was the first time Microsoft offered a Data Science Internship to pre-final-year undergraduate students.

Microsoft set the following eligibility requirements:

  1. The student must have a minimum CGPA of 8.

  2. The student should be pursuing a Computer Science or Mathematics major.

All eligible students had to fill out the internship application form on the Microsoft Careers website and submit a resume. Students who filled out the form received the test link within 1–2 days.

Online Test:

About 60–70 students took the internship test, which was conducted on the Mettl platform. The test lasted 1 hour and consisted of 62 multiple-choice questions covering almost every aspect of machine learning. No information was given about the marking scheme.

The key takeaways from the online test were:

  1. Questions covered a range of topics, including Linear Regression, Logistic Regression, SVMs, Decision Trees, Random Forests, Underfitting and Overfitting, Bias, Variance, Bagging, Boosting, Clustering, Recommender Systems, PCA, LDA, and Neural Networks. There were also some basic questions on Probability and Statistics.

  2. Most of the questions were conceptual, for example about the kernel function in SVMs or the central limit theorem.

  3. There were fewer questions on Neural Networks, so students were expected to be well-versed in traditional Machine Learning algorithms.

  4. There were no coding questions, not even ones like identifying the correct sklearn code for a given algorithm.

I was able to complete about 50 of the 62 questions in the one hour allotted.

Since I didn’t know much about Recommender Systems and the LDA algorithm, I wasn’t able to answer those questions, nor the 2–3 questions on convex optimization.

Microsoft didn’t release the exact results for the test but released a list of 6 students shortlisted for the interviews, including me!

I had about a day to prepare for the interview and no idea what a Data Science interview would be like. I took some help from seniors and revised the concepts covered in the online test (mostly traditional machine learning algorithms) using the Stanford CS229 notes. I also reviewed everything about the projects on my resume.

Interviews were taken online on the Microsoft Teams platform due to COVID-19, and there was a total of 3 rounds of technical interviews for each candidate.

Round 1:

The interviewer first asked me to introduce myself and talk about my interests, and I talked about my interest in computer vision.

I was asked the following questions:

  1. Explain how a convolutional layer works and design a CNN for image classification. Which loss function, regularization, and activation function would you use?

  2. Explain the Decision Tree algorithm. Then explain bagging and boosting with Decision Trees, including the weighting function used in the boosting algorithm.

  3. Design a spam classification system, including the feature extraction, the algorithm, and the metrics used for evaluation.

  4. Explain in depth how Support Vector Machines (SVMs) work, including the convex optimization, kernel functions, and what support vectors are.

I was able to answer all the questions except the one on SVMs: I could explain margins and kernel functions but not the convex optimization part. I explained my answers by illustrating the algorithms on a shared screen.
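
The convex-optimization part of the SVM question is usually stated as the soft-margin primal problem and its dual. The standard textbook form is shown below as a reference, not as what the interviewer specifically expected. Here m is the number of training examples, C the regularization constant, ξ_i the slack variables, α_i the Lagrange multipliers, φ a feature map, and K(x, z) = φ(x)ᵀφ(z) the corresponding kernel function.

```latex
% Soft-margin primal problem
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{m}\xi_i \\
\text{s.t.}\quad & y^{(i)}\bigl(w^\top \phi(x^{(i)}) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,m
\end{aligned}

% Dual problem (the data enter only through the kernel K)
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{m}\alpha_i - \tfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\,y^{(i)}y^{(j)}\,K\bigl(x^{(i)}, x^{(j)}\bigr) \\
\text{s.t.}\quad & 0 \le \alpha_i \le C,\qquad \sum_{i=1}^{m}\alpha_i y^{(i)} = 0
\end{aligned}
```

Both are convex optimization problems (a quadratic objective with linear constraints). The support vectors are the training points with α_i > 0, and a new point x is classified as sign(Σ_i α_i y^(i) K(x^(i), x) + b). Because the dual uses the data only through K, replacing the plain dot product with a nonlinear kernel lets an SVM separate classes that are not linearly separable in the original feature space.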

He then asked me if I had any questions, so I asked about some data science use cases at Microsoft, and the interview was over. The entire interview took about 45 minutes.
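
Question 2 of this round asked about the weighting function used in boosting. The post does not name a specific boosting algorithm; AdaBoost with decision stumps is one common example. The minimal pure-Python sketch below (the function name is hypothetical) shows one round of AdaBoost's sample-weighting step for labels in {−1, +1}: the weak learner gets a vote α = ½·ln((1 − ε)/ε), where ε is its weighted error, and each sample weight is multiplied by exp(−α·y·h(x)) and then renormalized, so misclassified samples gain weight.

```python
import math
from typing import List, Tuple


def adaboost_update(
    weights: List[float], y_true: List[int], y_pred: List[int]
) -> Tuple[float, List[float]]:
    """Run one AdaBoost reweighting step for labels in {-1, +1}.

    Returns the weak learner's vote (alpha) and the renormalized sample weights.
    """
    total = sum(weights)
    # Weighted error of the weak learner (for example, a decision stump).
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must be strictly between 0 and 1")
    # More accurate learners get a larger vote in the final ensemble.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # t * p is -1 for a mistake, so misclassified samples are scaled up by e^alpha
    # and correctly classified ones are scaled down by e^-alpha.
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]
```

For example, with four equally weighted samples and one mistake, `adaboost_update([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])` gives α = ½·ln 3 ≈ 0.549 and new weights of about [0.167, 0.167, 0.167, 0.5], so the misclassified sample now carries half of the total weight.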

Three students made it to the second round, which took place after a couple of hours.

I revised SVMs in the time between the first and second rounds.

Round 2:

This round was similar to round 1, but the interviewer asked a significant number of NLP (Natural Language Processing) questions.

Like the first round, it started with me introducing myself and my interests.

I was asked the following questions:

  1. What is the difference between bias and variance?

  2. Explain multiclass classification using Logistic Regression, including the softmax activation and the cross-entropy loss, and write out the equations for both.

  3. Explain how RNNs, GRUs, and LSTMs work and the pros and cons of each type of network. Why are transformer-based models better than these?

  4. Explain the training procedure used to obtain GloVe embeddings.

  5. Design a spam classification system, including the feature extraction, the algorithm, and the metrics used for evaluation.

  6. Explain in depth how Support Vector Machines (SVMs) work and explain kernel functions. How does an SVM classify data when the classes are not linearly separable?

  7. Which algorithm would you use to extract nouns from search engine queries, and why?

  8. Derive the equations for the forward and backward passes in Linear Regression.

I was able to answer most of the questions in the interview, except the mathematical equations involved in SVMs. The interviewer seemed satisfied with most of my answers. I explained the answers by illustrating the algorithms on a shared screen.
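
Of the equations asked for in this round, the ones for question 2 (softmax and cross-entropy) can be sketched compactly. Multiclass logistic regression computes one logit z_k = w_k·x + b_k per class, converts the logits into probabilities with softmax, p_k = exp(z_k) / Σ_j exp(z_j), and is trained with the cross-entropy loss L = −ln p_y, where y is the true class. The gradient of L with respect to each logit is simply p_k − 1[k = y], and multiplying it by x gives the gradient for w_k. The pure-Python sketch below uses hypothetical function names and is an illustration, not a reference implementation.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtracting the largest logit avoids overflow and leaves the result unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], label: int) -> float:
    # Negative log-probability assigned to the true class.
    return -math.log(probs[label])


def logits_grad(probs: List[float], label: int) -> List[float]:
    # Gradient of cross_entropy(softmax(z), label) w.r.t. z: p_k - 1[k == label].
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
```

For example, `softmax([0.0, 0.0, 0.0])` returns three probabilities of 1/3, so `cross_entropy` gives ln 3 ≈ 1.099 whichever class is the true one.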

She then asked me if I had any questions, and I asked the same question as in round 1. The entire interview took about 45 minutes.
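
Question 8 of this round (the forward and backward passes in Linear Regression) is a short derivation. The version below assumes mean squared error with a factor of 1/2, a common convention that cancels the 2 produced by differentiation, over m examples, with design matrix X ∈ ℝ^{m×d}, weights w, bias b, targets y, and learning rate η. This notation is an assumption, not necessarily the one the interviewer used.

```latex
\begin{aligned}
\text{Forward:}\quad & \hat{y} = Xw + b\,\mathbf{1}, \qquad L = \frac{1}{2m}\,\lVert \hat{y} - y \rVert^2 \\
\text{Backward:}\quad & \frac{\partial L}{\partial \hat{y}} = \frac{1}{m}(\hat{y} - y), \qquad
\frac{\partial L}{\partial w} = \frac{1}{m}X^\top(\hat{y} - y), \qquad
\frac{\partial L}{\partial b} = \frac{1}{m}\mathbf{1}^\top(\hat{y} - y) \\
\text{Update:}\quad & w \leftarrow w - \eta\,\frac{\partial L}{\partial w}, \qquad
b \leftarrow b - \eta\,\frac{\partial L}{\partial b}
\end{aligned}
```

Without the factor of 1/2 in the loss, every gradient above is simply doubled.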

Round 3:

The interviewer didn’t have a Data Science background, so he asked me questions on Data Structures & Algorithms. But he mentioned that it wouldn’t be hard since the interview was for a data science role.

The interview started with the usual formal introduction, and he asked me to introduce myself.

I was asked the following questions:

  1. Given an array A = [a1, a2, a3, …, an, b1, b2, b3, …, bn], convert it into the array B = [a1, b1, a2, b2, …, an, bn] using only O(1) extra space.

  2. For the previous question, given an index in array A, return the index that element would have in array B.

  3. You have an array of 2N elements consisting of N even and N odd elements. Using the minimum number of swaps, make sure that even elements are at odd indexes and odd elements are at even indexes.

  4. In the previous question, assume you are not told that the number of even elements equals the number of odd elements. Verify this while still using the minimum number of swaps and only one pass over the array.

I was not able to answer the first question correctly, so the interviewer changed it to the second question, which I answered correctly and coded on a shared screen. He seemed satisfied with my answer to the second question.
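
For reference, here is one way to approach the first two questions; the post does not say which solution the interviewer had in mind, and both function names are hypothetical. `target_index` answers question 2 directly, using 0-based indexes with n the number of a-elements. `interleave_in_place` is one simple O(1)-extra-space approach to question 1: it repeatedly swaps adjacent pairs in a window that widens outward from the middle, which costs O(n²) time. Linear-time in-place algorithms exist but are considerably more involved.

```python
from typing import List


def target_index(i: int, n: int) -> int:
    """Map a 0-based index in A = [a1..an, b1..bn] to its index in B = [a1, b1, ..., an, bn]."""
    if not 0 <= i < 2 * n:
        raise IndexError("index out of range")
    # a-elements go to even slots, b-elements to odd slots.
    return 2 * i if i < n else 2 * (i - n) + 1


def interleave_in_place(arr: List[int]) -> None:
    """Rearrange [a1..an, b1..bn] into [a1, b1, ..., an, bn] with O(1) extra space.

    Each pass swaps adjacent pairs in a window centered on the middle that widens
    by one slot on each side, for O(n^2) time in total.
    """
    if len(arr) % 2:
        raise ValueError("array length must be even")
    n = len(arr) // 2
    for k in range(1, n):
        for j in range(n - k, n + k, 2):
            arr[j], arr[j + 1] = arr[j + 1], arr[j]
```

For example, `interleave_in_place` turns [1, 2, 3, 11, 12, 13] into [1, 11, 2, 12, 3, 13], and `target_index(4, 3)` returns 3, the new position of the value 12.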

He then asked me the third question, which I answered using the two-pointer technique, and I coded the solution after explaining it to him. He seemed satisfied with the answer.
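
A minimal sketch of the two-pointer idea for question 3, assuming integer elements and exactly N even and N odd values as the question states (the function name is hypothetical). One pointer walks the even indexes looking for a misplaced even value, the other walks the odd indexes looking for a misplaced odd value, and each swap fixes both at once. Every misplaced element has to move, and a swap touches only two positions, so this uses the minimum number of swaps.

```python
from typing import List


def place_parity(arr: List[int]) -> int:
    """Put even values at odd indexes and odd values at even indexes.

    Assumes len(arr) == 2N with exactly N even and N odd values.
    Returns the number of swaps performed.
    """
    n = len(arr)
    i, j = 0, 1  # i scans even indexes, j scans odd indexes
    swaps = 0
    while True:
        # Skip even indexes that already hold an odd value.
        while i < n and arr[i] % 2 == 1:
            i += 2
        # Skip odd indexes that already hold an even value.
        while j < n and arr[j] % 2 == 0:
            j += 2
        if i >= n or j >= n:
            return swaps
        # arr[i] is a misplaced even value, arr[j] a misplaced odd value: one swap fixes both.
        arr[i], arr[j] = arr[j], arr[i]
        swaps += 1
```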

The interviewer then extended it to the fourth question, for which I changed the loop and added some if-else statements inside it. He then pointed out some edge cases in which the solution would fail, and I modified the code to handle them. The interviewer seemed satisfied with the answer.
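
For question 4, the same loop can check the counts as it goes. For an array of even length 2N, the number of even values equals N exactly when the number of even values sitting at even indexes equals the number of odd values sitting at odd indexes, so the counts match precisely when both pointers run out of misplaced elements at the same time. The sketch below assumes integer elements and treats an odd-length array as invalid and an empty array as valid; these edge-case choices are assumptions of this sketch, not necessarily the ones discussed in the interview, and the function name is hypothetical. If the check fails, the array may already be partially rearranged.

```python
from typing import List


def place_parity_checked(arr: List[int]) -> bool:
    """Rearrange as in question 3 while verifying equal even/odd counts in one pass.

    Returns True if the counts were equal (and arr is now arranged), else False.
    """
    n = len(arr)
    if n % 2:  # an odd-length array can never hold equally many evens and odds
        return False
    i, j = 0, 1  # i scans even indexes, j scans odd indexes; both only move forward
    while True:
        while i < n and arr[i] % 2 == 1:
            i += 2
        while j < n and arr[j] % 2 == 0:
            j += 2
        if i < n and j < n:
            arr[i], arr[j] = arr[j], arr[i]
        else:
            # A leftover misplaced element on either side means the counts differ.
            return i >= n and j >= n
```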

He then asked if I had any questions, so I asked him about the work culture at Microsoft and the work he does at the company. After this, the interview was over. The whole interview took 45 minutes.

Key takeaways:

  1. It is crucial to understand the mathematical concepts behind the algorithms rather than treating them as black boxes.

  2. Having machine learning projects on your resume is a huge plus, since every candidate had to explain their projects. Review your projects thoroughly.

  3. Get some decent practice with DSA (Data Structures and Algorithms) questions, since the process might include a DSA round. I was the only one of the six candidates who went through a DSA round.

  4. Read about some industry use cases of machine learning, since most data science interviews include these types of questions.

Conclusions:

I was very confident about my performance in the first two rounds but was a little unsure of my performance in the 3rd round since I was pretty weak in Data Structures and Algorithms.

After three days, Microsoft announced the results for the internship position: three students received offers, and I was one of them!

I will be interning at one of Microsoft India's offices from May 2021 to July 2021.

Source: https://towardsdatascience.com/my-data-science-interview-with-microsoft-6b7ec840b80e
