Placement Outcomes Data Analysis Using Orange GUI

Objective: To analyse several factors influencing the recruitment of students and to extract information about them through plots.

Description: The following analysis presents different plots that attempt to link students’ placement prospects, as judged by recruiting organisations, to academic parameters such as the percentage obtained in secondary and higher secondary school, the undergraduate degree and the postgraduate degree.

Miscellaneous factors such as the gender of the candidate, the choice of board and the stream opted for in high school and secondary education, the undergraduate degree specialisation and the postgraduate degree specialisation have also been taken into account to predict placement status as well as the salary offered.

Several colleges offer employability tests, which help employers evaluate their prospective workforce, analyse and judge candidates' skills, and hence recruit the right talent. Thus, the performance of students in such tests conducted by the college, together with their previous work experience, has also been analysed to deduce how these relate to recruitment opportunities.

Hypothesis: Students with better scores in secondary education and in their undergraduate degree have better prospects of getting placed.

Understanding the Project:

Going through the analysis, a reader should be able to infer:

  1. How the choice of board of education influences placement prospects.

  2. The relative importance of the scores obtained in various degrees and streams in the campus recruitment procedure.

  3. The relation of gender and work experience to the salary offered by corporates in campus placements.

Acknowledgements:

I, Ruchika Parag Barman, and my teammate Prafful Chauhan created this notebook/blog as part of the coursework for the “Pandas, bamboolib & Orange workshop” at Suven, under the mentorship of Rocky Jagtiani.

Learned from https://datascience.suvenconsultants.com.

Mentored by Rocky Jagtiani.

Dataset:

This dataset consists of placement data of students at an XYZ campus. It includes secondary and higher secondary school percentages and specialisations. It also includes degree specialisation and type, work experience, and the salary offers made to placed students.

We have taken 60 observations (rows), from which we extract information through exploratory data analysis and visualization. There are 8 categorical features and 6 numerical features.

Histograms:

Inference: Male students are getting more placements than female students; among placed students, the ratio of males to females is roughly 2:1.
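The 2:1 figure is a simple count of placed students per gender. As a minimal sketch of that arithmetic, assuming each row is a Python dict and assuming hypothetical column names `gender` and `status` with the value `"Placed"` (these names are illustrative, not taken from the post), the counts and the ratio could be computed like this on made-up rows:

```python
from collections import Counter


def placed_counts(rows, column, status_key="status", placed_value="Placed"):
    """Count placed students for each category of `column`.

    The column names and the "Placed" value are assumptions for illustration.
    """
    return Counter(row[column] for row in rows if row[status_key] == placed_value)


def ratio(counts, a, b):
    """Ratio of placed students in category `a` to those in category `b`."""
    return counts[a] / counts[b]


# Hypothetical toy rows, not the real dataset.
rows = [
    {"gender": "M", "status": "Placed"},
    {"gender": "M", "status": "Placed"},
    {"gender": "F", "status": "Placed"},
    {"gender": "F", "status": "Not Placed"},
    {"gender": "M", "status": "Not Placed"},
]

counts = placed_counts(rows, "gender")
print(counts["M"], counts["F"], ratio(counts, "M", "F"))  # 2 1 2.0
```

The same helper would cover the board comparison that follows, for example by counting on a hypothetical board column and taking the ratio of `"Central"` to `"Others"`.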

Inference: With respect to high school education, Central board students have a wider range of salaries than students of other boards, but the ratio of placed Central board students to placed students of other boards is less than 1.

Inference: With respect to secondary education, Central board students have a wider range of salaries than students of other boards.

Inference: Commerce and Arts students have a wider range of salaries, and more of them are placed, compared to students of Science or other streams.

From these graphs, one can gather that gender plays quite an important role in whether or not a candidate is hired: a male candidate is more likely to be placed at a corporate than a female candidate. Similarly, the board of education and the stream chosen also influence the salary offered. Students who opted for Commerce and Management studies have been offered higher pay.

Correlations:

The correlations table suggests the following:

  1. Students who have scored well in their secondary education are very likely to perform well in their undergraduate degree as well.

  2. Students who have scored well in their high school education tend to perform well in their secondary education as well.

  3. Again, students who have scored well in their high school education are very likely to perform well in their undergraduate degree as well.

  4. Most students who have had a good academic record in their high school education also score high in their MBA degree.
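Orange's Correlations widget reports pairwise coefficients such as Pearson's r; the post does not say which coefficient was used. A minimal pure-Python sketch of Pearson's r, run on hypothetical score lists, shows what a value close to 1 means: the two percentages rise together, which describes an association rather than a cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance numerator and the two spread terms (the 1/n factors cancel).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical secondary-school and undergraduate percentages.
secondary = [60, 65, 70, 75, 80]
degree = [58, 63, 66, 72, 79]
r = pearson(secondary, degree)
print(round(r, 2))  # 0.99
```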

Boxplots:

Inference: The boxplot shows the relation between the percentage obtained in the undergraduate degree and placement status. Students who get placed score higher, on average, than those who do not. For placed students, the mean score is 68.6925, the standard deviation is 6.189, the median (2nd quartile) is 69.25, the 1st quartile is 64.50 and the 3rd quartile is 72.1150.

In contrast, for students who were not placed, the mean percentage is 60.8670, the standard deviation is 7.045, the median (2nd quartile) is 61.00, the 1st quartile is 56.65 and the 3rd quartile is 64.00.
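The five numbers quoted for each group are the ones a boxplot summarises. As a sketch, assuming Python's `statistics` module with the "inclusive" quartile method (Orange's Box Plot widget may use a different quartile convention, so small differences are possible), they can be computed for any list of percentages; the scores below are hypothetical:

```python
import statistics


def box_summary(values):
    """Mean, sample standard deviation and quartiles of a list of scores.

    Quartiles use statistics.quantiles(method="inclusive"); this is an
    assumed convention and may differ slightly from Orange's.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "q1": q1,
        "median": median,
        "q3": q3,
    }


# Hypothetical degree percentages of one group of students.
summary = box_summary([55, 58, 60, 62, 64, 66, 68, 70, 72])
print(summary)
```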

From this analysis, undergraduate students and freshers can prioritise and prepare for their degree examinations, keeping in mind the average score of placed students mentioned above as a rough indication of what corporate companies generally consider worthy of a placement in their establishment.

Inference: Male candidates get higher pay than female candidates. For placed male students, the mean salary is 302608.70, the standard deviation is 144726.4, the median (2nd quartile) is 264000, the 1st quartile is 240000 and the 3rd quartile is 300000.

On the other hand, for placed female students, the mean salary is 267571.43, the standard deviation is 41776.1, the median (2nd quartile) is 250000, the 1st quartile is 240000 and the 3rd quartile is 300000.

Thus, not only is the placement rate of females lower than that of males, but the salary offered to placed female candidates is also lower, on average, than that offered to male candidates.
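The size of the gap can be worked out directly from the reported means and medians (no other data is used):

```latex
\begin{aligned}
\bar{x}_{\text{male}} - \bar{x}_{\text{female}} &= 302608.70 - 267571.43 = 35037.27 \\
\frac{35037.27}{267571.43} &\approx 0.131 \quad (\text{about } 13\%) \\
\tilde{x}_{\text{male}} - \tilde{x}_{\text{female}} &= 264000 - 250000 = 14000 \quad \left(\tfrac{14000}{250000} = 5.6\%\right)
\end{aligned}
```

Because the male mean (302608.70) lies above the male 3rd quartile (300000) and the male standard deviation is more than three times the female one, a few very high male salaries likely pull the male mean up; the medians give a more conservative comparison.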

Pivot Table:

Inference: As more students opt for Commerce and Management, both the number of placed students and the number of students not placed are much higher in this stream than in Science and other streams. Moreover, the ratio of placed to not-placed students is higher in Commerce and Management than in Science.
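A pivot table like this is a count of students for every (stream, status) pair. A minimal sketch, assuming dict rows with hypothetical `stream` and `status` keys and made-up counts, computes those cells and the placed-to-not-placed ratio that the inference compares:

```python
from collections import Counter


def pivot_counts(rows, row_key, col_key):
    """Count rows for every (row_key value, col_key value) pair, like a pivot table."""
    return Counter((row[row_key], row[col_key]) for row in rows)


def placed_ratio(pivot, stream):
    """Placed-to-not-placed ratio for one stream (needs at least one unplaced student)."""
    return pivot[(stream, "Placed")] / pivot[(stream, "Not Placed")]


# Hypothetical rows: 6 Commerce and Management students (4 placed),
# 4 Science students (2 placed). Not the post's data.
rows = (
    [{"stream": "Commerce and Management", "status": "Placed"}] * 4
    + [{"stream": "Commerce and Management", "status": "Not Placed"}] * 2
    + [{"stream": "Science", "status": "Placed"}] * 2
    + [{"stream": "Science", "status": "Not Placed"}] * 2
)

pivot = pivot_counts(rows, "stream", "status")
for stream in ("Commerce and Management", "Science"):
    print(stream, pivot[(stream, "Placed")], pivot[(stream, "Not Placed")],
          placed_ratio(pivot, stream))
```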

Readers can infer that there are relatively more job opportunities for students who opt for Commerce and Management than for those in other streams.

Scatterplots:

For the scatterplots, we have used 60% of the data provided. A scatterplot of salary against the percentage obtained in the degree examination is formed. Here, the points have been coloured according to the different streams, as shown in the legend.
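The post does not say how the 60% subset was drawn. One plausible way, assumed here, is a random sample without replacement (in Orange this corresponds to a fixed-proportion sample, e.g. with the Data Sampler widget). A minimal sketch, with an arbitrary seed so the subset is reproducible:

```python
import random


def sample_fraction(rows, fraction, seed=0):
    """Random subset of round(len(rows) * fraction) rows, drawn without replacement."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    k = round(len(rows) * fraction)
    return random.Random(seed).sample(rows, k)


observations = list(range(60))  # stand-ins for the 60 rows
subset = sample_fraction(observations, 0.6)
print(len(subset))  # 36
```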

Inference: The higher salaries have been offered to students whose scores lie in the range 64–74. Moreover, in terms of stream, most of the students offered pay higher than 300,000 belong to Commerce and Management. Very few Science students, and even fewer students of other streams, have crossed the 300,000 threshold.

Inference: Students who specialise in Marketing and Finance and those in Marketing and HR score similarly in MBA percentage. However, the highest-paid students generally have scores in the range of approximately 62–70. Very few students have been offered pay higher than 400,000. The majority of students are offered salaries in the range of 250,000 to 350,000.

We can infer that maintaining an average score within the range mentioned above should suffice for a decently paying placement.

Mosaic Plot:

Other than academic parameters, recruiting companies may also consider some other factors for placement. Employability tests conducted by colleges are key to establishing appropriate labour market linkages and ascertaining that the workforce is industry-ready.

Inference: From the mosaic plot, we can see that of all the students who did not get placed, very few scored above 83.5 in the employability test; most of the unplaced candidates scored 83.5 or below.

Moreover, the plot suggests that students with prior work experience are considered more deserving than freshers. Nearly all of the students not placed had no prior work experience, whereas those with work experience fall in the placed-students section on the right.

From this, students can see that gaining experience in a work environment before campus recruitment proves to be beneficial, and they can plan and prepare for their future accordingly.
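A mosaic plot draws tiles whose areas are proportional to the counts in a contingency table. As a sketch, assuming dict rows with hypothetical `etest_p`, `workex` and `status` keys and made-up values, the counts behind the two mosaic-plot observations (test score split at 83.5, and work experience) can be tabulated like this:

```python
from collections import Counter


def crosstab(rows, key, target="status"):
    """Count rows by (key(row), row[target]); `key` is any function of a row."""
    return Counter((key(row), row[target]) for row in rows)


def score_band(row, threshold=83.5):
    """Bin the (hypothetical) employability-test column at the threshold read off the plot."""
    return f"above {threshold}" if row["etest_p"] > threshold else f"{threshold} or below"


# Hypothetical rows, not the post's dataset.
rows = [
    {"etest_p": 90.0, "workex": "Yes", "status": "Placed"},
    {"etest_p": 70.0, "workex": "No", "status": "Not Placed"},
    {"etest_p": 60.0, "workex": "No", "status": "Not Placed"},
    {"etest_p": 85.0, "workex": "No", "status": "Placed"},
    {"etest_p": 75.0, "workex": "Yes", "status": "Placed"},
]

by_score = crosstab(rows, score_band)
by_workex = crosstab(rows, lambda row: row["workex"])
for cell, count in sorted(by_score.items()):
    print(cell, count)
```

Orange's mosaic display discretises numeric variables into intervals on its own, which is presumably where the 83.5 boundary comes from; the fixed threshold here simply mirrors it.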

Classification Tree:

This classification tree has placement status (Placed) as its target. It uses the following parameters:

Induce binary tree: yes.

Minimum number of instances in leaves: 2.

Do not split subsets smaller than: 5.

Limit the maximal tree depth to: 100.

Stop when majority reaches: 95%.
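To make these parameters concrete, here is a deliberately simplified binary tree learner in pure Python. It is not Orange's implementation: it handles only numeric features, uses Gini impurity for brevity, and its parameter names are hypothetical stand-ins for the widget options (`min_leaf` for minimum instances in leaves, `min_split` for the smallest subset that may be split, `max_depth`, and `sufficient_majority` for stopping when the majority class reaches 95%).

```python
from collections import Counter


def majority(labels):
    """Most common label and the share of labels it covers."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, min_leaf):
    """Lowest-impurity binary split `feature <= threshold`, or None if none is allowed."""
    best = None
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
            right = [i for i, row in enumerate(rows) if row[feature] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # would create a leaf that is too small
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, feature, threshold, left, right)
    return best


def build_tree(rows, labels, min_leaf=2, min_split=5, max_depth=100,
               sufficient_majority=0.95, depth=0):
    """Return a leaf label or a node tuple (feature, threshold, left, right)."""
    label, share = majority(labels)
    # Stop if the subset is too small to split, too deep, or already
    # dominated by one class.
    if len(rows) < min_split or depth >= max_depth or share >= sufficient_majority:
        return label
    split = best_split(rows, labels, min_leaf)
    if split is None:
        return label
    _, feature, threshold, left, right = split
    params = (min_leaf, min_split, max_depth, sufficient_majority, depth + 1)
    return (feature, threshold,
            build_tree([rows[i] for i in left], [labels[i] for i in left], *params),
            build_tree([rows[i] for i in right], [labels[i] for i in right], *params))


def predict(tree, row):
    """Follow the splits down to a leaf label."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if row[feature] <= threshold else right
    return tree


# Hypothetical toy data, not the post's dataset.
rows = [
    {"degree_p": 55, "etest_p": 60},
    {"degree_p": 58, "etest_p": 70},
    {"degree_p": 60, "etest_p": 65},
    {"degree_p": 66, "etest_p": 80},
    {"degree_p": 70, "etest_p": 75},
    {"degree_p": 72, "etest_p": 90},
]
labels = ["Not Placed"] * 3 + ["Placed"] * 3

tree = build_tree(rows, labels)
print(tree)  # ('degree_p', 60, 'Not Placed', 'Placed')
print(predict(tree, {"degree_p": 65, "etest_p": 50}))  # Placed
```

On this toy data the learner splits once on `degree_p` at 60 and then stops, because each child holds fewer than 5 rows (and is already pure).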

From this tree, students can get a detailed view, based on the data provided, of how the various academic and other factors relate to whether or not a candidate gets placed. The tree shows how the different attributes of a particular student influence their placement status.

This tree has salary offered as its target (a numeric target, so it is effectively a regression tree). It uses the following parameters:

Induce binary tree: yes.

Minimum number of instances in leaves: 2.

Do not split subsets smaller than: 5.

Limit the maximal tree depth to: 100.

Stop when majority reaches: 95%.

From this tree, students can get a detailed view of how the various academic and other factors relate to the salary offered to a candidate. The tree shows how the different attributes of a particular student influence their pay.

Vote of Thanks:

I would like to humbly and sincerely thank my mentor Rocky Jagtiani. He is more of a friend to me than a mentor. The data analytics he taught us and the various assignments we did, and are still doing, are the best way to learn and build skills in the Data Science field.

Recommended: https://datascience.suvenconsultants.com/

Originally published at: https://medium.com/@ruchikaparag18/placement-outcomes-data-analysis-using-orange-gui-1884aa3ac0c2
