Pearson's Correlation and Its Implication in Machine Learning


Today we will use a statistical concept, Pearson's correlation, to understand the relationship between the feature values (independent variables) and the target value (the dependent variable, i.e. the value to be predicted). Knowing this relationship helps us pick useful features, which in turn helps us improve our model.

Mathematically, Pearson's correlation coefficient is calculated as:

[Image: Pearson's correlation formula]

Image source: https://businessjargons.com/wp-content/uploads/2016/04/Karl-Pearson-final.jpg
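
The formula image itself is not reproduced in this text. For reference, the standard textbook definition of Pearson's correlation coefficient for n paired observations is given below; the image may use an algebraically equivalent form, so treat the notation here as an assumption rather than a copy of the original figure.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

Here x_i are the values of one feature and y_i are the corresponding target values.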

So now the question arises: what should be stored in the variable X, and what should be stored in the variable Y? We generally store the values of one feature in X and the target values in Y. The formula above then tells us how strongly, and in which direction, the selected feature is linearly correlated with the target.
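
As a minimal sketch of that computation, the snippet below evaluates the deviation-from-mean form of Pearson's formula by hand and checks it against NumPy's `np.corrcoef`. The function name `pearson_r` and the sample numbers are made up for illustration; they are not taken from headbrain3.csv.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r computed from deviations about the mean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up illustrative values (X = one feature, Y = target)
head_size = [3500, 3700, 3900, 4100, 4300]
brain_weight = [1200, 1260, 1290, 1350, 1400]

r = pearson_r(head_size, brain_weight)
print(round(r, 4))
# The hand-written version should agree with NumPy's built-in
print(np.isclose(r, np.corrcoef(head_size, brain_weight)[0, 1]))
```

With real data you would pass one feature column as X and the target column as Y, and repeat this for each candidate feature.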

Before we code, there are a few basic things to keep in mind about correlation:

  • The value of the correlation always lies between -1 and 1.

  • Correlation = 0 means there is no linear relationship between the selected feature and the target value. (A non-linear relationship can still exist.)

  • Correlation = 1 means there is a perfect positive linear relationship between the selected feature and the target value, which suggests the feature is a good one for our model to learn from.

  • Correlation = -1 means there is a perfect negative linear relationship between the selected feature and the target value. What matters for feature selection is the magnitude: a feature whose correlation is close to zero, e.g. -0.1 or -0.2, carries little linear information about the target, so its use is generally discouraged.
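
The three reference points in the list above can be checked on synthetic data. The sketch below uses `np.corrcoef` on values constructed for illustration only; the helper name `r` is hypothetical.

```python
import numpy as np

x = np.arange(-5, 6, dtype=float)  # -5, -4, ..., 5 (symmetric about 0)

def r(a, b):
    """Pearson's r between two 1-D arrays, via NumPy."""
    return float(np.corrcoef(a, b)[0, 1])

r_pos = r(x, 2 * x + 3)    # perfect increasing line: r is 1 up to rounding error
r_neg = r(x, -2 * x + 3)   # perfect decreasing line: r is -1 up to rounding error
r_sq = r(x, x ** 2)        # exact but non-linear relationship: r is 0 up to rounding error

print(round(r_pos, 6), round(r_neg, 6), round(r_sq, 6))
```

The last case is the reason a correlation of 0 should be read as "no linear relationship": y = x² is fully determined by x, yet Pearson's r does not detect it on a range symmetric about zero.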

So, let us now write the code to implement what we have just learned:

The data set used can be downloaded from here: headbrain3.CSV

# -*- coding: utf-8 -*-
"""
Created on Sun Jul 29 22:21:12 2018
@author: Raunak Goswami
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Reading the data.
# Here the script and headbrain3.csv are in the same directory;
# make sure both files are stored in the same folder.
data = pd.read_csv('headbrain3.csv')
# this will show the first five records of the whole data
print(data.head())

# w and y hold the first two columns (presumably gender and age range)
w = data.iloc[:, 0:1].values
y = data.iloc[:, 1:2].values
# x holds the feature values, i.e. head size
x = data.iloc[:, 2:3].values
# z holds the target values, i.e. brain weight
z = data.iloc[:, 3:4].values

# Gender vs brain weight.
# Round to 2 decimals: rounding to a whole number would collapse
# every coefficient to -1, 0 or 1.
print(round(data['Gender'].corr(data['Brain Weight(grams)']), 2))
plt.scatter(w, z, c='red')
plt.title('Scatter plot for correlation between gender and brain weight')
plt.xlabel('gender')
plt.ylabel('brain weight')
plt.show()

# Age range vs brain weight (plot y, the age range column, not head size)
print(round(data['Age Range'].corr(data['Brain Weight(grams)']), 2))
plt.scatter(y, z, c='red')
plt.title('Scatter plot for correlation between age range and brain weight')
plt.xlabel('age range')
plt.ylabel('brain weight')
plt.show()

# Head size vs brain weight
print(round(data['Head Size(cm^3)'].corr(data['Brain Weight(grams)']), 2))
plt.scatter(x, z, c='red')
plt.title('Scatter plot for correlation between head size and brain weight')
plt.xlabel('head size')
plt.ylabel('brain weight')
plt.show()

data.info()
# k is the full correlation matrix (Pearson is the default method)
k = data.corr()
print("The table of Pearson's coefficients for all pairs of columns is as follows")
print(k)

After you run the code in the Spyder IDE that ships with the Anaconda distribution, go to the variable explorer, find the variable named k, and double-click it to see its values. You will see something like this:

k-dataframe

The table above shows the correlation values. Here 1 means a perfect positive correlation, 0 means no linear correlation, and -1 means a perfect negative correlation.
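
For readers without the data set at hand, the sketch below builds a comparable table with `DataFrame.corr()` on a tiny made-up DataFrame. The column names and values are hypothetical stand-ins, not the contents of headbrain3.csv.

```python
import pandas as pd

# Made-up toy values; column names are illustrative only
df = pd.DataFrame({
    "head_size": [3500, 3700, 3900, 4100, 4300],
    "age_range": [1, 2, 1, 2, 1],
    "brain_weight": [1200, 1260, 1290, 1350, 1400],
})

k = df.corr()  # pairwise Pearson coefficients between all columns
print(k.round(2))

# The column for the target is usually the one of interest
print(k["brain_weight"].drop("brain_weight").round(2))
```

The matrix is symmetric and has 1 on the diagonal, because every column is perfectly correlated with itself, so only the off-diagonal entries carry information.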

Now let us understand these values using the graphs:

scattered graph 4

The reason this graph shows no clear trend is that there is no clear linear correlation between gender and brain weight, which is why gender is a poor choice of feature for our prediction model. Let us try plotting brain weight against another feature. What about head size?

scattered graph 4

As you can see in the table, there is a strong positive correlation between brain weight and head size. As a result, the points follow a clear upward trend, which indicates a close-to-linear relationship between brain weight and head size, so we can use head size as one of the features in our model.
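
The selection step described above can be sketched as a small helper that keeps only the features whose absolute correlation with the target reaches a cutoff. The function name `select_features`, the cutoff of 0.5, and the toy data are all assumptions made for illustration; a suitable cutoff depends on the problem.

```python
import pandas as pd

def select_features(df, target, threshold=0.5):
    """Return feature columns whose |Pearson r| with the target is >= threshold."""
    corr = df.corr()[target].drop(target)
    return corr[corr.abs() >= threshold].index.tolist()

# Made-up toy values; column names are illustrative only
df = pd.DataFrame({
    "head_size": [3500, 3700, 3900, 4100, 4300],
    "gender": [1, 2, 1, 2, 1],
    "brain_weight": [1200, 1260, 1290, 1350, 1400],
})

selected = select_features(df, "brain_weight")
print(selected)
```

Using the absolute value means a strongly negative feature is kept as well, since a correlation near -1 is as informative for a linear model as one near 1.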

That is all for this article. If you have any queries, just write in the comment section and I would be happy to help you. Have a great day ahead, and keep learning.

Translated from: https://www.includehelp.com/ml-ai/pearsons-correlation-and-its-implication-in-machine-learning.aspx
