Simple Linear Regression for Machine Learning Made Easy with the Ordinary Least Squares [OLS] Method

Hello Everyone!

I am super excited to be writing another article after a long break since my previous one was published.

A Simple Linear Regression [SLR] is basically this formula:

$$y = b_0 + b_1 x_1$$

which is read as "y equals b zero plus b one times x one". I am sure you have seen this formula in high school, where it is used to draw a straight, sloped line on x-y axes. Let's move a step ahead and understand in detail what each of these variables and coefficients means.
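If you prefer to see the formula as code, here is a minimal Python sketch. The function name `predict` and the numbers are illustrative assumptions, not something from the original article:

```python
def predict(x1: float, b0: float, b1: float) -> float:
    """Simple linear regression: y = b0 + b1 * x1."""
    return b0 + b1 * x1


# Hypothetical coefficients, chosen only for illustration.
print(predict(2.0, b0=1.0, b1=3.0))  # 7.0
```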

What does y signify in the equation?

From the above equation, y is the dependent variable (DV). It is the variable we are trying to explain (or predict). For example:

Hypothetically speaking, suppose the salary of an employee depends on their years of experience. In this case y, the salary of the employee, is the dependent variable, since it depends on the years of experience.

Or let's take another example, where the marks a student scores depend on the number of hours spent studying. Again, y (the marks scored) is the dependent variable, since it depends on the number of hours spent studying for the exam.

What does x (i.e. x1) signify in the equation?

From the same equation, x is the independent variable (IV). In Simple Linear Regression we have only one independent variable, i.e. x1.

This is the variable that is assumed to drive changes in the dependent variable. In the examples above, the years of experience and the number of hours spent studying are the independent variables.

What does b1 signify in the equation?

Here, b1 is the coefficient of the independent variable x1. It determines how much y changes for a one-unit change in x1. Think of it as a multiplier, or a connector, that links x and y.
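To make the "multiplier" idea concrete, here is a small sketch (the coefficients are hypothetical, chosen only for illustration) showing that a one-unit increase in x1 changes y by exactly b1, whatever value of x1 you start from:

```python
def predict(x1: float, b0: float, b1: float) -> float:
    """y = b0 + b1 * x1 for a single independent variable."""
    return b0 + b1 * x1


b0, b1 = 2.0, 0.5  # hypothetical coefficients
x = 4.0
# Increasing x1 by one unit changes y by exactly b1.
change = predict(x + 1, b0, b1) - predict(x, b0, b1)
print(change)  # 0.5
```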

And finally comes b0, a constant that I will explain in detail later in this article.

Understanding SLR with an Example:

Consider the basic example of Salary vs Years of Experience, where experience (in years) is on the x-axis and salary is on the y-axis. Our main goal here is to understand how salary depends on the years of experience. We have data on different employees working at different companies.

This is how the Simple Linear Regression formula relates to the above example:

$$\text{Salary} = b_0 + b_1 \times \text{Experience}$$

The above formula can be read as "Salary equals b zero plus b1 times Experience". Essentially, it puts a line through the chart shown above that best fits the data. I will explain what makes a line the best fit later, when I discuss the Ordinary Least Squares [OLS] method; for now, the picture below shows the line that best fits the data.

Let us focus on the coefficient b1 and the constant b0.

Figure: trying to understand b0 from the above example of Salary vs Experience.

The constant b0 is the point where the line intersects the vertical axis, i.e. the y-axis. Suppose b0 is $30k. When experience is 0, the second part of the equation, b1 * experience, becomes zero, which means salary = $30k. In other words, according to the model, a fresher who joins a company will earn $30k.
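A quick sketch of the same reasoning in Python. b0 = 30,000 comes from the example above; the slope b1 = 10,000 is an assumption made only so the function can be called:

```python
def predict_salary(experience_years: float, b0: float, b1: float) -> float:
    """Salary = b0 + b1 * Experience."""
    return b0 + b1 * experience_years


b0 = 30_000  # intercept from the example ($30k)
b1 = 10_000  # hypothetical slope, assumed only for this sketch
# With zero experience the b1 term vanishes, so the prediction equals b0.
print(predict_salary(0, b0, b1))  # 30000
```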

Now, what is b1?

b1 is the slope of the line: the more extra money you get as experience increases, the larger the value of b1. As you can see in the image above, when you project along the black dotted lines, a one-year increase in experience corresponds to an increase of around $10k in salary.
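Reading the slope off the chart with the dotted projections is the familiar "rise over run" between two points on the line. Here is a minimal sketch; the two points (3 years at $60k, 4 years at $70k) are hypothetical values picked to match the roughly $10k-per-year reading above:

```python
def slope_from_points(x_a: float, y_a: float, x_b: float, y_b: float) -> float:
    """Slope of the straight line through (x_a, y_a) and (x_b, y_b)."""
    if x_a == x_b:
        raise ValueError("x values must differ to define a slope")
    return (y_b - y_a) / (x_b - x_a)


# Hypothetical points read off a fitted line: 3 years -> $60k, 4 years -> $70k.
b1 = slope_from_points(3, 60_000, 4, 70_000)
print(b1)  # 10000.0
```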

If the coefficient b1 is smaller, the slope is smaller and so is the yearly salary increment; if the slope is larger, each year of experience yields a bigger increase in salary. And yes, that's how Simple Linear Regression works.

How to find the best-fit line for Simple Linear Regression [SLR]?

The answer: the Ordinary Least Squares [OLS] method.

Now let's try to understand how to find the best-fitting line, or rather how SLR finds that line for us.

The graph above is the same one I explained earlier. The red dots depict the actual observations, and the straight line is the one that best fits the data. To understand how the OLS method works, let's make some modifications to the graph:

From each observation we draw a vertical line (parallel to the y-axis) to the best-fitting line; OLS measures these vertical distances, not the perpendicular distance to the line. Then let's select one observation, as shown below:

In the picture above, the red dot is the salary of a person with a particular number of years of experience. Let's assume that with 5 years of experience the salary is $50k. The model line, the blue line, tells us what that person should earn according to the general trend in the data. Let's say the model predicts $40k for 5 years of experience, which is indicated by the green dot on the line.

Next, let's call the red dot yi, the actual observation, and the green dot ŷi (also called "yi hat"), the value the model predicts. The blue dotted line is the difference between what the employee actually earns and what he or she should be earning according to the model. In general, the blue dotted line is the difference between the observed and the modeled value.
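In code, the blue dotted line for a single observation is just a subtraction. This sketch uses the $50k actual vs $40k predicted numbers from the example above; the function name `residual` is an illustrative choice:

```python
def residual(y_actual: float, y_predicted: float) -> float:
    """Observed minus modeled value: y_i - y_hat_i."""
    return y_actual - y_predicted


# Example from the text: actual salary $50k, model prediction $40k at 5 years.
print(residual(50_000, 40_000))  # 10000
```

Note that the difference is negative when the red dot lies below the line, which is one reason the differences are squared in the next step.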

To get the best-fitting line, we take the sum of (yi - ŷi)²: we take the length of each blue dotted line, square it, and add up all of those squares. Once we have this sum of squares, we look for the line that makes it as small as possible.
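Here is a minimal sketch of that sum of squares for one candidate line. The data points (years of experience vs salary in $k) are hypothetical, made up only to illustrate the calculation:

```python
from typing import Sequence


def sum_of_squared_residuals(
    xs: Sequence[float], ys: Sequence[float], b0: float, b1: float
) -> float:
    """Return the sum of (y_i - y_hat_i)^2 for the line y_hat = b0 + b1 * x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = b0 + b1 * x        # value predicted by the candidate line
        total += (y - y_hat) ** 2  # squared length of one dotted line
    return total


# Hypothetical data: years of experience vs salary in $k.
xs = [1, 2, 3, 4]
ys = [39, 53, 57, 71]

print(sum_of_squared_residuals(xs, ys, b0=30, b1=10))  # 20.0
print(sum_of_squared_residuals(xs, ys, b0=31, b1=10))  # 24.0
```

The line with b0 = 30, b1 = 10 gives a smaller sum than the line with b0 = 31, b1 = 10, so it fits these made-up points better.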

So, conceptually, what SLR does is draw lots and lots of candidate lines through the data and then pick the one with the minimum sum of squares of (yi - ŷi). That line is the best-fitting line, and the method used to find it is called the Ordinary Least Squares [OLS] method.
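The "try many lines, keep the one with the smallest sum of squares" idea can be sketched as a brute-force grid search. This is only an illustration of the concept; standard OLS implementations don't enumerate lines, because for simple linear regression the minimizing coefficients have a well-known closed form, b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and b0 = ȳ - b1·x̄, which the sketch also computes. The data and grid ranges are hypothetical:

```python
from typing import Iterable, Sequence, Tuple


def ssr(xs: Sequence[float], ys: Sequence[float], b0: float, b1: float) -> float:
    """Sum of squared residuals for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def grid_search_line(
    xs: Sequence[float],
    ys: Sequence[float],
    b0_values: Iterable[float],
    b1_values: Sequence[float],
) -> Tuple[float, float]:
    """Try every (b0, b1) pair on a grid and keep the pair with the smallest SSR."""
    candidates = ((b0, b1) for b0 in b0_values for b1 in b1_values)
    return min(candidates, key=lambda pair: ssr(xs, ys, pair[0], pair[1]))


def ols_closed_form(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Closed-form OLS estimates (b0, b1) for simple linear regression."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical data: years of experience vs salary in $k.
xs = [1, 2, 3, 4]
ys = [39, 53, 57, 71]

print(grid_search_line(xs, ys, range(0, 61), range(0, 21)))  # (30, 10)
print(ols_closed_form(xs, ys))  # (30.0, 10.0)
```

Both approaches land on the same line here only because the exact OLS solution happens to sit on the integer grid; with a coarser or misplaced grid, the brute-force search would merely approximate it.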

I hope you found this article useful.

Thank you so much!

Feel free to connect with me either through LinkedIn, Instagram or Facebook.

I will be back with one more exciting article! Till then, stay safe.

Cheers!

Arnold Sachith

Source: https://medium.com/analytics-vidhya/simple-linear-regression-for-machine-learning-made-easy-with-ordinary-least-square-ols-method-65e1240cf835
