Simple Linear Regression for Machine Learning Made Easy with the Ordinary Least Squares [OLS] Method

Hello Everyone!

I am excited to be writing another article, a long time after my previous one was published.

A Simple Linear Regression [SLR] is basically this formula:

y = b0 + b1 * x1

The formula is read as “y equals b zero plus b one times x one”. You have probably seen it in high school, as the equation of a straight, sloped line on an x-y plane. Let’s go a step further and understand in detail what each of these variables and coefficients means.

What does y signify in the equation?

From the above equation, y is the dependent variable (DV): the variable we are trying to explain. For example:

Hypothetically speaking, the salary of an employee depends on years of experience. In this case y, the employee’s salary, is the dependent variable, since it depends on the years of experience.

Or take another example, where the marks a student scores depend on the number of hours spent studying. Here y, the marks scored, is again the dependent variable, since it depends on the number of hours spent studying for the exam.

What does x, i.e. x1, signify in the equation?

In the same equation, x is the independent variable (IV). In Simple Linear Regression we have only one independent variable, x1.

This is the variable that the model treats as driving changes in the dependent variable. In the examples above, years of experience and number of hours spent studying are the independent variables.

What does b1 signify in the equation?

Here, b1 is the coefficient of the independent variable x1. This coefficient determines how a unit change in x1 influences y. Think of it as a multiplier that connects x and y.

Finally comes b0, a constant that I explain in detail later in this article.

Understanding SLR with an Example:

[Figure: scatter plot of salary against years of experience]

The basic example is Salary vs Years of Experience, with experience (in years) on the x-axis and salary on the y-axis. Our main goal here is to understand how salary depends on years of experience. The data covers different employees working at different companies.

This is how the Simple Linear Regression formula relates to the above example:

Salary = b0 + b1 * Experience

The above formula can be read as “Salary equals b zero plus b one times Experience”. Fitting it essentially means drawing the line through the chart that best fits the data. I will explain what “best fitting” means when I discuss the Ordinary Least Squares [OLS] method; for now, the picture below shows the line that best fits the data.

[Figure: the salary vs experience scatter plot with the best-fitting line drawn through it]

Let us focus on the coefficient b1 and the constant b0.

[Figure: understanding b0 in the Salary vs Experience example]

The constant b0 is the value at which the line crosses the vertical axis, i.e. the y-axis. Suppose b0 is $30k. When experience is 0, the second part of the equation, b1 * experience, becomes zero, so salary = $30k. According to the model, a fresher with no experience who joins a company will earn $30k.

Now, what is b1?

[Figure: the slope b1, with black dotted projection lines marking a one-year step in experience]

b1 is the slope of the line: the more salary rises with each additional year of experience, the larger b1 is. As the black dotted projection lines in the above image show, a one-year increase in experience corresponds to an increase of around $10k in salary.

If the coefficient b1 is smaller, the slope is flatter and the salary increment per year is smaller; if it is larger, each year of experience yields a bigger increase in salary. And yes, that’s how a Simple Linear Regression works.
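As a minimal sketch of the formula with the numbers used above, the snippet below plugs in b0 = $30k and b1 = $10k. The function name `predict_salary` and the alternative slope of $15k are made up for illustration; none of these values come from a fitted model.

```python
def predict_salary(years, b0=30_000.0, b1=10_000.0):
    """Salary predicted by the line: salary = b0 + b1 * experience."""
    return b0 + b1 * years


# With zero experience only the intercept b0 remains.
fresher_salary = predict_salary(0)

# One extra year of experience changes the prediction by exactly b1.
one_year_raise = predict_salary(3) - predict_salary(2)

# A steeper (hypothetical) slope gives a larger salary for the same experience.
steeper = predict_salary(5, b1=15_000.0)

print(fresher_salary, one_year_raise, steeper)
```

The intercept only sets where the line starts; the slope alone decides how much each additional year is worth.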

How to find the best-fit line for Simple Linear Regression [SLR]?

The answer is the Ordinary Least Squares [OLS] method.

Now let’s try to understand how the best-fitting line is found, or rather how SLR finds that line for us.

[Figure: the salary vs experience scatter plot with the best-fitting line]

The graph shown above is the same one I explained earlier. The red dots depict the actual observations, and the straight line is the one that best fits the data. To understand how the OLS method works, let’s make some modifications to the graph:

[Figure: the same plot with a dotted line from each observation to the best-fitting line]

We draw vertical lines, parallel to the y-axis, from each observation to the best-fitting line, and then select one observation, as shown below:

[Figure: one observation selected, with a red dot for the observation and a green dot on the model line]

In the above picture, the red dot is the actual salary of a person with a particular number of years of experience. Let’s assume that for 5 years of experience the salary is $50k. The model line (the blue line) tells us what that person should earn according to the general trend in the data. Let’s say the model gives $40k for 5 years of experience, which is indicated by the green dot on the line.

[Figure: the selected observation labelled yi, its prediction labelled yi^, joined by a blue dotted line]

Next, let’s call the red dot yi, the actual observation, and the green dot yi^ (also called yi hat), the value the model predicts. The blue dotted line is the difference between what the employee actually earns and what he/she should earn according to the model. In general, the blue dotted line is the difference between the observed value and the modeled value.
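To make the difference concrete, here is a small sketch using the two numbers from the example above ($50k observed, $40k modeled). The helper name `residual` is an assumption of this sketch; the article itself only calls the quantity the difference between yi and yi^.

```python
def residual(actual, predicted):
    """Difference between the observed value yi and the modeled value yi hat."""
    return actual - predicted


y_i = 50_000      # observed salary at 5 years of experience (red dot)
y_i_hat = 40_000  # salary the model gives at 5 years (green dot)

gap = residual(y_i, y_i_hat)
print(gap)       # length of the blue dotted line
print(gap ** 2)  # the squared value that OLS will add up
```

A positive difference means the observation sits above the line; a negative one means it sits below. Squaring removes the sign, so both count against the line equally.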

To get the best-fitting line, we take the length of each of those dotted blue lines, (yi - yi^), square it, and add up the squares to get the sum of (yi - yi^)². The best-fitting line is the one for which this sum of squares is as small as possible.
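The sum of squares can be sketched in a few lines. The data below is invented for illustration (five employees, salary in thousands of dollars), and `sum_squared_residuals` is a hypothetical helper name. The sketch scores two candidate lines on the same data.

```python
# Made-up observations: years of experience and salary in $k.
years = [1, 2, 3, 4, 5]
salary_k = [41, 49, 60, 69, 81]


def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of (yi - yi_hat)^2 for the line yi_hat = b0 + b1 * xi."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Candidate line A: intercept 30, slope 10.
sse_a = sum_squared_residuals(years, salary_k, b0=30, b1=10)

# Candidate line B: same intercept, flatter slope of 8.
sse_b = sum_squared_residuals(years, salary_k, b0=30, b1=8)

print(sse_a, sse_b)
```

On this data line A has the smaller sum, so under the least-squares criterion it is the better fit of the two.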

So, conceptually, what an SLR does is draw lots and lots of candidate lines, like this:

[Figure: many candidate lines drawn through the same scatter plot]

It then finds the line with the minimum sum of squares of (yi - yi^). That line is the best-fitting line, and the method used to find it is called the Ordinary Least Squares [OLS] method.
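The search described above can be sketched directly: try many candidate lines and keep the one with the smallest sum of squares. The sketch below does that over a coarse grid, then compares the result with the standard closed-form least-squares solution (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)). The closed form is not covered in this article and is included only as a cross-check. The data, the grid ranges, and all function names are assumptions made for illustration.

```python
# Made-up observations: years of experience and salary in $k.
years = [1, 2, 3, 4, 5]
salary_k = [41, 49, 60, 69, 81]


def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of (yi - yi_hat)^2 for the line yi_hat = b0 + b1 * xi."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def brute_force_fit(xs, ys, b0_candidates, b1_candidates):
    """Try every (b0, b1) pair and keep the one with the smallest sum of squares."""
    best = None
    for b0 in b0_candidates:
        for b1 in b1_candidates:
            sse = sum_squared_residuals(xs, ys, b0, b1)
            if best is None or sse < best[0]:
                best = (sse, b0, b1)
    return best


def fit_ols(xs, ys):
    """Closed-form least-squares estimates for a single independent variable."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Candidate intercepts 0..60 and slopes 0..20, in whole-number steps.
grid_sse, grid_b0, grid_b1 = brute_force_fit(
    years, salary_k, range(0, 61), range(0, 21)
)
ols_b0, ols_b1 = fit_ols(years, salary_k)

print(grid_b0, grid_b1, grid_sse)
print(ols_b0, ols_b1)
```

The two approaches agree here only because the made-up data was chosen so that the best line falls exactly on a grid point. In general a grid search gets only as close as its step size allows, which is one reason the closed form is used in practice.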


I hope you found this article useful.

Thank you so much!

Feel free to connect with me through LinkedIn, Instagram, or Facebook.

I will be back with one more exciting article! Till then, stay safe.

Cheers!

Arnold Sachith

Translated from: https://medium.com/analytics-vidhya/simple-linear-regression-for-machine-learning-made-easy-with-ordinary-least-square-ols-method-65e1240cf835
