OLS Linear Regression

Hello Everyone!

I am super excited to be writing another article; it has been a long time since my previous one was published.

Simple Linear Regression [SLR] is basically this formula:

y = b0 + b1 * x1

which is read as y equals b zero plus b one times x one. I am sure you saw this formula in high school, where it was used to draw a straight or sloped line on an x-y axis. Let’s move a step ahead and understand in detail what each of these variables and coefficients means.

What does y signify in the equation?

From the above equation, y is the dependent variable (DV). It is the variable that we are trying to explain. For example:

Hypothetically speaking, the salary of an employee depends on the years of experience. In this case y, the salary of the employee, is the dependent variable, since it depends on the years of experience.

Or let’s take another example, where the marks scored by a student depend on the number of hours spent studying. Again, y, the marks scored, is the dependent variable, since it depends on the number of hours spent studying for the exam.

What does x, i.e. x1, signify in the equation?

From the same equation mentioned above, x is the independent variable (IV). In Simple Linear Regression we have only one independent variable, i.e. x1.

This is the variable that is assumed to drive the changes in the dependent variable. In the examples mentioned above, the years of experience and the number of hours spent studying are the independent variables.

What does b1 signify in the equation?

Here, b1 is the coefficient of the independent variable x1. This coefficient (b1) decides how a unit change in x1 influences y. Think of it as a multiplier that connects x and y.

And then finally comes b0, a constant, which I will explain in detail in a later section of this article.
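To make the roles of y, x1, b0 and b1 concrete, here is a minimal sketch of the formula in Python, using the marks vs hours studied example. The function name `predict` and the coefficient values are made up for illustration; they do not come from any real data.

```python
def predict(x1: float, b0: float, b1: float) -> float:
    """Return y = b0 + b1 * x1 for one value of the independent variable."""
    return b0 + b1 * x1


# Hypothetical coefficients for the "marks vs hours studied" example.
b0 = 35.0  # assumed marks scored with zero hours of study (the constant)
b1 = 5.0   # assumed extra marks per additional hour of study (the coefficient)

for hours in (0, 2, 4):
    marks = predict(hours, b0, b1)
    print(f"{hours} hours studied -> predicted marks: {marks}")
```

Here the hours studied play the role of x1 (the independent variable) and the predicted marks play the role of y (the dependent variable).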

Understanding SLR with an Example:

Take the basic example of Salary vs Years of Experience, where experience (in years) is on the x-axis and salary is on the y-axis. Our main goal here is to understand how salary depends on the years of experience. Here we have data on different employees who are working in different companies.

This is how the Simple Linear Regression formula relates to the above example:

Salary = b0 + b1 * Experience

The above formula can be read as Salary equals b zero plus b one times experience. What it essentially means is that we are putting a line through the chart shown above that best fits the data. I will explain the best fitting line later, when I talk about the Ordinary Least Squares [OLS] method, but for now you can see the line that best fits the data in the picture below.

Let us focus on the coefficient b1 and the constant b0.

The constant b0 is the point where the line crosses the vertical axis, i.e. the y-axis. Suppose the value of b0 is $30k. When experience is 0, the second part of the equation, b1 * experience, becomes zero, which means salary = $30k. According to the model, when a fresher joins a company his salary will be $30k.

Now, what is b1?

b1 is the slope of the line: the more extra money you get as experience increases, the larger the value of b1. As you can see in the above image, when you follow the projections along the black dotted lines, a one-year increase in experience corresponds to an increase of around $10k in salary.

If the coefficient b1 is smaller, the slope is smaller and the salary increment per year is smaller; if the slope is larger, each year of experience yields a larger increase in salary. And yes, that’s how a Simple Linear Regression works.
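The effect of the constant and the slope can be checked with a few lines of Python. This is only a sketch: b0 = $30k and b1 = $10k are the illustrative values used above, while the function name `predict_salary` and the smaller slope of $5k are assumptions added here for comparison.

```python
def predict_salary(experience_years: float, b0: float, b1: float) -> float:
    """Salary = b0 + b1 * Experience."""
    return b0 + b1 * experience_years


b0 = 30_000.0  # constant from the text: salary at 0 years of experience
b1 = 10_000.0  # slope from the text: about $10k more per extra year

# With zero experience, the b1 * experience term vanishes and only b0 remains.
fresher_salary = predict_salary(0, b0, b1)

# The gain from one additional year is exactly the slope b1.
one_year_gain = predict_salary(4, b0, b1) - predict_salary(3, b0, b1)

# A hypothetical flatter line: same constant, smaller slope.
flatter_b1 = 5_000.0
flatter_gain = predict_salary(4, b0, flatter_b1) - predict_salary(3, b0, flatter_b1)

print("Fresher salary:", fresher_salary)
print("Gain per year with b1 = 10k:", one_year_gain)
print("Gain per year with b1 = 5k:", flatter_gain)
```

Whatever pair of consecutive years you pick, the difference in predicted salary equals b1, which is why b1 is called the slope.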

How to find the BEST FIT LINE for Simple Linear Regression [SLR]?

The answer is the Ordinary Least Squares [OLS] method.

Now let’s try to understand how to find the best fitting line, or rather how SLR finds that line for us.

The graph shown above is the same graph which I explained earlier. We have the red dots that depict the actual observations, and we also have the straight line that best fits the data. To understand how the OLS method works, let’s make some modifications to the graph:

We draw vertical lines from each observation to the best fitting line, and then select one observation, as shown below:

Now you can see from the above picture that the red dot is the actual salary of a person with a particular number of years of experience. Let’s assume that for 5 years of experience the salary is $50k. The model line, the blue line, tells us what that person should get in terms of salary in general, based on the data. Let’s say that according to the model he should earn $40k for 5 years of experience, which is indicated by the green dot on the line.

Next, let’s call the red dot yi, the actual observation, and the green dot yi^ (also called yi hat), the value which the model predicts. The blue dotted line is the difference between what the employee is actually earning and what he/she should be earning according to the model. In general, the blue dotted line is the difference between the observed value and the modeled value.
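For the single observation described above, the length of the blue dotted line is simple arithmetic. A small sketch, using the $50k and $40k figures from the text (the variable names are chosen here for illustration):

```python
actual_salary = 50_000.0     # yi: the red dot, what the employee actually earns
predicted_salary = 40_000.0  # yi^: the green dot, what the model line says

# The blue dotted line: observed value minus modeled value.
difference = actual_salary - predicted_salary

# OLS works with the square of this difference, so the sign does not matter.
squared_difference = difference ** 2

print("yi - yi^ =", difference)
print("(yi - yi^)^2 =", squared_difference)
```

A positive difference means the observation lies above the line; an employee earning less than the model predicts would give a negative difference but still a positive square.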

To get the best fitting line, we take the value of each of those dotted blue lines, (yi - yi^), square it, and then take the sum of those squares, i.e. the sum of (yi - yi^)² over all observations. The best fitting line is the one for which this sum of squares is the minimum.
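The sum of squares for one particular line can be sketched as below. The five data points are made up for illustration (salary in $k), and the function name `sum_of_squared_differences` is an assumption, not something from a library.

```python
def sum_of_squared_differences(xs, ys, b0, b1):
    """Sum of (yi - yi^)^2, where yi^ = b0 + b1 * xi is the value on the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations: years of experience and salary in $k.
experience = [1, 2, 3, 4, 5]
salary = [42, 48, 61, 69, 80]

# Two candidate lines with the same constant but different slopes.
ss_steep = sum_of_squared_differences(experience, salary, b0=30.0, b1=10.0)
ss_flat = sum_of_squared_differences(experience, salary, b0=30.0, b1=5.0)

print("Sum of squares for b0=30, b1=10:", ss_steep)
print("Sum of squares for b0=30, b1=5:", ss_flat)
```

For this made-up data the first line gives a much smaller sum of squares than the second, so it fits the observations better.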

So, conceptually, what an SLR does is draw lots and lots of these lines, just like this:

and then finds the line which has the minimum sum of squares of (yi-yi^). That line is the best fitting line, and the method followed to find it is called the Ordinary Least Squares [OLS] method.
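The idea of trying many lines and keeping the one with the smallest sum of squares can be sketched as follows. Two things here are not part of the article: the data points are made up, and the closed-form formulas in `fit_ols` (slope = sum of (x - mean x)(y - mean y) divided by sum of (x - mean x)², constant = mean y - slope * mean x) are the standard textbook solution of the minimization. In practice the line is computed from such formulas instead of literally drawing candidates; the grid of candidate lines below is only there to mirror the picture.

```python
from itertools import product


def sum_of_squared_differences(xs, ys, b0, b1):
    """Sum of (yi - yi^)^2 for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def fit_ols(xs, ys):
    """Return (b0, b1) from the standard closed-form OLS formulas."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical observations: years of experience and salary in $k.
experience = [1, 2, 3, 4, 5]
salary = [42, 48, 61, 69, 80]

b0, b1 = fit_ols(experience, salary)
ols_ss = sum_of_squared_differences(experience, salary, b0, b1)

# "Lots and lots of lines": every combination of these constants and slopes.
candidate_b0 = range(0, 61)                     # 0, 1, ..., 60
candidate_b1 = [k / 10 for k in range(0, 201)]  # 0.0, 0.1, ..., 20.0
best_candidate = min(
    product(candidate_b0, candidate_b1),
    key=lambda line: sum_of_squared_differences(experience, salary, *line),
)
best_candidate_ss = sum_of_squared_differences(experience, salary, *best_candidate)

print(f"OLS line: b0 = {b0:.2f}, b1 = {b1:.2f}, sum of squares = {ols_ss:.2f}")
print(f"Best candidate on the grid: {best_candidate}, sum of squares = {best_candidate_ss:.2f}")
```

None of the candidate lines on the grid gets a smaller sum of squares than the OLS line, which is what "best fitting" means in this method.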