Understanding Multiple Linear Regression

In Episode 4.1 we looked at Simple Linear Regression, where a single variable x was used to predict y. But what if we have multiple variables x₁, x₂, x₃, … to predict y? This article explains how to approach that problem.

Simple Linear Regression Recap

In Episode 4.1 we had data on temperature and humidity:

[Image: temperature and humidity data]

We plotted the data and found a linear relationship, which makes linear regression suitable:

[Image: plot of the temperature and humidity data]

We then calculated our regression line, using gradient descent to find the parameters θ₀ and θ₁:

[Image: the regression line]

We then used this regression line to predict Humidity for any given Temperature value.

What is Multiple Linear Regression?

Multiple linear regression takes exactly the same concept as simple linear regression but applies it to multiple variables. Instead of looking only at temperature to predict humidity, we can also look at other factors such as wind speed or pressure.

[Image: data with temperature, wind speed, pressure and humidity]

We are still trying to predict Humidity, so it remains y.

We rename Temperature, Wind Speed and Pressure to 𝑥₁, 𝑥₂ and 𝑥₃.

Just as with Simple Linear Regression, we must check that each of our variables 𝑥₁, 𝑥₂ and 𝑥₃ has a linear relationship with y. If one does not, including it will produce a very inaccurate model.

Let's plot each of our variables against Humidity:

[Images: Temperature, Wind Speed and Pressure, each plotted against Humidity]
  • Temperature and Humidity form a strong linear relationship
  • Wind Speed and Humidity form a linear relationship
  • Pressure and Humidity do not form a linear relationship

We therefore cannot use Pressure (𝑥₃) in our multiple linear regression model.
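As a rough numerical complement to the plots, one could compute the Pearson correlation coefficient between each variable and Humidity. The sketch below is only an illustration: the data is synthetic (it is not the article's dataset), and the function and variable names are hypothetical. A coefficient close to ±1 suggests a strong linear relationship, while a value near 0 suggests there is no linear relationship, although a plot is still the more reliable check.

```python
import numpy as np

def linear_correlation(x, y):
    """Return the Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic data, made up purely for illustration (not the article's data).
rng = np.random.default_rng(0)
temperature = rng.uniform(0, 35, size=200)
pressure = rng.uniform(1000, 1030, size=200)

# Humidity is constructed to depend linearly on temperature only.
humidity = 1.0 - 0.02 * temperature + rng.normal(0, 0.03, size=200)

r_temperature = linear_correlation(temperature, humidity)
r_pressure = linear_correlation(pressure, humidity)

print(f"temperature vs humidity: r = {r_temperature:.2f}")
print(f"pressure vs humidity:    r = {r_pressure:.2f}")
```

Here temperature shows a strong negative correlation with humidity, while pressure, which plays no part in how the synthetic humidity was generated, shows a correlation close to zero.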

Plotting our Data

Let's now plot both Temperature (𝑥₁) and Wind Speed (𝑥₂) against Humidity.

[Image: plot of Humidity against Temperature and Wind Speed]

We can see that our data follows a roughly linear relationship. That is, we can fit a plane to our data that captures the relationship between Temperature (𝑥₁), Wind Speed (𝑥₂) and Humidity (y).

[Image: plane fitted to the data]

Calculating the Regression Model

Because we are dealing with more than one 𝑥 variable, our linear regression model takes the form:

ŷ = θ₀ + θ₁𝑥₁ + θ₂𝑥₂

Just as with simple linear regression, to find our parameters θ₀, θ₁ and θ₂ we need to minimise our cost function:

[Image: cost function]
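The article shows its cost function only as an image, so the exact form used is not recoverable here. A common choice for linear regression, assumed in what follows, is the mean squared error with a conventional factor of 1/2, where m is the number of data points and the superscript (i) indexes them:

```latex
J(\theta_0, \theta_1, \theta_2)
  = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2,
\qquad
\hat{y}^{(i)} = \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)}
```

The factor of 1/2 does not change where the minimum lies; it only simplifies the derivative.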

We do this using the gradient descent algorithm:

[Image: gradient descent update rule]
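For reference, under the assumption that the cost is the mean squared error with a 1/(2m) scaling (the article's own formula is shown only as an image), the gradient descent update for each parameter can be written as follows, where α is the learning rate and all θⱼ are updated simultaneously:

```latex
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m}
  \left( \hat{y}^{(i)} - y^{(i)} \right) x_j^{(i)},
\qquad j = 0, 1, 2, \quad x_0^{(i)} = 1
```

Setting x₀ = 1 for every data point lets the intercept θ₀ follow the same rule as the other parameters.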

This algorithm is explained in more detail here

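A minimal sketch of batch gradient descent for this kind of model is shown below. It is not the author's implementation: the function name, learning rate and iteration count are assumptions, and the data is synthetic. To make the result easy to check, the synthetic targets are generated without noise from the parameter values the article reports, using made-up feature ranges of 0 to 1.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.5, n_iters=5000):
    """Fit y ~ theta0 + theta1*x1 + ... by batch gradient descent.

    Minimises the mean squared error cost (1/(2m)) * sum((y_hat - y)^2).
    alpha and n_iters are illustrative defaults, not tuned values.
    """
    m = X.shape[0]
    # Prepend a column of ones so that theta[0] acts as the intercept.
    Xb = np.column_stack([np.ones(m), X])
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residuals = Xb @ theta - y          # y_hat - y for every data point
        gradient = Xb.T @ residuals / m     # partial derivatives of the cost
        theta -= alpha * gradient           # simultaneous update of all thetas
    return theta

# Synthetic, noise-free data (hypothetical feature ranges, not the article's data).
rng = np.random.default_rng(42)
x1 = rng.uniform(0, 1, size=500)
x2 = rng.uniform(0, 1, size=500)
y = 1.14 - 0.031 * x1 - 0.004 * x2

theta = gradient_descent(np.column_stack([x1, x2]), y)
print(np.round(theta, 4))
```

Because the data is noise-free and the features share a similar scale, the fitted parameters should match the values used to generate it. On real data with differently scaled features, the learning rate would likely need adjusting.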

After running our gradient descent algorithm, we find our optimal parameters to be θ₀ = 1.14, θ₁ = -0.031 and θ₂ = -0.004.

This gives our final regression model:

ŷ = 1.14 - 0.031𝑥₁ - 0.004𝑥₂

We can then use this regression model to predict Humidity (ŷ) for any given Temperature (𝑥₁) and Wind Speed (𝑥₂) values.
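As a small worked example, the parameters reported above can be plugged into a prediction function. The function name and the input values are hypothetical, and the article does not state the units of its variables, so the numbers are for illustration only.

```python
def predict_humidity(temperature, wind_speed,
                     theta0=1.14, theta1=-0.031, theta2=-0.004):
    """Evaluate y_hat = theta0 + theta1 * x1 + theta2 * x2.

    The default parameters are the values reported in the article.
    """
    return theta0 + theta1 * temperature + theta2 * wind_speed

# Hypothetical inputs: temperature = 10, wind speed = 5.
example = predict_humidity(10.0, 5.0)

# 1.14 - 0.031 * 10 - 0.004 * 5 = 1.14 - 0.31 - 0.02 = 0.81
print(round(example, 2))
```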

In general, models that include more relevant variables tend to be more accurate, since they account for more of the factors that affect Humidity.

_________________________________________

Potential Problems

As we include more and more variables in our model, we run into a few problems:

  • Certain variables may become redundant. For example, in our regression model above θ₂ = -0.004, so multiplying our wind speed (𝑥₂) by -0.004 barely changes our predicted value for humidity ŷ, which makes wind speed less useful in our model.
  • The scale of our data can also be a problem. For example, we can expect temperature to range from, say, -10 to 100, while pressure may range from 1000 to 1100. Using features on very different scales can heavily affect the accuracy of our model.
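To make the scale issue concrete, the sketch below builds a tiny made-up dataset using the ranges mentioned above and rescales each column to zero mean and unit standard deviation. Standardisation is one common remedy, shown here only as an illustration; it is not necessarily the approach taken later in this series, and the function name is hypothetical.

```python
import numpy as np

def standardize(X):
    """Rescale each column of X to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std

# Made-up rows of [temperature, pressure], spanning the ranges in the text.
X = np.array([
    [-10.0, 1000.0],
    [ 20.0, 1050.0],
    [100.0, 1100.0],
])

X_scaled, mean, std = standardize(X)

print("original column means:", mean)
print("scaled column means:  ", np.round(X_scaled.mean(axis=0), 6))
print("scaled column stds:   ", np.round(X_scaled.std(axis=0), 6))
```

The returned mean and standard deviation would need to be kept so that the same transformation can be applied to any new data before making predictions.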

How we solve these issues will be covered in future episodes.

If you have any questions, please leave them below!

Translated from: https://medium.com/ai-in-plain-english/understanding-multiple-linear-regression-2672c955ec1c
