How to Implement a Polynomial Regression Model in Python

Let’s start with an example. We want to predict the Price of a home from its Area and Age. The function below was used to generate the home prices; we will treat its output as “real-world data”, and our job is to build a model that predicts Price from Area and Age:

Price = -3*Area - 10*Age + 0.033*Area² - 0.0000571*Area³ + 500
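The generating function can be sketched in Python as follows. The sampling ranges for Area and Age, the sample count, and the names `home_price`, `area`, and `age` are assumptions made for illustration; the article does not state how its data was sampled.

```python
import numpy as np


def home_price(area, age):
    """Price function quoted in the article."""
    return -3 * area - 10 * age + 0.033 * area**2 - 0.0000571 * area**3 + 500


# Assumed sampling ranges (not from the article). They keep every price
# positive, so a relative error can be computed later without dividing by 0.
rng = np.random.default_rng(seed=0)
n_samples = 500
area = rng.uniform(60, 300, size=n_samples)
age = rng.uniform(0, 25, size=n_samples)

X = np.column_stack([area, age])  # feature matrix with columns [Area, Age]
y = home_price(area, age)         # target vector of prices

print("X shape:", X.shape)
print("y shape:", y.shape)
print("lowest price in the sample:", y.min())
```

Fixing the seed makes the sample reproducible, which helps when comparing the linear and polynomial fits on the same data.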

Figure: Home Prices vs Area & Age

Linear Model

Suppose we just want a very simple Linear Regression model that predicts the Price using the slope coefficients c1 and c2 and the y-intercept c0:

Price = c1*Area + c2*Age + c0

We’ll load the data and fit Scikit-Learn’s LinearRegression. Behind the scenes, the model coefficients (c0, c1, c2) are computed by minimizing the sum of squared errors between the target variable y and the model prediction, that is, the sum over all samples of (y - prediction)².
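A minimal sketch of this step is shown here. Because the article's data file is not available, the data is regenerated from the price function with assumed Area and Age ranges, so the numbers it prints will not necessarily reproduce the error figure quoted in the article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the article's data (sampling ranges are assumptions).
rng = np.random.default_rng(seed=0)
area = rng.uniform(60, 300, size=500)
age = rng.uniform(0, 25, size=500)
X = np.column_stack([area, age])
y = -3 * area - 10 * age + 0.033 * area**2 - 0.0000571 * area**3 + 500

# Ordinary least squares: Price = c1*Area + c2*Age + c0
linear_model = LinearRegression()
linear_model.fit(X, y)

c1, c2 = linear_model.coef_
c0 = linear_model.intercept_
y_pred = linear_model.predict(X)

# The quantity that the fit minimizes, and the mean relative error (MRE).
sum_sq = np.sum((y - y_pred) ** 2)
mre = np.mean(np.abs((y - y_pred) / y))

print(f"c1 = {c1:.4f}, c2 = {c2:.4f}, c0 = {c0:.4f}")
print(f"sum of squared errors = {sum_sq:.2f}")
print(f"mean relative error   = {100 * mre:.2f}%")
```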

But this model does not fit the data well: a plane cannot follow the curvature introduced by the Area² and Area³ terms.

Figure: Simple Linear Regression Model (Mean Relative Error: 9.5%)

Polynomial Regression Model

Next, let’s implement a Polynomial Regression model, which is the right tool for this job because the prices were generated by a polynomial. Rewriting the function used to generate the home prices with x1 = Area and x2 = Age, we get:

Price = -3*x1 - 10*x2 + 0.033*x1² - 0.0000571*x1³ + 500

So instead of fitting the Linear model (Price = c1*x1 + c2*x2 + c0) directly, Polynomial Regression first transforms the variables x1 and x2. For example, to fit a 2nd-degree polynomial, the input variables are transformed into the following features:

1, x1, x2, x1², x1x2, x2²

Our 3rd-degree polynomial version needs these features:

1, x1, x2, x1², x1x2, x2², x1³, x1²x2, x1x2², x2³

We can then fit the Linear model on the polynomially transformed input features, which gives a Polynomial Regression model of the form:

Price = 0*1 + c1*x1 + c2*x2 + c3*x1² + c4*x1x2 + … + cn*x2³ + c0

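(The 0*1 term corresponds to the bias column of 1s. Its coefficient comes out as 0 when the linear model fits its own intercept c0, as Scikit-Learn’s LinearRegression does by default.)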

After training the model on the data, we can inspect the coefficients and check whether they match the original function used to generate the home prices:

Original function:

Price = -3*x1 - 10*x2 + 0.033*x1² - 0.0000571*x1³ + 500

Polynomial Regression model coefficients:

Figure: fitted Polynomial Regression model coefficients

And indeed they match!

As the plot shows, the polynomial model fits the data far better than the linear one.

Figure: Polynomial Regression Model (Mean Relative Error: 0%)

And there you have it: you now know how to implement a Polynomial Regression model in Python. The entire code can be found here.

Closing remarks

  • If this were a real-world ML task, we would split the data into training and testing sets and evaluate the model on the testing set.
  • It’s better to use other error metrics such as RMSE, because MRE is undefined if any of the y values is 0.
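Both remarks can be combined in one short sketch: hold out a test set and report RMSE on it. The 80/20 split, the random seeds, the use of a pipeline, and the synthetic data ranges are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the article's data (sampling ranges are assumptions).
rng = np.random.default_rng(seed=0)
area = rng.uniform(60, 300, size=500)
age = rng.uniform(0, 25, size=500)
X = np.column_stack([area, age])
y = -3 * area - 10 * age + 0.033 * area**2 - 0.0000571 * area**3 + 500

# Hold out 20% of the samples; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The pipeline applies the polynomial transform, then the linear fit.
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X_train, y_train)

# RMSE stays well defined even if some target values are 0.
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"test RMSE = {rmse:.6f}")
```

Wrapping the transform and the regressor in one pipeline means the same feature expansion is applied to the training and the testing data automatically.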

Translated from: https://medium.com/@nikola.kuzmic945/how-to-implement-a-polynomial-regression-model-in-python-6250ce96ba61
