Implementing Polynomial Regression in Python

Video Link

You can view the code used in this Episode here: SampleCode

Importing our Data

The first step is to import our data into Python.

The data can be downloaded from the following link: Data

Click on “Code” and download the ZIP.

Locate WeatherDataP.csv and copy it into a new folder called ProjectData on your local disk.

Note: WeatherData.csv and WeatherDataM.csv were used in the Simple Linear Regression and Multiple Linear Regression episodes.

Now we are ready to import our data into our Notebook:

Instructions for setting up a new Notebook can be found at the start of Episode 4.3.

Note: Keep this Medium post open in a split screen so you can read along and implement the code yourself.

# Import pandas, used for data manipulation
# Import matplotlib, used to plot our data
# Import numpy for linear algebra operations
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Import WeatherDataP.csv and store it in the variable weather_data_p
# (the r prefix makes a raw string, so the backslashes in the Windows path are kept literally)
weather_data_p = pd.read_csv(r"D:\ProjectData\WeatherDataP.csv")

# Display the data in the notebook
weather_data_p

Plotting our Data

To check what kind of relationship Humidity has with Pressure, we plot the two variables against each other.

# Set our input X to Pressure; [[]] returns a 2D DataFrame, the shape the model expects as input
X = weather_data_p[["Pressure (millibars)"]]
y = weather_data_p.Humidity

# Produce a scatter graph of Humidity against Pressure
plt.scatter(X, y, c="black")
plt.xlabel("Pressure (millibars)")
plt.ylabel("Humidity")

Here we see that Humidity against Pressure forms a bowl-shaped relationship, reminding us of the function y = 𝑥².

Preprocessing our Data

This is the additional step that polynomial regression needs: we add the feature 𝑥² to our model.

# Import PolynomialFeatures from sklearn, to preprocess our data
# Import the LinearRegression model from sklearn
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Set PolynomialFeatures to degree 2 and store it in the variable pre_process
# Degree 2 preprocesses x to 1, x and x^2
# Degree 3 preprocesses x to 1, x, x^2 and x^3
# and so on...
pre_process = PolynomialFeatures(degree=2)
# Transform our input x to 1, x and x^2
X_poly = pre_process.fit_transform(X)
# Show the transformation in the notebook
X_poly

In the printed output, e+NN is scientific notation: it means "multiply by 10 to the power NN", which moves the decimal point NN places.

For example:

  • 1.0e+00 = 1.0 (keep the decimal point where it is)
  • 1.0144e+03 = 1014.4 (move the decimal point 3 places to the right)
  • 1.02900736e+06 = 1029007.36 (move the decimal point 6 places to the right)
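To double-check these readings, you can let Python parse the strings itself. This is a small sketch that is not part of the original notebook; the three values are the example numbers above, and `np.set_printoptions(suppress=True)` is a standard NumPy setting (added here) that prints arrays in fixed-point rather than scientific notation. The numbers are consistent with x = 1014.4 and x² = 1029007.36.

```python
import numpy as np

# Python's float() understands scientific notation directly
values = [float("1.0e+00"), float("1.0144e+03"), float("1.02900736e+06")]
print(values)  # [1.0, 1014.4, 1029007.36]

# The third value is the square of the second (up to floating-point rounding)
print(np.isclose(1014.4 ** 2, values[2]))  # True

# Optional: ask NumPy to print arrays without scientific notation
np.set_printoptions(suppress=True)
print(np.array([values]))
```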

— — — — — — — — — — — — — — — — — —

The code above makes the following conversion: each input value x becomes the row 1, x, x².

Notice the column of 1s at the start of each row: it can be thought of as the variable associated with θ₀. Since θ₀ × 1 = θ₀, this column is often left out when the model is written down.
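The sketch below uses two made-up inputs, x = 2 and x = 3, rather than the pressure data. It shows the column of 1s that PolynomialFeatures adds, how its `include_bias=False` option drops that column, and the extra x³ column that degree 3 produces.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two made-up input values (hypothetical, not from WeatherDataP.csv)
x = np.array([[2.0], [3.0]])

# Degree 2: each x becomes [1, x, x^2]; the first column is the column of 1s
deg2 = PolynomialFeatures(degree=2).fit_transform(x)
print(deg2)  # rows: [1, 2, 4] and [1, 3, 9]

# include_bias=False drops the column of 1s
deg2_no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
print(deg2_no_bias)  # rows: [2, 4] and [3, 9]

# Degree 3 adds an x^3 column
deg3 = PolynomialFeatures(degree=3).fit_transform(x)
print(deg3)  # rows: [1, 2, 4, 8] and [1, 3, 9, 27]
```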

— — — — — — — — — — — — — — — — — —

Implementing Polynomial Regression

The method is the same as for multiple linear regression in Python, except that we now fit the regression model to the preprocessed data:

pr_model = LinearRegression()
# Fit our preprocessed data to the polynomial regression model
pr_model.fit(X_poly, y)
# Store our predicted Humidity values in the variable y_pred
y_pred = pr_model.predict(X_poly)
# Plot our model on our data
plt.scatter(X, y, c="black")
plt.xlabel("Pressure (millibars)")
plt.ylabel("Humidity")
plt.plot(X, y_pred)

We can extract θ₀, θ₁ and θ₂ using the following code:

theta0 = pr_model.intercept_
_, theta1, theta2 = pr_model.coef_
theta0, theta1, theta2

The “_” discards the first value in pr_model.coef_. This is the coefficient of the column of 1s, and it is 0 because LinearRegression already fits the constant term separately as intercept_. The other two coefficients are labelled theta1 and theta2 respectively.
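To see why the first coefficient is 0 and how θ₀, θ₁ and θ₂ combine into a prediction, here is a minimal sketch on hypothetical, noise-free data generated from y = 2 + 3x + 0.5x² (not the weather data). The fitted intercept and coefficients should recover 2, 3 and 0.5, and evaluating θ₀ + θ₁x + θ₂x² by hand should match `model.predict`.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic data from a known quadratic: y = 2 + 3x + 0.5x^2 (hypothetical, noise-free)
x = np.linspace(-5, 5, 50).reshape(-1, 1)
y = 2 + 3 * x.ravel() + 0.5 * x.ravel() ** 2

X_poly = PolynomialFeatures(degree=2).fit_transform(x)
model = LinearRegression().fit(X_poly, y)

theta0 = model.intercept_
first, theta1, theta2 = model.coef_
print(first)                   # coefficient of the column of 1s: 0, up to rounding
print(theta0, theta1, theta2)  # approximately 2, 3 and 0.5

# Predicting by hand with theta0 + theta1*x + theta2*x^2 matches model.predict
x_test = 1.5
manual = theta0 + theta1 * x_test + theta2 * x_test ** 2
auto = model.predict(PolynomialFeatures(degree=2).fit_transform([[x_test]]))[0]
print(np.isclose(manual, auto))  # True
```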

This gives our polynomial regression model roughly as: Humidity ≈ θ₀ + θ₁ × Pressure + θ₂ × Pressure², with θ₀, θ₁ and θ₂ taking the values printed above.

Using our Regression Model to make predictions

# Predict humidity for a pressure of 1007 millibars
# Transform 1007 to 1, 1007, 1007^2 so it is suitable as model input,
# reusing the already-fitted pre_process (same column name as the training data)
y_new = pr_model.predict(pre_process.transform(pd.DataFrame({"Pressure (millibars)": [1007]})))
y_new

Here we expect a Humidity value of 0.7164631 for a pressure reading of 1007 millibars.

We can plot this point on our data plot using the following code:

plt.scatter(1007, y_new, c = "red")

Evaluating our Model

To evaluate our model we will use the mean squared error (MSE), discussed in the previous episode. The function can easily be imported from sklearn:

from sklearn.metrics import mean_squared_error
mean_squared_error(y, y_pred)

The mean squared error of our regression model is 0.003358...
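MSE is the average of the squared differences between the actual and predicted values, MSE = (1/n) Σ (yᵢ − ŷᵢ)². The sketch below uses four hypothetical humidity values (not taken from the dataset) to check that computing this formula by hand with NumPy agrees with sklearn's `mean_squared_error`; both come out at about 0.00145.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted humidity values (not from WeatherDataP.csv)
y_true = np.array([0.80, 0.75, 0.60, 0.90])
y_hat = np.array([0.78, 0.70, 0.65, 0.88])

# MSE = mean of the squared differences between actual and predicted values
manual_mse = np.mean((y_true - y_hat) ** 2)
sklearn_mse = mean_squared_error(y_true, y_hat)
print(manual_mse, sklearn_mse)  # both about 0.00145
```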

If we want our model to include 𝑥³, we can simply change PolynomialFeatures to degree 3:

pre_process = PolynomialFeatures(degree=3)

After re-running the transformation, fitting and prediction steps with this setting, let’s check whether it has decreased our mean squared error:

Indeed it has.

You can change the degree used in PolynomialFeatures to any value you like and see for yourself what effect this has on the MSE.
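A minimal sketch of that experiment, assuming synthetic bowl-shaped data in place of WeatherDataP.csv (the data and the helper `training_mse` are hypothetical). Because each higher-degree model contains the lower-degree one (set the extra coefficient to zero), the MSE measured on the training data can only stay the same or fall as the degree rises, which is why a low MSE alone is not enough to pick a model.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical noisy, bowl-shaped data standing in for Pressure vs Humidity
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 40).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 + rng.normal(scale=0.3, size=40)

def training_mse(degree):
    """Fit a polynomial of the given degree and return its MSE on the same data."""
    X_poly = PolynomialFeatures(degree=degree).fit_transform(x)
    model = LinearRegression().fit(X_poly, y)
    return mean_squared_error(y, model.predict(X_poly))

mse_by_degree = {d: training_mse(d) for d in range(1, 7)}
for d, mse in mse_by_degree.items():
    print(f"degree {d}: MSE = {mse:.4f}")
```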

Ideally we want to choose the model that:

  • Has the lowest MSE

  • Does not over-fit our data

It is important that we plot our model on our data to ensure we don’t end up with the model shown at the end of Episode 4.6, which had an extremely low MSE but over-fitted our data.
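Besides plotting, one common way to spot over-fitting is to hold back part of the data and compare the MSE on points the model was not fitted to. This is a sketch under stated assumptions: the data are synthetic, the helper `train_and_validation_mse` is hypothetical, and the hold-out split with sklearn's `train_test_split` is a technique added here, not one used in the original episode.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical bowl-shaped data with noise (stand-in for the weather data)
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * x.ravel() ** 2 + rng.normal(scale=0.3, size=60)

# Hold out 30% of the points; the model never sees them during fitting
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.3, random_state=0)

def train_and_validation_mse(degree):
    """Fit on the training split; return (training MSE, validation MSE)."""
    pre = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(pre.fit_transform(x_train), y_train)
    train_mse = mean_squared_error(y_train, model.predict(pre.transform(x_train)))
    val_mse = mean_squared_error(y_val, model.predict(pre.transform(x_val)))
    return train_mse, val_mse

results = {d: train_and_validation_mse(d) for d in (1, 2, 3, 6, 10)}
for d, (train_mse, val_mse) in results.items():
    print(f"degree {d}: train MSE = {train_mse:.4f}, validation MSE = {val_mse:.4f}")

# Choose the degree with the lowest validation MSE, not the lowest training MSE
best_degree = min(results, key=lambda d: results[d][1])
print("best degree on validation data:", best_degree)
```

A degree that keeps lowering the training MSE while the validation MSE rises is a sign of over-fitting.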

If you have any questions please leave them below!

Translated from: https://medium.com/ai-in-plain-english/implementing-polynomial-regression-in-python-d9aedf520d56
