11. Weighted Linear Regression Case Study: Predicting the Age of Abalone

Dataset: https://download.csdn.net/download/weixin_44827418/12553408

1. Import the dataset

Dataset description:
[Image: dataset description (not preserved)]

import pandas as pd
import numpy as np

abalone = pd.read_table("./datas/abalone.txt", header=None)
# Columns: sex, length, diameter, height, whole weight, meat weight, viscera weight, shell weight, age
abalone.columns = ['性别','长度','直径','高度','整体重量','肉重量','内脏重量','壳重','年龄']
abalone.head()
   性别     长度     直径     高度    整体重量     肉重量    内脏重量     壳重  年龄
0   1  0.455  0.365  0.095  0.5140  0.2245  0.1010  0.150  15
1   1  0.350  0.265  0.090  0.2255  0.0995  0.0485  0.070   7
2  -1  0.530  0.420  0.135  0.6770  0.2565  0.1415  0.210   9
3   1  0.440  0.365  0.125  0.5160  0.2155  0.1140  0.155  10
4   0  0.330  0.255  0.080  0.2050  0.0895  0.0395  0.055   7
abalone.shape
(4177, 9)
abalone.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4177 entries, 0 to 4176
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   性别      4177 non-null   int64
 1   长度      4177 non-null   float64
 2   直径      4177 non-null   float64
 3   高度      4177 non-null   float64
 4   整体重量    4177 non-null   float64
 5   肉重量     4177 non-null   float64
 6   内脏重量    4177 non-null   float64
 7   壳重      4177 non-null   float64
 8   年龄      4177 non-null   int64
dtypes: float64(7), int64(2)
memory usage: 293.8 KB
abalone.describe()
       性别           长度           直径           高度           整体重量         肉重量          内脏重量         壳重           年龄
count  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000
mean      0.052909     0.523992     0.407881     0.139516     0.828742     0.359367     0.180594     0.238831     9.933684
std       0.822240     0.120093     0.099240     0.041827     0.490389     0.221963     0.109614     0.139203     3.224169
min      -1.000000     0.075000     0.055000     0.000000     0.002000     0.001000     0.000500     0.001500     1.000000
25%      -1.000000     0.450000     0.350000     0.115000     0.441500     0.186000     0.093500     0.130000     8.000000
50%       0.000000     0.545000     0.425000     0.140000     0.799500     0.336000     0.171000     0.234000     9.000000
75%       1.000000     0.615000     0.480000     0.165000     1.153000     0.502000     0.253000     0.329000    11.000000
max       1.000000     0.815000     0.650000     1.130000     2.825500     1.488000     0.760000     1.005000    29.000000

2. Inspect the data distribution

import numpy as np
import pandas as pd
import random
import matplotlib as mpl
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['simhei']  # use a font that can render the Chinese column names
plt.rcParams['axes.unicode_minus'] = False    # render the minus sign correctly
%matplotlib inline
mpl.cm.rainbow(np.linspace(0,1,10))
array([[5.00000000e-01, 0.00000000e+00, 1.00000000e+00, 1.00000000e+00],
       [2.80392157e-01, 3.38158275e-01, 9.85162233e-01, 1.00000000e+00],
       [6.07843137e-02, 6.36474236e-01, 9.41089253e-01, 1.00000000e+00],
       [1.66666667e-01, 8.66025404e-01, 8.66025404e-01, 1.00000000e+00],
       [3.86274510e-01, 9.84086337e-01, 7.67362681e-01, 1.00000000e+00],
       [6.13725490e-01, 9.84086337e-01, 6.41213315e-01, 1.00000000e+00],
       [8.33333333e-01, 8.66025404e-01, 5.00000000e-01, 1.00000000e+00],
       [1.00000000e+00, 6.36474236e-01, 3.38158275e-01, 1.00000000e+00],
       [1.00000000e+00, 3.38158275e-01, 1.71625679e-01, 1.00000000e+00],
       [1.00000000e+00, 1.22464680e-16, 6.12323400e-17, 1.00000000e+00]])
mpl.cm.rainbow(np.linspace(0,1,10))[0]
array([0.5, 0. , 1. , 1. ])
def dataPlot(dataSet):
    m, n = dataSet.shape
    fig = plt.figure(figsize=(8,20), dpi=100)
    colormap = mpl.cm.rainbow(np.linspace(0,1,n))  # one colour per column
    for i in range(n):
        fig_ = fig.add_subplot(n,1,i+1)
        # Pass the colour as a one-row 2-D sequence; a bare RGBA vector triggers
        # matplotlib's "'c' argument looks like a single numeric RGB or RGBA sequence" warning.
        plt.scatter(range(m), dataSet.iloc[:,i].values, s=2, c=[colormap[i]])
        plt.title(dataSet.columns[i])
        plt.tight_layout(pad=1.2)  # adjust the spacing between subplots
# Run the function to view the data distribution:
dataPlot(abalone)

[Figure: scatter plot of each column against the row index, full dataset (output_10_1.png)]

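The scatter plots show that:

1) Apart from "性别" (sex), every column shows a clear pattern along the row index, so the rows are not in random order.

2) The "高度" (height) feature contains two outliers.

Based on these observations, we take two measures:

1) When splitting into training and test sets, shuffle the original dataset so that the rows are picked at random.

2) Remove the outliers in the "高度" (height) feature.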

abalone['高度']<0.4
0       True
1       True
2       True
3       True
4       True
        ... 
4172    True
4173    True
4174    True
4175    True
4176    True
Name: 高度, Length: 4177, dtype: bool
aba = abalone.loc[abalone['高度']<0.4,:]
# View the distribution again after removing the outliers
dataPlot(aba)

[Figure: scatter plot of each column after removing the height outliers (output_18_1.png)]
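A quick sanity check on a threshold filter like the one above is to count how many rows the mask keeps and how many it drops. The sketch below uses a small made-up frame (the values are illustrative, not taken from the abalone file) and the same `< 0.4` cut-off.

```python
import pandas as pd

# Toy frame standing in for the abalone data; all values are made up.
toy = pd.DataFrame({
    "高度": [0.095, 0.090, 1.130, 0.135, 0.500],
    "年龄": [15, 7, 8, 9, 10],
})

mask = toy["高度"] < 0.4        # True for the rows we want to keep
kept = toy.loc[mask, :]
dropped = toy.loc[~mask, :]     # the rows treated as outliers

print(len(toy), len(kept), len(dropped))
```

On the real data the same check should report two dropped rows, which agrees with the shapes shown in this article: 4177 rows before filtering and 3340 + 835 = 4175 after the split.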

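3. Split into training and test sets

"""
Purpose: randomly split the data into a training set and a test set
Parameters:
    dataSet: the original dataset
    rate: proportion of samples used for training
Returns:
    train, test: the training set and the test set
"""
def randSplit(dataSet, rate):
    # Work on a copy with a contiguous 0..m-1 index: the label lookups below then
    # cannot miss the rows removed earlier, and the caller's DataFrame is not modified.
    data = dataSet.reset_index(drop=True)
    l = list(data.index)                 # extract the index into a list
    random.seed(123)                     # fix the random seed
    random.shuffle(l)                    # shuffle the index labels
    data.index = l                       # shuffled labels are equivalent to shuffled rows
    m = data.shape[0]                    # total number of samples
    n = int(m*rate)                      # number of training samples
    train = data.loc[range(n),:]         # rows labelled 0..n-1 form the training set
    test = data.loc[range(n,m),:]        # rows labelled n..m-1 form the test set
    train.index = range(train.shape[0])  # reset the training-set index
    test.index = range(test.shape[0])    # reset the test-set index
    return train, test
train,test = randSplit(aba,0.8)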
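As an alternative to shuffling index labels by hand, the same kind of split can be sketched with pandas' own sampling. The helper name `sample_split` is hypothetical, and because it uses a different random number generator it will not reproduce the exact partition that `randSplit` produces; it only illustrates a split that leaves the input frame untouched and tolerates gaps in the index.

```python
import pandas as pd

def sample_split(data: pd.DataFrame, rate: float, seed: int = 123):
    """Return (train, test) without modifying `data` (hypothetical helper)."""
    train = data.sample(frac=rate, random_state=seed)  # random rows, without replacement
    test = data.drop(train.index)                      # every row that was not sampled
    return train.reset_index(drop=True), test.reset_index(drop=True)

# Toy frame with a gappy index, like the one a row filter leaves behind.
toy = pd.DataFrame(
    {"x": range(10), "y": range(10, 20)},
    index=[0, 1, 2, 4, 5, 6, 7, 9, 10, 11],
)
train_toy, test_toy = sample_split(toy, 0.8)
print(train_toy.shape, test_toy.shape)
```

`data.drop(train.index)` relies on the index labels being unique, which holds for a frame that has only had rows filtered out.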
# Explore the training set
train.head()
   性别     长度     直径     高度    整体重量     肉重量    内脏重量      壳重  年龄
0  -1  0.590  0.470  0.170  0.9000  0.3550  0.1905  0.2500  11
1   1  0.560  0.450  0.145  0.9355  0.4250  0.1645  0.2725  11
2  -1  0.635  0.535  0.190  1.2420  0.5760  0.2475  0.3900  14
3   1  0.505  0.390  0.115  0.5585  0.2575  0.1190  0.1535   8
4   1  0.510  0.410  0.145  0.7960  0.3865  0.1815  0.1955   8
train.shape
(3340, 9)
abalone.describe()
       性别           长度           直径           高度           整体重量         肉重量          内脏重量         壳重           年龄
count  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000  4177.000000
mean      0.052909     0.523992     0.407881     0.139516     0.828742     0.359367     0.180594     0.238831     9.933684
std       0.822240     0.120093     0.099240     0.041827     0.490389     0.221963     0.109614     0.139203     3.224169
min      -1.000000     0.075000     0.055000     0.000000     0.002000     0.001000     0.000500     0.001500     1.000000
25%      -1.000000     0.450000     0.350000     0.115000     0.441500     0.186000     0.093500     0.130000     8.000000
50%       0.000000     0.545000     0.425000     0.140000     0.799500     0.336000     0.171000     0.234000     9.000000
75%       1.000000     0.615000     0.480000     0.165000     1.153000     0.502000     0.253000     0.329000    11.000000
max       1.000000     0.815000     0.650000     1.130000     2.825500     1.488000     0.760000     1.005000    29.000000
train.describe()  # summary statistics of the training set
       性别           长度           直径           高度           整体重量         肉重量          内脏重量         壳重           年龄
count  3340.000000  3340.000000  3340.000000  3340.000000  3340.000000  3340.000000  3340.000000  3340.000000  3340.000000
mean      0.060479     0.522754     0.406886     0.138790     0.824906     0.358151     0.179732     0.237158     9.911976
std       0.819021     0.120300     0.099372     0.038441     0.488535     0.222422     0.109036     0.137920     3.223534
min      -1.000000     0.075000     0.055000     0.000000     0.002000     0.001000     0.000500     0.001500     1.000000
25%      -1.000000     0.450000     0.350000     0.115000     0.439000     0.184375     0.092000     0.130000     8.000000
50%       0.000000     0.540000     0.420000     0.140000     0.796750     0.335500     0.171000     0.232000     9.000000
75%       1.000000     0.615000     0.480000     0.165000     1.147250     0.498500     0.250500     0.325000    11.000000
max       1.000000     0.780000     0.630000     0.250000     2.825500     1.488000     0.760000     1.005000    27.000000
dataPlot(train)  # view the distribution of the training set

[Figure: scatter plot of each column, training set (output_26_1.png)]

# Explore the test set
test.head()
   性别     长度     直径     高度    整体重量     肉重量    内脏重量      壳重  年龄
0   1  0.630  0.470  0.150  1.1355  0.5390  0.2325  0.3115  12
1  -1  0.585  0.445  0.140  0.9130  0.4305  0.2205  0.2530  10
2  -1  0.390  0.290  0.125  0.3055  0.1210  0.0820  0.0900   7
3   1  0.525  0.410  0.130  0.9900  0.3865  0.2430  0.2950  15
4   1  0.625  0.475  0.160  1.0845  0.5005  0.2355  0.3105  10
test.shape 
(835, 9)
test.describe() 
       性别          长度          直径          高度          整体重量        肉重量         内脏重量        壳重          年龄
count  835.000000  835.000000  835.000000  835.000000  835.000000  835.000000  835.000000  835.000000  835.000000
mean     0.022754    0.528808    0.411737    0.140784    0.842714    0.363370    0.183749    0.245320   10.022754
std      0.834341    0.119166    0.098627    0.038664    0.495990    0.218938    0.111510    0.143925    3.230284
min     -1.000000    0.130000    0.100000    0.015000    0.013000    0.004500    0.003000    0.004000    3.000000
25%     -1.000000    0.450000    0.350000    0.115000    0.458000    0.192000    0.096500    0.132750    8.000000
50%      0.000000    0.550000    0.430000    0.140000    0.810000    0.339000    0.170500    0.235000   10.000000
75%      1.000000    0.620000    0.485000    0.170000    1.177250    0.510750    0.259250    0.337000   11.000000
max      1.000000    0.815000    0.650000    0.250000    2.555000    1.145500    0.590000    0.815000   29.000000
dataPlot(test)

[Figure: scatter plot of each column, test set (output_30_1.png)]

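4. Build helper functions

'''
Purpose: take a DataFrame whose last column is the label and return the feature matrix and the label matrix
'''
def get_Mat(dataSet):
    # np.asmatrix is used instead of the np.mat alias, which newer NumPy releases no longer provide
    xMat = np.asmatrix(dataSet.iloc[:,:-1].values)
    yMat = np.asmatrix(dataSet.iloc[:,-1].values).T
    return xMat, yMat
'''
Purpose: visualise the dataset
'''
def plotShow(dataSet):
    xMat, yMat = get_Mat(dataSet)
    plt.scatter(xMat.A[:,1], yMat.A, c='b', s=5)
    plt.show()
'''
Purpose: compute the regression coefficients
Parameters:
    dataSet: the original dataset
Returns:
    ws: the regression coefficients
'''
def standRegres(dataSet):
    xMat, yMat = get_Mat(dataSet)
    xTx = xMat.T * xMat
    if np.linalg.det(xTx) == 0:
        print('The matrix is singular and cannot be inverted!')
        return
    ws = xTx.I*(xMat.T*yMat)  # xTx.I is the inverse of xTx
    return ws
"""
Purpose: compute the sum of squared errors (SSE)
Parameters:
    dataSet: the dataset holding the true values
    regres: the function that computes the regression coefficients
Returns:
    sse: the sum of squared errors
"""
def sseCal(dataSet, regres):
    xMat, yMat = get_Mat(dataSet)
    ws = regres(dataSet)
    yHat = xMat*ws
    sse = ((yMat.A.flatten() - yHat.A.flatten())**2).sum()
    return sse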
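`standRegres` solves the normal equations with an explicit inverse and guards it with a `det(...) == 0` test, which in floating point rarely fires even for nearly singular matrices. A minimal sketch of an alternative, assuming plain ndarrays rather than `np.matrix`, is to let `np.linalg.lstsq` solve the least-squares problem directly. The function name `lstsq_regres` and the synthetic data are hypothetical and only serve to show that both routes agree on a well-conditioned problem.

```python
import numpy as np

def lstsq_regres(X, y):
    """Ordinary least squares via np.linalg.lstsq, without an explicit inverse."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    ws, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
    return ws

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])  # intercept column plus one feature, laid out like ex0
y = 3.0 + 1.7 * x                          # noise-free synthetic target, so the fit is exact

ws = lstsq_regres(X, y)
ws_normal = np.linalg.inv(X.T @ X) @ X.T @ y  # the normal-equation route used by standRegres
print(ws, ws_normal)
```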

Using the ex0 dataset as an example, check what the functions return:

ex0 = pd.read_table("./datas/ex0.txt", header=None)
ex0.head()
     0         1         2
0  1.0  0.067732  3.176513
1  1.0  0.427810  3.816464
2  1.0  0.995731  4.550095
3  1.0  0.738336  4.256571
4  1.0  0.981083  4.560815
# SSE of simple linear regression
sseCal(ex0, standRegres)
1.3552490816814902

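Build a function that computes the coefficient of determination R²

"""
Purpose: compute the coefficient of determination R²
"""
def rSquare(dataSet, regres):
    xMat, yMat = get_Mat(dataSet)
    sse = sseCal(dataSet, regres)
    sst = ((yMat.A - yMat.mean())**2).sum()
    r2 = 1 - sse / sst
    return r2

Again using the ex0 dataset as an example, check the result:

# R² of simple linear regression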
rSquare(ex0, standRegres)
0.9731300889856916
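For reference, the quantities computed by `standRegres`, `sseCal` and `rSquare` can be written out as follows. The notation is an assumption made here for illustration: X is the feature matrix, y the label vector, m the number of samples, and \hat{y}_i the prediction for sample i.

```latex
\hat{w} = (X^{\top} X)^{-1} X^{\top} y, \qquad \hat{y} = X \hat{w}

\mathrm{SSE} = \sum_{i=1}^{m} (y_i - \hat{y}_i)^2, \qquad
\mathrm{SST} = \sum_{i=1}^{m} (y_i - \bar{y})^2, \qquad
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
```

An R² close to 1, such as the 0.973 obtained on ex0, means the fitted line accounts for most of the variation of y around its mean.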
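'''
Purpose: compute the predictions of locally weighted linear regression
Parameters:
    testMat: the test-set feature matrix
    xMat: the training-set feature matrix
    yMat: the training-set label matrix
    k: bandwidth of the Gaussian kernel (default 1.0)
Returns:
    ws: the regression coefficients fitted for the last test point
    yHat: the predicted values
'''
def LWLR(testMat, xMat, yMat, k=1.0):
    n = testMat.shape[0]               # number of test rows
    m = xMat.shape[0]                  # number of training rows
    weights = np.asmatrix(np.eye(m))   # initialise the weight matrix as the identity
    yHat = np.zeros(n)                 # initialise the predictions with zeros
    for i in range(n):
        for j in range(m):
            diffMat = testMat[i] - xMat[j]
            # [0, 0] extracts the scalar from the 1x1 matrix product
            weights[j,j] = np.exp((diffMat*diffMat.T)[0, 0] / (-2*k**2))
        xTx = xMat.T*(weights*xMat)
        if np.linalg.det(xTx) == 0:
            print('The matrix is singular and cannot be inverted')
            return
        ws = xTx.I*(xMat.T*(weights*yMat))
        yHat[i] = (testMat[i] * ws)[0, 0]
    return ws, yHat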
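The double loop in `LWLR` fills an m x m weight matrix one entry at a time for every test point, which is a large part of why it is slow. Below is a minimal sketch of the same Gaussian weighting, w_j = exp(-||x_test - x_j||² / (2k²)), using plain ndarrays and one vectorised distance computation per query. The name `lwlr_predict` is hypothetical, and the use of `np.linalg.lstsq` in place of the explicit inverse is a substitution made here, not the article's method.

```python
import numpy as np

def lwlr_predict(test_X, train_X, train_y, k=1.0):
    """Locally weighted linear regression with a Gaussian kernel (sketch)."""
    test_X = np.asarray(test_X, dtype=float)
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y, dtype=float).ravel()
    y_hat = np.empty(test_X.shape[0])
    for i, x0 in enumerate(test_X):
        sq_dist = ((train_X - x0) ** 2).sum(axis=1)  # squared distance to every training row
        w = np.exp(-sq_dist / (2.0 * k ** 2))        # diagonal of the weight matrix
        XtW = train_X.T * w                          # X^T W without building the m x m matrix
        # Solve (X^T W X) ws = X^T W y; lstsq also copes with a near-singular system.
        ws, *_ = np.linalg.lstsq(XtW @ train_X, XtW @ train_y, rcond=None)
        y_hat[i] = x0 @ ws
    return y_hat

# Synthetic check: with a very wide kernel all weights are close to 1,
# so the local fit should coincide with ordinary least squares.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 40)
X = np.column_stack([np.ones_like(x), x])
y = 3.0 + 1.7 * x + rng.normal(0, 0.1, 40)
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_wide = lwlr_predict(X, X, y, k=1e3)
print(np.allclose(pred_wide, X @ ols, atol=1e-4))
```

The cost is still one small linear solve per test point, so the overall work grows with the product of test and training sizes; the gain comes from avoiding the Python-level inner loop and the dense weight matrix.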

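5. Build the weighted linear model

Because the dataset is large and the computation is extremely slow, only about the first 100 samples of the training set are used for training and about the first 100 samples of the test set for testing (the slice [:99] in the code below selects 99 rows of each).

"""
Purpose: plot the SSE curves of the training set and the test set for different values of k
"""
def ssePlot(train, test):
    X0, Y0 = get_Mat(train)
    X1, Y1 = get_Mat(test)
    train_sse = []
    test_sse = []
    for k in np.arange(0.2,10,0.5):
        ws1, yHat1 = LWLR(X0[:99], X0[:99], Y0[:99], k)
        sse1 = ((Y0[:99].A.T - yHat1)**2).sum()
        train_sse.append(sse1)
        ws2, yHat2 = LWLR(X1[:99], X0[:99], Y0[:99], k)
        sse2 = ((Y1[:99].A.T - yHat2)**2).sum()
        test_sse.append(sse2)
    plt.figure(figsize=(20,8), dpi=100)
    plt.plot(np.arange(0.2,10,0.5), train_sse, color='b')
    plt.plot(np.arange(0.2,10,0.5), test_sse, color='r')
    plt.xlabel('value of k')
    plt.ylabel('SSE')
    plt.legend(['train_sse','test_sse'])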

Result:

ssePlot(train,test)

[Figure: training SSE (blue) and test SSE (red) for different values of k (output_47_0.png)]

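The plot should be read from right to left. When k is large, the model is fairly stable. As k decreases, the training SSE gradually falls, and at around k = 2 the training SSE equals the test SSE. As k decreases further, the training SSE keeps shrinking, meaning the model fits the training set better and better, while its performance on the test set gets worse and worse: the model is starting to overfit. This agrees with the earlier plots for different values of k, where the three plots for k = 1.0, 0.01 and 0.003 also show the model overfitting more and more as k decreases. So a value of k around 2 is the best choice here.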
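The text picks k by eye, at the point where the two curves meet. A common alternative, sketched here as an assumption rather than as the author's procedure, is to take the k with the smallest test SSE. The helper `pick_k` is hypothetical, and the SSE curve below is a synthetic stand-in generated from a formula, not the abalone results.

```python
import numpy as np

def pick_k(ks, test_sse):
    """Return the k whose test SSE is smallest (hypothetical helper)."""
    ks = np.asarray(ks, dtype=float)
    test_sse = np.asarray(test_sse, dtype=float)
    return float(ks[np.argmin(test_sse)])

ks = np.arange(0.2, 10, 0.5)             # the same grid that ssePlot scans
fake_test_sse = (ks - 2.0) ** 2 + 5.0    # synthetic curve with its minimum near k = 2
best_k = pick_k(ks, fake_test_sse)
print(best_k)
```

In practice the `test_sse` list collected inside `ssePlot` could be passed in directly, provided the function is changed to return it.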

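Next we plug k = 2 into the locally weighted linear regression model and look at the predictions:

train,test = randSplit(aba,0.8)  # randomly split the dataset into a training set and a test set
trainX,trainY = get_Mat(train)   # split the training set into a feature matrix and a label matrix
testX,testY = get_Mat(test)      # split the test set into a feature matrix and a label matrix
ws0,yHat0 = LWLR(testX,trainX,trainY,k=2)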

Plot the true values against the predicted values

y=testY.A.flatten()
plt.scatter(y,yHat0,c='b',s=5);  # the trailing ';' suppresses the text output of the cell in Jupyter

[Figure: predicted values against true values on the test set (output_52_0.png)]

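In the plot above the horizontal axis is the true value and the vertical axis is the predicted value. The points form a funnel shape: the predicted value increases with the true value, but the spread widens as well, which means the prediction error grows as the true value increases.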
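One way to put a number on the funnel shape is to compare the average absolute residual across bins of the true value. The sketch below uses synthetic predictions whose noise grows with the true value; the helper `spread_by_bin` and the data are hypothetical, and with the real results `testY.A.flatten()` and `yHat0` would take the place of `y_true` and `y_pred`.

```python
import numpy as np

def spread_by_bin(y_true, y_pred, n_bins=3):
    """Mean absolute residual within equal-count bins of the true value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    order = np.argsort(y_true)                # sort samples by true value
    abs_res = np.abs(y_pred - y_true)[order]
    return [float(chunk.mean()) for chunk in np.array_split(abs_res, n_bins)]

rng = np.random.default_rng(0)
y_true = rng.integers(1, 30, size=2000).astype(float)  # synthetic "ages" from 1 to 29
y_pred = y_true + rng.normal(0.0, 0.1 * y_true)        # noise proportional to the true value
spread = spread_by_bin(y_true, y_pred)
print(spread)
```

A list that increases from the first bin to the last indicates the widening spread that the scatter plot shows visually.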

Wrap the SSE and R² computation in a function so that it can be reused later

"""
Purpose: compute the SSE and R² of weighted linear regression
"""
def LWLR_pre(dataSet):
    train, test = randSplit(dataSet, 0.8)
    trainX, trainY = get_Mat(train)
    testX, testY = get_Mat(test)
    ws, yHat = LWLR(testX, trainX, trainY, k=2)
    sse = ((testY.A.T - yHat)**2).sum()
    sst = ((testY.A - testY.mean())**2).sum()
    r2 = 1 - sse / sst
    return sse, r2

Check the model's prediction results

LWLR_pre(aba)
(4152.777097646255, 0.5228101340130846)

The result shows an SSE above 4000 and an R² of only 0.52, so the model does not perform very well.
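An SSE of 4000+ is hard to judge on its own because it grows with the number of test samples. A small worked calculation, using only the numbers reported above and the test-set size of 835 rows, converts it into a per-sample error.

```python
import math

sse = 4152.777097646255   # SSE reported by LWLR_pre(aba)
r2 = 0.5228101340130846   # R² reported by LWLR_pre(aba)
n_test = 835              # number of rows in the test set (test.shape)

mse = sse / n_test        # mean squared error per test sample
rmse = math.sqrt(mse)     # root mean squared error, in the units of the 年龄 column
sst = sse / (1 - r2)      # rearranged from r2 = 1 - sse / sst
print(round(mse, 3), round(rmse, 3), round(sst, 1))
```

The RMSE comes out at roughly 2.2, so a typical prediction is off by a little over two units of "年龄" (age), against a standard deviation of about 3.2 in the test set.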
