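Table of Contents
- 1. Simple Linear Regression
- 2. Evaluating the Model
This post is a set of study notes on the book scikit-learn机器学习 (2nd edition).
1. Simple Linear Regression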
import numpy as np
import matplotlib.pyplot as plt

X = np.array([[6], [8], [10], [14], [18]])  # pizza diameters, one sample per row
y = np.array([7, 9, 13, 17.5, 18])          # pizza prices
plt.title("pizza diameter vs price")
plt.xlabel('diameter')
plt.ylabel('price')
plt.plot(X, y, 'r.')  # 'r.' draws red point markers
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, y)

test_pizza = np.array([[12]])
pred_price = model.predict(test_pizza)
pred_price
# array([13.68103448])
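- Error (residual sum of squares): $\sum (y_i - f(x_i))^2$
Note that the line below uses np.mean, so it reports the mean of the squared residuals (the sum divided by the number of samples), not the sum itself.
print("Mean squared error: %.2f" % np.mean((model.predict(X) - y) ** 2))
Mean squared error: 1.75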
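The formula above is a sum, while `np.mean` divides that sum by the number of samples. The following is a minimal sketch that computes both quantities on the same training data so the difference is visible; the names `residuals`, `rss`, and `mse` are chosen here for illustration and do not come from the book.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data from the notes: pizza diameters and prices.
X = np.array([[6], [8], [10], [14], [18]])
y = np.array([7, 9, 13, 17.5, 18])

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

rss = np.sum(residuals ** 2)   # residual sum of squares
mse = np.mean(residuals ** 2)  # the same sum divided by the number of samples

print("RSS: %.2f" % rss)  # RSS: 8.75
print("MSE: %.2f" % mse)  # MSE: 1.75
```

With five samples, the 1.75 reported in the notes corresponds to `mse`, and `rss` is five times larger.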
- Variance: $var(x) = \frac{\sum (x_i - \bar x)^2}{n-1}$
# Variance
x_bar = X.mean()  # 11.2
variance = ((X - x_bar) ** 2).sum() / (len(X) - 1)
variance  # 23.2

np.var(X, ddof=1)  # NumPy's built-in variance; ddof=1 applies the n-1 correction
###################
ddof : int, optional
    "Delta Degrees of Freedom": the divisor used in the calculation is ``N - ddof``, where ``N`` represents the number of elements. By default `ddof` is zero.
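To see what the `ddof` option changes in practice, here is a small sketch comparing the default (divide by N) with `ddof=1` (divide by N - 1) on the same diameters. The variable names are illustrative only.

```python
import numpy as np

X = np.array([[6], [8], [10], [14], [18]])

population_var = np.var(X)      # ddof=0 (default): divides by N = 5
sample_var = np.var(X, ddof=1)  # ddof=1: divides by N - 1 = 4

print(population_var)  # about 18.56
print(sample_var)      # about 23.2

# The two differ only by the factor N / (N - 1).
n = X.size
print(np.isclose(sample_var, population_var * n / (n - 1)))  # True
```

The hand-computed variance in the notes uses the N - 1 divisor, which is why it matches `np.var(X, ddof=1)` rather than the default.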
- Covariance: $cov(x,y) = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{n-1}$
# Covariance: measures how the two variables vary together
y_bar = y.mean()
covariance = np.multiply((X - x_bar).transpose(), y - y_bar).sum() / (len(X) - 1)
covariance  # 22.65

np.cov(X.transpose(), y)
# array([[23.2 , 22.65],
#        [22.65, 24.3 ]])
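The matrix returned by `np.cov` holds more than the covariance itself. As a rough sketch of how to read it (using flat 1-D arrays `x` and `y` for simplicity, which is an assumption of this example rather than the notes' column-vector `X`): the diagonal contains the two variances and the off-diagonal entries contain the covariance.

```python
import numpy as np

x = np.array([6, 8, 10, 14, 18])
y = np.array([7, 9, 13, 17.5, 18])

cov_matrix = np.cov(x, y)  # 2x2 matrix; uses the n-1 divisor by default

var_x = cov_matrix[0, 0]   # variance of x, about 23.2
cov_xy = cov_matrix[0, 1]  # covariance of x and y, about 22.65
var_y = cov_matrix[1, 1]   # variance of y, about 24.3

print(var_x, cov_xy, var_y)
```

The matrix is symmetric, so `cov_matrix[1, 0]` holds the same covariance as `cov_matrix[0, 1]`.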
Assume the model is $y = a + bx$
$b = \frac{cov(x,y)}{var(x)} = 22.65/23.2 \approx 0.98$
$a = \bar y - b \bar x = 12.9 - 0.98 \times 11.2 \approx 1.92$
The model is $y = 1.92 + 0.98x$
(The value 1.92 comes from rounding $b$ to 0.98 before computing $a$; with the unrounded $b \approx 0.9763$, $a \approx 1.97$.)
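The hand calculation above can be checked against the fitted estimator. The sketch below computes `b` and `a` from the covariance and variance and compares them with `model.coef_` and `model.intercept_`; it assumes the same training data as earlier in the notes, and the flattened array `x` is introduced here only for convenience.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[6], [8], [10], [14], [18]])
y = np.array([7, 9, 13, 17.5, 18])

x = X.ravel()  # 1-D view of the single feature

# Closed-form estimates for simple linear regression.
b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # slope, about 0.9763
a = y.mean() - b * x.mean()                 # intercept, about 1.9655 (no intermediate rounding)

model = LinearRegression().fit(X, y)

print(b, model.coef_[0])
print(a, model.intercept_)
```

Both pairs should agree up to floating-point error, since ordinary least squares with one feature reduces to these two formulas.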
2. Evaluating the Model
$R^2 = 1 - \frac{\sum (y_i - f(x_i))^2}{\sum (y_i - \bar y)^2}$
X_test = np.array([8,9,11,16,12]).reshape(-1,1)
y_test = [11,8.5,15,18,11]
r_squared = model.score(X_test, y_test)
r_squared # 0.6620052929422553
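To connect the R² formula with `model.score`, here is a minimal sketch that evaluates the formula term by term on the test set and compares the result with the value scikit-learn returns. The names `ss_res`, `ss_tot`, and `r_squared_manual` are illustrative, and `y_test` is converted to a NumPy array so that its mean can be taken directly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training and test data from the notes.
X = np.array([[6], [8], [10], [14], [18]])
y = np.array([7, 9, 13, 17.5, 18])
X_test = np.array([8, 9, 11, 16, 12]).reshape(-1, 1)
y_test = np.array([11, 8.5, 15, 18, 11])

model = LinearRegression().fit(X, y)

ss_res = np.sum((y_test - model.predict(X_test)) ** 2)  # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)          # total sum of squares
r_squared_manual = 1 - ss_res / ss_tot

print(r_squared_manual)              # about 0.662
print(model.score(X_test, y_test))   # same value, computed by scikit-learn
```

Note that the mean in the denominator is the mean of the test targets, not of the training targets.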