Multiple linear regression: y = w0 + w1*x1 + w2*x2 + … + wn*xn
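As a quick numeric illustration of the formula above, here is a minimal NumPy sketch. The weights and the sample are made-up values chosen only for this example, and `predict_linear` is a hypothetical helper name, not something from the original notes.

```python
import numpy as np

def predict_linear(w0, w, x):
    """Return y = w0 + w1*x1 + ... + wn*xn for a single sample x."""
    return w0 + np.dot(w, x)

# Made-up values for illustration only (assumptions, not from the text)
w0 = 1.0                        # intercept
w = np.array([2.0, -1.0, 0.5])  # w1, w2, w3
x = np.array([3.0, 4.0, 2.0])   # x1, x2, x3

y = predict_linear(w0, w, x)
print(y)  # 1 + 6 - 4 + 1 = 4.0
```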
Logistic regression is used for classification tasks.
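The formula is -(p * log q), where p is the true label and q is the predicted probability. For example, if the true label is 1 and the predicted probability is 0.8, substituting into the formula gives a loss (cross-entropy) of -[1 * log(0.8)], which is about 0.223 with the natural logarithm.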
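The worked example above can be checked with a few lines of Python. This is only a sketch: `cross_entropy_term` is a hypothetical helper name, and the natural logarithm is assumed.

```python
import math

def cross_entropy_term(p, q):
    """Return -(p * log q): p is the true label, q the predicted probability."""
    return -(p * math.log(q))

# True label 1, predicted probability 0.8 (the example from the text)
loss = cross_entropy_term(1, 0.8)
print(round(loss, 4))  # 0.2231

# A less confident prediction for the same true label gives a larger loss
worse = cross_entropy_term(1, 0.5)
print(round(worse, 4))  # 0.6931
```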
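Answer: cross-entropy. Classification uses cross-entropy, -y * log(P). Because logistic regression is a binary classifier, the loss function is loss = -y*log(P) - (1-y)*log(1-P), where y is the true label (0 or 1) and P is the predicted probability of class 1. We want to minimize this loss to find the optimal parameters, and we can do that with the gradient descent method covered earlier.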
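To make the link between the loss and gradient descent concrete, here is a minimal NumPy sketch. The toy data, the learning rate, the iteration count, and the helper names `sigmoid` and `bce_loss` are all assumptions made for illustration; this is plain batch gradient descent, not the solver scikit-learn uses.

```python
import numpy as np

def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(y, p, eps=1e-12):
    """Mean of -y*log(P) - (1-y)*log(1-P); eps guards against log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))

# Made-up 1-D toy data (assumption, not from the text)
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend a column of 1s for w0
w = np.zeros(Xb.shape[1])                  # [w0, w1], start at zero
lr = 0.1                                   # learning rate (assumed)

initial_loss = bce_loss(y, sigmoid(Xb @ w))
for _ in range(5000):
    p = sigmoid(Xb @ w)
    grad = Xb.T @ (p - y) / len(y)  # gradient of the mean loss w.r.t. w
    w -= lr * grad                  # step against the gradient
final_loss = bce_loss(y, sigmoid(Xb @ w))

print(initial_loss, final_loss)  # the loss should decrease
```

With all weights at zero every prediction is 0.5, so the starting loss equals log(2); each update then moves the weights in the direction that lowers the loss.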
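Answer: logistic regression handles multi-class classification by converting the problem into several binary classification problems. For three classes, three independent models are trained at the same time. With n feature dimensions, each model has n weights plus one intercept, so the three-class case has (n+1)*3 parameters w in total.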
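The (n+1)*3 count can be checked on the iris data (n = 4 features, 3 classes). The sketch below assumes scikit-learn's `OneVsRestClassifier` wrapper as one way to train the independent binary models; `max_iter=1000` is an assumed setting to help the solver converge.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = datasets.load_iris(return_X_y=True)  # 4 features, 3 classes
n_features = X.shape[1]

# One independent binary logistic regression per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Each binary model holds n weights (coef_) plus one intercept
n_params = sum(est.coef_.size + est.intercept_.size for est in ovr.estimators_)
print(len(ovr.estimators_), n_params)  # 3 15
```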
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
from time import time

iris = datasets.load_iris()
print(iris['feature_names'])  # feature names

X = iris['data'][:, 3:]  # feature matrix X: keep only the last column, petal width (cm)
print(X)

print(iris['target'])
y = iris['target']
# y = (iris['target'] == 2).astype(int)  # np.int was removed from NumPy; use int
print(y)  # class labels

# Utility function to report best scores
# def report(results, n_top=3):
#     for i in range(1, n_top + 1):
#         candidates = np.flatnonzero(results['rank_test_score'] == i)
#         for candidate in candidates:
#             print("Model with rank: {0}".format(i))
#             print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
#                 results['mean_test_score'][candidate],
#                 results['std_test_score'][candidate]))
#             print("Parameters: {0}".format(results['params'][candidate]))
#             print("")

# start = time()
# param_grid = {"tol": [1e-4, 1e-3, 1e-2],
#               "C": [0.4, 0.6, 0.8]}

# multi_class='ovr' solves the multi-class problem with several binary classifiers;
# 'multinomial' would use softmax instead. solver='sag' is Stochastic Average
# Gradient, a gradient-descent variant.
# Note: newer scikit-learn releases deprecate the multi_class parameter.
log_reg = LogisticRegression(multi_class='ovr', solver='sag')
# grid_search = GridSearchCV(log_reg, param_grid=param_grid, cv=3)
log_reg.fit(X, y)
# print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
#       % (time() - start, len(grid_search.cv_results_['params'])))
# report(grid_search.cv_results_)

# New data set: 1000 evenly spaced points on the interval [0, 3]
X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
print(X_new)

y_proba = log_reg.predict_proba(X_new)  # probability of each class for every point
# Predicted class for every point: the class with the highest probability
# (comparing against 0.5 to split into 0 or 1 applies only to the binary case)
y_hat = log_reg.predict(X_new)
print("w0", log_reg.intercept_)

plt.plot(X_new, y_proba[:, 2], 'g-', label='Iris-Virginica')
plt.plot(X_new, y_proba[:, 1], 'r-', label='Iris-Versicolour')
plt.plot(X_new, y_proba[:, 0], 'b--', label='Iris-Setosa')
plt.legend()  # show the labels passed to plt.plot
plt.show()

print(log_reg.predict([[1.7], [1.5]]))
For reference, the object returned by datasets.load_iris() exposes the following keys:
['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module']
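The script above calls both predict_proba and predict. The sketch below shows how the two relate in the multi-class case: the predicted class is the one with the highest probability, and each row of probabilities sums to 1. It assumes a `LogisticRegression` with default settings (softmax-based in current scikit-learn) rather than the `multi_class='ovr'` setup used in the script, and `max_iter=1000` is an assumed setting.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris['data'][:, 3:]  # petal width (cm), as in the script above
y = iris['target']

clf = LogisticRegression(max_iter=1000).fit(X, y)

X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
proba = clf.predict_proba(X_new)  # shape (1000, 3), one column per class

# Pick the class whose probability is largest in each row
pred_from_proba = clf.classes_[np.argmax(proba, axis=1)]

print(proba.shape)
print(np.array_equal(pred_from_proba, clf.predict(X_new)))
print(np.allclose(proba.sum(axis=1), 1.0))
```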
