Machine Learning: Support Vector Machines

Support vector machines:

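Hyperplane: a subspace with one dimension fewer than the data space. It is used to cut the data into different classes. The decision boundary is one kind of hyperplane.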
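To make the idea concrete, here is a minimal sketch of a hyperplane in 2-D (where it is just a line). It assumes the usual form w·x + b = 0; the weights `w`, the bias `b`, and the helper name `side_of_hyperplane` are made-up illustrative values, not something a model has learned.

```python
# A hyperplane is the set of points x with w·x + b = 0.
# The sign of w·x + b tells us on which side of it a point lies.

def side_of_hyperplane(w, b, x):
    """Return +1 or -1 depending on which side of w·x + b = 0 the point x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

w = [1.0, -1.0]   # normal vector of the line x1 - x2 = 0 (illustrative)
b = 0.0           # bias term (illustrative)

points = [(2.0, 1.0), (1.0, 3.0), (0.5, 0.2)]
labels = [side_of_hyperplane(w, b, p) for p in points]
print(labels)  # [1, -1, 1]
```

In 2-D the hyperplane is a line, in 3-D it is a plane, and in the 30-dimensional data used later it is a 29-dimensional subspace.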
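Decision boundary: in a binary classification problem, we look for a hyperplane that splits the data into two classes, and the most suitable one (for an SVM, the one with the largest margin to the nearest points of each class) is called the decision boundary. When the existing data is hard to split into two classes, the data can be mapped into a higher-dimensional space where the classes are easier to separate. For example, data that cannot be separated in a 2-D plane can be lifted to 3-D and separated there.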
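The 2-D to 3-D lifting can be sketched with a toy example. The points, the extra feature x1^2 + x2^2, and the threshold of 1.0 are all assumptions chosen for illustration; a real SVM does this kind of mapping implicitly through its kernel.

```python
# Toy 2-D data: class 0 sits near the origin, class 1 on a ring around it.
# No straight line in the plane separates the two classes.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4), (0.2, 0.2)]    # class 0
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5), (1.4, -1.4)]   # class 1

def lift(point):
    """Map (x1, x2) to (x1, x2, x1^2 + x2^2), adding a third dimension."""
    x1, x2 = point
    return (x1, x2, x1 * x1 + x2 * x2)

def predict(point, threshold=1.0):
    """Classify with the plane z = threshold in the lifted 3-D space."""
    return 1 if lift(point)[2] > threshold else 0

inner_pred = [predict(p) for p in inner]
outer_pred = [predict(p) for p in outer]
print(inner_pred, outer_pred)  # [0, 0, 0, 0] [1, 1, 1, 1]
```

After lifting, a flat plane (z = 1.0) separates the two groups, even though no line could do so in the original plane.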
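The API we need is in sklearn. The kernel options used here are linear, poly, sigmoid, and rbf.
Imports:
from sklearn.svm import SVC
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score  # used to score the predictions
Set the kernel parameter of SVC to kernel='rbf', kernel='poly', kernel='linear', or kernel='sigmoid'. These select the four kernel functions compared below.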
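As a rough guide to what the four kernel names mean, the sketch below evaluates each kernel function on a pair of vectors in plain Python. The formulas follow the kernel definitions given in the scikit-learn documentation; the helper names and the value gamma=1.0 are assumptions for illustration only (SVC's own default is gamma='scale').

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    # <x, y>
    return dot(x, y)

def poly_kernel(x, y, gamma=1.0, coef0=0.0, degree=3):
    # (gamma * <x, y> + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # tanh(gamma * <x, y> + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)

x, y = [1.0, 2.0], [2.0, 0.5]
print(linear_kernel(x, y))   # 3.0
print(poly_kernel(x, y))     # 27.0
print(rbf_kernel(x, y))
print(sigmoid_kernel(x, y))
```

Every kernel except linear depends on gamma, and through the inner product or the distance on the raw feature magnitudes, which is one reason feature scaling matters for SVMs.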

Implementation:

from sklearn.svm import SVC
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Load the breast cancer dataset (569 samples, 30 numeric features)
dt = datasets.load_breast_cancer()
# print(dt)
feature = dt['data']
target = dt['target']
# Hold out half of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(feature, target, train_size=0.5, random_state=2023)
# Fit one SVC per kernel
s1 = SVC(kernel='rbf').fit(x_train, y_train)
s2 = SVC(kernel='poly').fit(x_train, y_train)
s3 = SVC(kernel='linear').fit(x_train, y_train)
s4 = SVC(kernel='sigmoid').fit(x_train, y_train)
# Score each model on the test half with the F1 score
print('rbf F1 score:', f1_score(y_test, s1.predict(x_test)))
print('poly F1 score:', f1_score(y_test, s2.predict(x_test)))
print('linear F1 score:', f1_score(y_test, s3.predict(x_test)))
print('sigmoid F1 score:', f1_score(y_test, s4.predict(x_test)))

Output:

(With the commented-out print(dt) enabled, the script first prints the whole dataset object. Abridged here: 'data' is an array of 569 samples with 30 numeric features, 'target' holds the 0/1 labels, 'target_names' is ['malignant', 'benign'], and the class distribution is 212 malignant and 357 benign.)
rbf F1 score: 0.9451697127937337
poly F1 score: 0.9405684754521964
linear F1 score: 0.9621621621621622
sigmoid F1 score: 0.5898123324396783

With the sigmoid kernel, the result is much worse than with the other three.
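Note that the numbers reported by f1_score are F1 scores rather than plain accuracy. A minimal sketch of how an F1 score is computed for binary labels follows; the helper name `f1_from_labels` and the toy label lists are made up for illustration. In this dataset label 1 means benign, so with f1_score's default settings the score is computed for the benign class.

```python
def f1_from_labels(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 1, 0, 0, 1]   # toy ground truth
y_pred = [1, 0, 1, 1, 0, 1]   # toy predictions
score = f1_from_labels(y_true, y_pred)
print(score)  # 0.75
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and so is F1.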

Preprocess the data with min-max normalization, then classify again:

from sklearn.svm import SVC
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.preprocessing import MinMaxScaler

dt = datasets.load_breast_cancer()
# print(dt)
feature = dt['data']
target = dt['target']
# Rescale every feature column to the range [0, 1].
# Note: the scaler is fitted on all samples before the split, so the test half also influences the scaling.
MM = MinMaxScaler()
n_feature = MM.fit_transform(feature)
x_train, x_test, y_train, y_test = train_test_split(n_feature, target, train_size=0.5, random_state=2023)
# Fit one SVC per kernel
s1 = SVC(kernel='rbf').fit(x_train, y_train)
s2 = SVC(kernel='poly').fit(x_train, y_train)
s3 = SVC(kernel='linear').fit(x_train, y_train)
s4 = SVC(kernel='sigmoid').fit(x_train, y_train)
# Score each model on the test half with the F1 score
print('rbf F1 score:', f1_score(y_test, s1.predict(x_test)))
print('poly F1 score:', f1_score(y_test, s2.predict(x_test)))
print('linear F1 score:', f1_score(y_test, s3.predict(x_test)))
print('sigmoid F1 score:', f1_score(y_test, s4.predict(x_test)))

Output:

rbf F1 score: 0.9837837837837838
poly F1 score: 0.978494623655914
linear F1 score: 0.9814323607427056
sigmoid F1 score: 0.464864864864864

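After normalization, the F1 score of every kernel except sigmoid goes up. The rbf kernel gains the most, which fits the observation that rbf does not cope well with unevenly distributed data.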
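One variation worth trying, sketched below, is to fit the scaler on the training half only, so that nothing about the test half leaks into preprocessing. This is not what the scripts in this post do; it uses sklearn's make_pipeline to chain MinMaxScaler and SVC, and the scores it prints will generally differ somewhat from the ones reported in this post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Same dataset and split settings as in this post
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, train_size=0.5, random_state=2023
)

scores = {}
for kernel in ("rbf", "poly", "linear", "sigmoid"):
    # The pipeline fits MinMaxScaler on x_train only, then reuses the
    # learned min/max to transform x_test at prediction time.
    model = make_pipeline(MinMaxScaler(), SVC(kernel=kernel))
    model.fit(x_train, y_train)
    scores[kernel] = f1_score(y_test, model.predict(x_test))

for kernel, score in scores.items():
    print(f"{kernel} F1 score: {score:.4f}")
```

Wrapping both steps in one pipeline also means the same object can later be passed to cross-validation helpers without re-implementing the scaling by hand.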
