机器学习之支持向量机

支持向量机:

超平面:比数据空间少一个维度,为了将数据进行切分,分为不同的类别,决策边界是超平面的一种
决策边界:就是再二分类问题中,找到一个超平面,将数据分为两类,最合适的超平面就叫做决策边界,当现有的数据难以二分类,需要对数据进行升维,将数据映射到高一维度,便于进行区分,比如说二维平面难以区分就升维三维进行区分
需要用到的api在sklearn中,需要用到的参数有linear,poly,sigmoid,rbf
导包:import sklearn.datasets as datasetsfrom sklearn.model_selection import train_test_splitfrom sklearn.metrics import f1_score#用来评价预测的结果准确率
需要对svc的参数kernel进行设置,kernel='rbf',kernel='poly',kernel='linear',kernel='sigmoid',代表四种分类模式

实现如下:

from sklearn.svm import SVC
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
dt = datasets.load_breast_cancer()
# print(dt)
feature = dt['data']
target = dt['target']
x_train,x_test,y_train,y_test = train_test_split(feature,target,train_size=0.5,random_state=2023)
# 建立不同方式的svc
s1 = SVC(kernel='rbf').fit(x_train,y_train)
s2 = SVC(kernel='poly').fit(x_train,y_train)
s3 = SVC(kernel='linear').fit(x_train,y_train)
s4 = SVC(kernel='sigmoid').fit(x_train, y_train)
print('rbf的预测精度:',f1_score(y_test, s1.predict(x_test)))
print('poly的预测精度:',f1_score(y_test, s2.predict(x_test)))
print('linear的预测精度:',f1_score(y_test, s3.predict(x_test)))
print('sigmoid的预测精度:',f1_score(y_test, s4.predict(x_test)))

输出结果:

{'data': array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,1.189e-01],[2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,8.902e-02],[1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,8.758e-02],...,[1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,7.820e-02],[2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,1.240e-01],[7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,7.039e-02]]), 'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0,0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1,1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1,1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1,1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]), 'frame': None, 'target_names': array(['malignant', 'benign'], dtype='<U9'), 'DESCR': '.. _breast_cancer_dataset:\n\nBreast cancer wisconsin (diagnostic) dataset\n--------------------------------------------\n\n**Data Set Characteristics:**\n\n    :Number of Instances: 569\n\n    :Number of Attributes: 30 numeric, predictive attributes and the class\n\n    :Attribute Information:\n        - radius (mean of distances from center to points on the perimeter)\n        - texture (standard deviation of gray-scale values)\n        - perimeter\n        - area\n        - smoothness (local variation in radius lengths)\n        - compactness (perimeter^2 / area - 1.0)\n        - concavity (severity of concave portions of the contour)\n        - concave points (number of concave portions of the contour)\n        - symmetry\n        - fractal dimension ("coastline approximation" - 1)\n\n        The mean, standard error, and "worst" or largest (mean of the three\n        worst/largest values) of these features were computed for each image,\n        resulting in 30 features.  For instance, field 0 is Mean Radius, field\n        10 is Radius SE, field 20 is Worst Radius.\n\n        - class:\n                - WDBC-Malignant\n                - WDBC-Benign\n\n    :Summary Statistics:\n\n    ===================================== ====== ======\n                                           Min    Max\n    ===================================== ====== ======\n    radius (mean):                        6.981  28.11\n    texture (mean):                       9.71   39.28\n    perimeter (mean):                     43.79  188.5\n    area (mean):                          143.5  2501.0\n    smoothness (mean):                    0.053  0.163\n    compactness (mean):                   0.019  0.345\n    concavity (mean):                     0.0    0.427\n    concave points (mean):                0.0    0.201\n    symmetry (mean):                      0.106  0.304\n    fractal dimension (mean):             0.05   0.097\n    radius (standard error):              0.112  2.873\n    texture (standard error):             0.36   4.885\n    perimeter (standard error):           0.757  21.98\n    area (standard error):                6.802  542.2\n    smoothness (standard error):          0.002  0.031\n    compactness (standard error):         0.002  0.135\n    concavity (standard error):           0.0    0.396\n    concave points (standard error):      0.0    0.053\n    symmetry (standard error):            0.008  0.079\n    fractal dimension (standard error):   0.001  0.03\n    radius (worst):                       7.93   36.04\n    texture (worst):                      12.02  49.54\n    perimeter (worst):                    50.41  251.2\n    area (worst):                         185.2  4254.0\n    smoothness (worst):                   0.071  0.223\n    compactness (worst):                  0.027  1.058\n    concavity (worst):                    0.0    1.252\n    concave points (worst):               0.0    0.291\n    symmetry (worst):                     0.156  0.664\n    fractal dimension (worst):            0.055  0.208\n    ===================================== ====== ======\n\n    :Missing Attribute Values: None\n\n    :Class Distribution: 212 - Malignant, 357 - Benign\n\n    :Creator:  Dr. William H. Wolberg, W. Nick Street, Olvi L. Mangasarian\n\n    :Donor: Nick Street\n\n    :Date: November, 1995\n\nThis is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic) datasets.\nhttps://goo.gl/U2Uwz2\n\nFeatures are computed from a digitized image of a fine needle\naspirate (FNA) of a breast mass.  They describe\ncharacteristics of the cell nuclei present in the image.\n\nSeparating plane described above was obtained using\nMultisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree\nConstruction Via Linear Programming." Proceedings of the 4th\nMidwest Artificial Intelligence and Cognitive Science Society,\npp. 97-101, 1992], a classification method which uses linear\nprogramming to construct a decision tree.  Relevant features\nwere selected using an exhaustive search in the space of 1-4\nfeatures and 1-3 separating planes.\n\nThe actual linear program used to obtain the separating plane\nin the 3-dimensional space is that described in:\n[K. P. Bennett and O. L. Mangasarian: "Robust Linear\nProgramming Discrimination of Two Linearly Inseparable Sets",\nOptimization Methods and Software 1, 1992, 23-34].\n\nThis database is also available through the UW CS ftp server:\n\nftp ftp.cs.wisc.edu\ncd math-prog/cpo-dataset/machine-learn/WDBC/\n\n.. topic:: References\n\n   - W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction \n     for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on \n     Electronic Imaging: Science and Technology, volume 1905, pages 861-870,\n     San Jose, CA, 1993.\n   - O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and \n     prognosis via linear programming. Operations Research, 43(4), pages 570-577, \n     July-August 1995.\n   - W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques\n     to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) \n     163-171.', 'feature_names': array(['mean radius', 'mean texture', 'mean perimeter', 'mean area','mean smoothness', 'mean compactness', 'mean concavity','mean concave points', 'mean symmetry', 'mean fractal dimension','radius error', 'texture error', 'perimeter error', 'area error','smoothness error', 'compactness error', 'concavity error','concave points error', 'symmetry error','fractal dimension error', 'worst radius', 'worst texture','worst perimeter', 'worst area', 'worst smoothness','worst compactness', 'worst concavity', 'worst concave points','worst symmetry', 'worst fractal dimension'], dtype='<U23'), 'filename': 'breast_cancer.csv', 'data_module': 'sklearn.datasets.data'}
rbf的预测精度: 0.9451697127937337
poly的预测精度: 0.9405684754521964
linear的预测精度: 0.9621621621621622
sigmoid的预测精度: 0.5898123324396783Process finished with exit code 0

得出sigmoid分类后,结果很差

对数据进行处理,归一化处理后,在进行分类:

from sklearn.svm import SVC
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.preprocessing import MinMaxScaler
dt = datasets.load_breast_cancer()
# print(dt)
feature = dt['data']
target = dt['target']
# x_train,x_test,y_train,y_test = train_test_split(feature,target,train_size=0.5,random_state=2023)
# # 建立不同方式的svc
# s1 = SVC(kernel='rbf').fit(x_train,y_train)
# s2 = SVC(kernel='poly').fit(x_train,y_train)
# s3 = SVC(kernel='linear').fit(x_train,y_train)
# s4 = SVC(kernel='sigmoid').fit(x_train, y_train)
# print('rbf的预测精度:',f1_score(y_test, s1.predict(x_test)))
# print('poly的预测精度:',f1_score(y_test, s2.predict(x_test)))
# print('linear的预测精度:',f1_score(y_test, s3.predict(x_test)))
# print('sigmoid的预测精度:',f1_score(y_test, s4.predict(x_test)))
MM = MinMaxScaler()
n_feature = MM.fit_transform(feature)
x_train,x_test,y_train,y_test = train_test_split(n_feature,target,train_size=0.5,random_state=2023)
# 建立不同方式的svc
s1 = SVC(kernel='rbf').fit(x_train,y_train)
s2 = SVC(kernel='poly').fit(x_train,y_train)
s3 = SVC(kernel='linear').fit(x_train,y_train)
s4 = SVC(kernel='sigmoid').fit(x_train, y_train)
print('rbf的预测精度:',f1_score(y_test, s1.predict(x_test)))
print('poly的预测精度:',f1_score(y_test, s2.predict(x_test)))
print('linear的预测精度:',f1_score(y_test, s3.predict(x_test)))
print('sigmoid的预测精度:',f1_score(y_test, s4.predict(x_test)))

输出结果:

rbf的预测精度: 0.9837837837837838
poly的预测精度: 0.978494623655914
linear的预测精度: 0.9814323607427056
sigmoid的预测精度: 0.464864864864864

我们会发现除了sigmoid都提升了准确性rbf不擅长处理数据分布不均匀的情况。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/134868.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

Vue中nextTick的使用及原理

在Vue.js中&#xff0c;nextTick方法可以让我们在DOM更新后执行一些操作。通常情况下&#xff0c;在数据发生变化后&#xff0c;Vue.js会异步地更新DOM&#xff0c;这样可以减少不必要的DOM操作&#xff0c;提高性能。但是&#xff0c;有时候我们需要在DOM更新后对页面进行一些…

VSCode代码调试

1. C Linux上建议用cmake编译&#xff0c;然后用vscode调试&#xff0c;所以只需修改launch.json这个文件&#xff0c;然后点击Run->Start Debugging进行调试 1.1. launch.json文件 打开方式&#xff1a;Run->Open Configurations {"version": "0.2…

安全测试,接口返回内容遍历~

最近公司被人大量爬取数据&#xff0c;查了一下发现&#xff0c;用户主页接口&#xff0c;没有加用户登录校验&#xff0c;返回了用户的敏感信息有手机号和邮箱&#xff0c;其实这个接口是用不到这些信息的。再加上用户id是自增长的&#xff0c;所以很容易被别人爬取。 既然这…

解决kubernetes集群证书过期的问题

现象&#xff1a; 解决办法&#xff1a; 1.在master节点运行&#xff1a; kubeadm alpha certs renew all 2.在master节点运行&#xff1a; rm -f /etc/kubernetes/kubelet.conf && cp /etc/kubernetes/admin.conf /etc/kubernetes/bootstrap-kubelet.conf 3.在maste…

华为fusionInsigtht集群es连接工具

华为fusionInsight为用户提供海量数据的管理及分析功能&#xff0c;快速从结构化和非结构化的海量数据中挖掘您所需要的价值数据。开源组件结构复杂&#xff0c;安装、配置、管理过程费时费力&#xff0c;使用华为FusionInsight Manager将为您提供企业级的集群的统一管理平台,在…

app全屏广告变现,有哪些利弊?如何发挥全屏广告的变现潜力?

全屏广告是APP变现过程中一种广泛应用的广告形式&#xff0c;全屏广告有哪些优势呢&#xff1f;开发者如何发挥全屏广告的变现潜力&#xff0c;最大化变现收益&#xff1f; https://www.shenshiads.com 01、全屏广告的优势 作为一种占据整个屏幕的广告形式&#xff0c;全屏广…

大语言模型(LLM)综述(六):大型语言模型的基准和评估

A Survey of Large Language Models 前言7 CAPACITY AND EVALUATION7.1 基本能力7.1.1 语言生成7.1.2 知识利用7.1.3 复杂推理 7.2 高级能力7.2.1 人类对齐7.2.2 与外部环境的交互7.2.3 工具操作 7.3 基准和评估方法7.3.1 综合评价基准7.3.2 评估方法 7.4 实证评估7.4.1 实验设…

LeetCode 面试题 16.15. 珠玑妙算

文章目录 一、题目二、C# 题解 一、题目 珠玑妙算游戏&#xff08;the game of master mind&#xff09;的玩法如下。 计算机有4个槽&#xff0c;每个槽放一个球&#xff0c;颜色可能是红色&#xff08;R&#xff09;、黄色&#xff08;Y&#xff09;、绿色&#xff08;G&#…

AC修炼计划(AtCoder Regular Contest 163)

传送门&#xff1a;AtCoder Regular Contest 163 - AtCoder 第一题我们只需要将字符串分成两段&#xff0c;如果存在前面一段比后面一段大就成立。 #include<bits/stdc.h> #define int long long using namespace std; typedef long long ll; typedef pair<int,int&g…

安防监控系统EasyCVR平台设备通道绑定AI算法的功能设计与开发实现

安防视频监控/视频集中存储/云存储/磁盘阵列EasyCVR平台可拓展性强、视频能力灵活、部署轻快&#xff0c;可支持的主流标准协议有国标GB28181、RTSP/Onvif、RTMP等&#xff0c;以及支持厂家私有协议与SDK接入&#xff0c;包括海康Ehome、海大宇等设备的SDK等。平台可拓展性强、…

Leetcode-面试题 02.02 返回倒数第 k 个节点

快慢指针&#xff1a;让快指针先移动n个节点&#xff0c;之后快慢指针一起依次向后移动一个结点&#xff0c;等到快指针移动到链表尾时&#xff0c;慢指针则移动到倒数第n个结点位置。 /*** Definition for singly-linked list.* public class ListNode {* int val;* …

你使用过哪些版本控制工具?

Git&#xff1a;Git 是目前最受欢迎和广泛使用的分布式版本控制系统。它提供了强大的分支管理、合并和版本控制功能&#xff0c;并具有高效的性能和灵活性。Subversion&#xff08;SVN&#xff09;&#xff1a;Subversion 是一个集中式版本控制系统&#xff0c;被广泛用于许多软…

Win10系统下torch.cuda.is_available()返回为False的问题解决

Q: Win10系统下torch.cuda.is_available()返回为False (l2) D:\opt\l2>pythonPython 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:34:57) [MSC v.1936 64 bit (AMD64)] on win32Type "help", "copyright", "credits" or &q…

[ACTF2020 新生赛]BackupFile 1

题目环境&#xff1a; 好好好&#xff0c;让找源文件是吧&#xff1f;咱们二话不说直接扫它后台 使用dirsearch工具扫描网站后台&#xff08;博主有这个工具的压缩包&#xff0c;可以私聊我领取&#xff09;python dirsearch.py -u http://0d418151-ebaf-4f26-86b2-5363ed16530…

「Verilog学习笔记」求两个数的差值

专栏前言 本专栏的内容主要是记录本人学习Verilog过程中的一些知识点&#xff0c;刷题网站用的是牛客网 timescale 1ns/1ns module data_minus(input clk,input rst_n,input [7:0]a,input [7:0]b,output reg [8:0]c );always (posedge clk or negedge rst_n) begin if (~rst_…

华为防火墙基本原理工作方法总结

防火墙只会对tcp首包syn建立会话表&#xff0c;其它丢掉&#xff0c;如synack&#xff0c;ack udp直接建立会话表 icmp只对首包请求包建立会话表&#xff0c;其它包&#xff0c;如应答的不会建立直接丢掉 防火墙状态查看&#xff1a; rule name trust_untrust source-zone tru…

Spring RabbitMQ那些事(1-交换机配置消息发送订阅实操)

这里写目录标题 一、序言二、配置文件application.yml三、RabbitMQ交换机和队列配置1、定义4个队列2、定义Fanout交换机和队列绑定关系2、定义Direct交换机和队列绑定关系3、定义Topic交换机和队列绑定关系4、定义Header交换机和队列绑定关系 四、RabbitMQ消费者配置五、Rabbit…

【GEE】8、Google 地球引擎中的时间序列分析【时间序列】

1简介 在本模块中&#xff0c;我们将讨论以下概念&#xff1a; 处理海洋的遥感图像。 从图像时间序列创建视频。 GEE 中的时间序列分析。 向图形用户界面添加基本元素。 2背景 深水地平线漏油事件被认为是有史以来最大的海上意外漏油事件。该井释放了超过 490 万桶石油&am…

Sysmon 日志监控

系统监视器 &#xff08;Sysmon&#xff09; 是一个 Windows 日志记录加载项&#xff0c;它提供精细的日志记录功能并捕获默认情况下通常不记录的安全事件。它提供有关进程创建、网络连接、文件系统更改等的信息。分析 Sysmon 日志对于发现恶意活动和安全威胁至关重要。 在不断…