机器学习实战之logistic回归分类

利用logistic回归进行分类的主要思想:根据现有数据对分类边界建立回归公式,并以此进行分类。

 

logistic优缺点:

优点:计算代价不高,易于理解和实现。
缺点:容易欠拟合,分类精度可能不高。 .
适用数据类型:数值型和标称型数据。

 

sigmoid函数:

 

 

梯度上升法:

梯度:

该公式将一直被迭代执行,直至达到某个停止条件为止,比如迭代次数达到某个指定值或算
法达到某个可以允许的误差范围。

随机梯度上升法:

 梯度上升算法在每次更新回归系数时都需要遍历整个数据集, 该方法在处理100个左右的数
据集时尚可,但如果有数十亿样本和成千上万的特征,那么该方法的计算复杂度就太高了。一种
改进方法是一次仅用一个样本点来更新回归系数,该方法称为随机梯度上升算法。由于可以在新
样本到来时对分类器进行增量式更新,因而随机梯度上升算法是一个在线学习算法。与 “ 在线学
相对应,一次处理所有数据被称作是批处理” 。

梯度下降法:

你最经常听到的应该是梯度下降算法,它与这里的梯度上升算法是一样的,只是公式中的
加法需要变成减法。因此,对应的公式可以写成:

 

梯度上升算法用来求函数的最大值,而梯度下降算法用来求函数的最小值。

 

logistic预测疝气病预测病马的死亡率代码:

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import random# 加载数据集
def loadDataSet():dataMat = []labelMat = []fr = open('./testSet.txt')for line in fr.readlines():lineData = line.strip().split()dataMat.append([1.0, float(lineData[0]), float(lineData[1])])labelMat.append(int(lineData[2]))return dataMat, labelMat# sigmoid 函数
def sigmoid(inX):return 1.0 / (1 + np.exp(-inX))# 梯度上升
def gradAscent(dataMatIn, classLabels, maxCycles):dataMatrix = np.mat(dataMatIn)labelsMatrix = np.mat(classLabels).transpose() # 转置,将行向量转置为列向量m, n = np.shape(dataMatrix)alpha = 0.001W = np.ones((n, 1))for i in range(maxCycles):h = sigmoid(dataMatrix * W) # (100, 1)error = labelsMatrix - h # (100, 1)W = W + alpha * dataMatrix.transpose() * error # (3, 100) * (100, 1)return W #改进版随机梯度上升
def stocGradAscent1(dataMatrixIn, classLabels, numIter=150):dataMatrix = np.array(dataMatrixIn)m,n = np.shape(dataMatrix)weights = np.ones(n)   #initialize to all onesfor j in range(numIter):dataIndex = list(range(m))for i in range(m):alpha = 4.0/(1.0+j+i)+0.01    #apha decreases with iteration, does not randIndex = int(random.uniform(0,len(dataIndex)))#go to 0 because of the constanth = sigmoid(sum(dataMatrix[randIndex]*weights))error = classLabels[randIndex] - hweights = weights + alpha * error * dataMatrix[randIndex]del(dataIndex[randIndex])return np.mat(weights.reshape(n, 1))def plotBestFit(weights, dataMat, labelMat):dataArr = np.array(dataMat)n = np.shape(dataArr)[0]xcord1 = []; ycord1 = []xcord2 = []; ycord2 = []for i in range(n):if labelMat[i] == 1:xcord1.append(dataArr[i, 1]); ycord1.append(dataArr[i, 2])else:xcord2.append(dataArr[i, 1]); ycord2.append(dataArr[i, 2])fig = plt.figure()ax = fig.add_subplot(111)ax.scatter(xcord1, ycord1, s = 30, c = 'red', marker = 's')ax.scatter(xcord2, ycord2, s = 30, c = 'green')x = np.arange(-4.0, 4.0, 0.1)y = ((np.array((-weights[0] - weights[1] * x) / weights[2]))[0]).transpose()ax.plot(x, y)plt.xlabel('X1')plt.ylabel('X2')plt.show()# 预测
def classifyVector(inX, weights):prob = sigmoid(sum(inX * weights))if prob > 0.5:return 1.0else:return 0.0# 对训练集进行训练,并且对测试集进行测试
def colicTest():trainFile = open('horseColicTraining.txt')testFile = open('horseColicTest.txt')trainingSet = []; trainingLabels = []for line in trainFile.readlines():currLine = line.strip().split('\t')lineArr = []for i in range(21):lineArr.append(float(currLine[i]))trainingSet.append(lineArr)trainingLabels.append(float(currLine[21]))# 开始训练weights = stocGradAscent1(trainingSet, trainingLabels, 400)errorCount = 0.0numTestVec = 0.0for line in testFile.readlines():numTestVec += 1.0currLine = line.strip().split('\t')lineArr = []for i in range(21):lineArr.append(float(currLine[i]))if int(classifyVector(np.array(lineArr), weights)) != int(currLine[21]):errorCount += 1.0errorRate = errorCount / float(numTestVec)print("the error rate is:%f" % errorRate)return errorRate# 多次测试求平均值
def multiTest():testTimes = 10errorRateSum = 0.0for i in range(testTimes):errorRateSum += colicTest()print("the average error rate is:%f" % (errorRateSum / float(testTimes)))multiTest()

 

转载于:https://www.cnblogs.com/qiang-wei/p/10770285.html

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/279624.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

HDU 6343.Problem L. Graph Theory Homework-数学 (2018 Multi-University Training Contest 4 1012)

6343.Problem L. Graph Theory Homework 官方题解: 一篇写的很好的博客: HDU 6343 - Problem L. Graph Theory Homework - [(伪装成图论题的)简单数学题] 代码: 1 //1012-6343-数学2 #include<iostream>3 #include<cstdio>4 #include<cstring>5 #include<…

Android GridView LruCache

照片墙这种功能现在应该算是挺常见了&#xff0c;在很多应用中你都可以经常看到照片墙的身影。它的设计思路其实也非常简单&#xff0c;用一个GridView控件当作“墙”&#xff0c;然后随着GridView的滚动将一张张照片贴在“墙”上&#xff0c;这些照片可以是手机本地中存储的&a…

如何在Android TV上自定义推荐行

When you fire up Android TV, the first thing you see is a list of movies and shows the system thinks you’ll like. It’s often full of the latest flicks or hottest news, but sometimes it could just be things relevant to your interests and the apps you have…

steam串流到手机_如何从手机将Steam游戏下载到PC

steam串流到手机Steam allows you to remotely install games from your smartphone, just like you can with a PlayStation 4 or Xbox One. You can download games to your gaming PC from anywhere, ensuring those big downloads are complete and the game is ready to p…

禁用windows10更新_如何在Windows 10中禁用投影

禁用windows10更新The drop shadows on applications in the Windows 10 preview are really big and suspiciously similar to the ones in OS X, and if they aren’t your speed, you can easily remove them. We actually think they look good, but since somebody out th…

如何访问 Service?- 每天5分钟玩转 Docker 容器技术(99)

前面我们已经学习了如何部署 service&#xff0c;也验证了 swarm 的 failover 特性。不过截止到现在&#xff0c;有一个重要问题还没有涉及&#xff1a;如何访问 service&#xff1f;这就是本节要讨论的问题。 为了便于分析&#xff0c;我们重新部署 web_server。 ① docker se…

Linux配置手册(二)配置DHCP服务器

1.检查是否安装DHCP服务器软件 2.挂在RHEL5系统光盘 3.安装DHCP服务软件 4.将模板配置文件复制并覆盖现在的配置文件 5.配置修改dhcpd.conf文件 配置信息 默认租约时间 default-lease-time 最大租约时间 max-lease-time 局域网内所有主机的域名 option domain-name 客户机所使用…

什么是Google Play保护以及如何确保Android安全?

Android is open, flexible, and all about choice. Unfortunately, that flexibility comes more potential security issues. The good news is that Google has a system in place named Play Protect that helps keep Android secure. Android开放&#xff0c;灵活且具有多…

如何使计算机为您读取文档

Since the beginning of the computer age, people have always enjoyed making computers talk to them. These days, that functionality is built right into Windows and you can easily use it to have your PC read documents to you. 自计算机时代开始以来&#xff0c;人…

面试中常问的List去重问题,你都答对了吗?

2019独角兽企业重金招聘Python工程师标准>>> 面试中经常被问到的list如何去重&#xff0c;用来考察你对list数据结构&#xff0c;以及相关方法的掌握&#xff0c;体现你的java基础学的是否牢固。 我们大家都知道&#xff0c;set集合的特点就是没有重复的元素。如果集…

Coolite Toolkit学习笔记五:常用控件Menu和MenuPanel

Coolite Toolkit里的Menu控件和其他的.NET Web控件不一样&#xff0c;如果只是设计好了Menu或是通过程序初始化菜单项&#xff0c;菜单是不会呈现在界面上的&#xff0c;因为Coolite Toolkit规定Menu控件需要一个容器来做依托&#xff0c;而这个让Menu依托的控件就是MenuPanel&…

windows命令提示符_如何个性化Windows命令提示符

windows命令提示符Command line interfaces can be downright boring and always seem to miss out on the fresh coats of paint liberally applied to the rest of Windows. Here’s how to add a splash of color to Command Prompt and make it unique. 命令行界面可能非常…

android-api28转换到api19-不能编译

安装出现错误- rootponkan:/ # pm install /mnt/usb/sda1/app-debug.apkpkg: /mnt/usb/sda1/app-debug.apk Failure [INSTALL_FAILED_OLDER_SDK]查看系统和api版本 rootponkan:/ # getprop ro.build.version.release 5.1.1 rootponkan:/ # getprop ro.build.version.sdk 22将ap…

Java多线程编程 — 锁优化

2019独角兽企业重金招聘Python工程师标准>>> 阅读目录 一、尽量不要锁住方法 二、缩小同步代码块&#xff0c;只锁数据 三、锁中尽量不要再包含锁 四、将锁私有化&#xff0c;在内部管理锁 五、进行适当的锁分解 正文 并发环境下进行编程时&#xff0c;需要使用锁机…

Android Ap 开发 设计模式第六篇:原型模式

Prototype Pattern 名称由来 不是利用类来产生实例对象&#xff0c;而是从一个对象实例产生出另一个新的对象实例 &#xff0c;根据被视为原型的对象实例 &#xff0c;建立起的另一个新的对象实例就称为原型模式&#xff08;Ptototype Pattern&#xff09;。 需求场景 种类过多…

netty实现客户端服务端心跳重连

前言&#xff1a; 公司的加密机调度系统一直使用的是http请求调度的方式去调度&#xff0c;但是会出现网络故障导致某个客户端或者服务端断线的情况&#xff0c;导致很多请求信息以及回执信息丢失的情况&#xff0c;接着我们抛弃了http的方式&#xff0c;改为Tcp的方式去建立客…

为什么您仍然不应该购买《星球大战:前线II》

If you’ve been following video game news at all for the last couple of weeks, you’ve probably heard that EA’s Star Wars: Battlefront II is having some teething troubles. EA has backpedaled to avoid more controversy, but we’re here to say: don’t fall f…

quantum_如何从Firefox Quantum删除Pocket

quantumFirefox Quantum has deep integration with the Pocket read-it-later service, which is now owned by Mozilla. You’ll see a Pocket page action in the address bar, a “View Pocket List” feature in the Library, and recommended articles from Pocket on th…

Couchbase概述

Couchbase概述 Couchbase概述 Couchbase概述Couchbase最早叫Membase&#xff0c;是由Memcached项目组的一些头目另立的山头。2011年与CouchDB合并&#xff0c;正式命名为Couchbase。2013年&#xff0c;作为NoSQL技术初创企业&#xff0c;拿到了2500万美元的D轮投资。截稿时止&a…

Windows 2012 - Dynamic Access Control 浅析

Windows 2012相对于前几个版本而已&#xff0c;做出了大量的改进&#xff0c;尤其体现在安全性和虚拟化方面。Dynamic Access Control ( 动态访问控制&#xff09;是微软在文件服务器的访问控制上的新功能&#xff0c;极大的提高了安全性和灵活性。经过一天的研究学习&#xff…