Why Accuracy Is Troublesome Even with Balanced Classes

Accuracy is a go-to metric because it is highly interpretable and cheap to evaluate. For this reason accuracy, perhaps the simplest of machine learning metrics, is (rightfully) commonplace. However, many people are also too comfortable with it.

Being aware of the limitations of accuracy is essential.

The best-known misuse of accuracy is on unbalanced datasets: for instance, a medical dataset in which the majority of people (say 95%) do not have condition x and the remaining 5% do.

Because a model is rewarded for whatever lowers its loss most cheaply, it can comfortably reach 95% accuracy by predicting that no input has condition x. This is especially likely with an L2 (squared-error) penalty, which penalizes small errors proportionately less.

The usual response is to use a metric that accounts for the unbalanced classes and compensates for a class's lack of quantity with a boost in importance, such as the F1 score or balanced accuracy.
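As a minimal sketch of the point above, the following Python snippet scores a degenerate classifier that always predicts "no condition" on a made-up dataset with the 95/5 split from the example. The data and the helper names are illustrative assumptions, not part of the original article.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_for(y_true, y_pred, label):
    """Fraction of samples with the given true label that were predicted as that label."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in relevant) / len(relevant)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    return (recall_for(y_true, y_pred, 0) + recall_for(y_true, y_pred, 1)) / 2


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: 5 people have condition x (label 1), 95 do not (label 0).
y_true = [1] * 5 + [0] * 95
# A "lazy" model that predicts "no condition" for everyone.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print(f"accuracy          = {acc:.2f}")
print(f"balanced accuracy = {bal_acc:.2f}")
print(f"F1 score          = {f1:.2f}")
```

Plain accuracy rewards the lazy model, while balanced accuracy and F1 expose that it never finds a single positive case.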

That common critique leaves other limitations of accuracy unaddressed, however. Accuracy has problems that go beyond class imbalance and persist even when the classes are balanced.

Everyone agrees that training/testing and deployment of a model should be kept separate. More specifically, the former should be statistical and the latter decision-based. However, there is nothing statistical about converting the outputs of a machine learning model, which are (almost) always probabilistic, into decisions and then judging the model's statistical quality on that converted output.

Look at the outputs of two machine learning models: should they really receive the same score? Moreover, trying to remedy accuracy with other decision-based metrics, such as the commonly prescribed specificity/sensitivity or F1 score, runs into the same problem.

Figure: outputs of Model 1 and Model 2 (image created by author).

Model 2 is far less confident in its results than Model 1, yet both receive the same accuracy. Accuracy is not a proper scoring rule, and so it is deceptive in an inherently probabilistic setting.
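The original figure is not reproduced here, so the following is only a sketch with invented numbers: two hypothetical models that classify every sample correctly at a 0.5 threshold, one confidently and one barely. The predicted probabilities and function names are assumptions made for illustration.

```python
import math


def accuracy(y_true, probs, threshold=0.5):
    """Accuracy after converting probabilities to hard labels at the threshold."""
    return sum((p >= threshold) == (t == 1) for t, p in zip(y_true, probs)) / len(y_true)


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and label (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def log_loss(y_true, probs):
    """Mean negative log-probability assigned to the true label (lower is better)."""
    return -sum(math.log(p if t == 1 else 1.0 - p) for t, p in zip(y_true, probs)) / len(y_true)


# Hypothetical labels and predicted probabilities of class 1.
y_true = [1, 1, 0, 0]
model_1 = [0.95, 0.90, 0.05, 0.10]  # confident and correct
model_2 = [0.55, 0.60, 0.45, 0.40]  # correct, but barely past the threshold

for name, probs in [("Model 1", model_1), ("Model 2", model_2)]:
    print(
        f"{name}: accuracy={accuracy(y_true, probs):.2f}  "
        f"Brier={brier_score(y_true, probs):.4f}  "
        f"log loss={log_loss(y_true, probs):.4f}"
    )
```

Both models get the same accuracy, while the probability-based scores separate the confident model from the hesitant one.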

While accuracy can be used when presenting a final model, it says nothing about the model's confidence: whether it actually knew the class of most samples, or was merely lucky to land on the right side of the 0.5 threshold.

This is also problematic during training. How can a reliable loss function, the guiding light that tells the model what is right and what is wrong, flip its verdict completely when the output probability shifts by as little as 0.01%? If a training sample with label '1' receives predictions of 0.51 and 0.49 from Model 1 and Model 2, respectively, is it fair that Model 2 is penalized at the full possible value while Model 1 is not penalized at all? Thresholds, while necessary for decision-making in a physically deterministic world, are too sensitive and hence inappropriate for training and testing.
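The 0.51 versus 0.49 case can be checked numerically. The sketch below compares a thresholded 0/1 loss with the log loss on that single sample; the function names are illustrative, and log loss is used here only as one example of a continuous loss.

```python
import math


def zero_one_loss(label, prob, threshold=0.5):
    """Thresholded loss: 0 if the hard prediction matches the label, else 1."""
    predicted = 1 if prob >= threshold else 0
    return 0 if predicted == label else 1


def log_loss(label, prob):
    """Negative log of the probability assigned to the true label."""
    return -math.log(prob if label == 1 else 1.0 - prob)


label = 1
for name, prob in [("Model 1", 0.51), ("Model 2", 0.49)]:
    print(
        f"{name}: p={prob:.2f}  "
        f"0/1 loss={zero_one_loss(label, prob)}  "
        f"log loss={log_loss(label, prob):.4f}"
    )
```

The thresholded loss jumps from its minimum to its maximum across a 0.02 change in probability, whereas the log loss of the two predictions differs only slightly.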

Speaking of thresholds, consider this. You are building a machine learning model to decide whether a patient should receive a very invasive and painful surgical treatment. Where do you set the threshold for recommending it? Instinctively, most likely not at the default 0.5 but at some higher probability: the patient is subjected to the treatment if, and only if, the model is absolutely sure. If, on the other hand, the treatment is something far less serious, like an aspirin, the threshold can be lower.

The consequences of a decision dictate the threshold for making it. This idea, hard-coding morality and human feeling into a machine learning model, is difficult to think about. One may be inclined to argue that over time, and under suitably balanced circumstances, the model will automatically shift its output probability distribution to suit a 0.5 threshold, and that manually adding a threshold tampers with the model's learning.
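One standard decision-theoretic way to make "consequences dictate the threshold" concrete, which the article itself does not spell out, is to derive the threshold from the costs of the two kinds of error and apply it only at decision time, leaving the model's probabilities untouched. The cost values below are invented purely for illustration.

```python
def decision_threshold(cost_false_positive, cost_false_negative):
    """Probability above which acting has lower expected cost than not acting.

    Acting on a patient with probability p of needing treatment costs
    (1 - p) * cost_false_positive in expectation; not acting costs
    p * cost_false_negative. Acting is cheaper when
    p > cost_false_positive / (cost_false_positive + cost_false_negative).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_treat(prob, threshold):
    """Decision rule applied at deployment, after the model has produced prob."""
    return prob > threshold


# Hypothetical costs (arbitrary units): needless surgery is very costly,
# a needless aspirin is not.
surgery_threshold = decision_threshold(cost_false_positive=9.0, cost_false_negative=1.0)
aspirin_threshold = decision_threshold(cost_false_positive=1.0, cost_false_negative=4.0)

prob = 0.7  # the same model output in both scenarios
print(f"surgery threshold = {surgery_threshold:.2f}, treat: {should_treat(prob, surgery_threshold)}")
print(f"aspirin threshold = {aspirin_threshold:.2f}, treat: {should_treat(prob, aspirin_threshold)}")
```

The same predicted probability leads to different decisions depending on the stakes, which is a reason to keep the threshold out of training and evaluation.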

The rebuttal is not to use decision-based scoring functions in the first place, and not to hard-code any number at all, a 0.5 threshold included. This way, the model cannot cheat by taking the easy way out through artificially constructed continuous-to-discrete conversions; it learns instead to maximize the probability it assigns to the correct answers.

Whenever a threshold is introduced into the naturally probabilistic and fluid workings of a machine learning algorithm, it causes more problems than it fixes.

Loss functions that treat probability as the continuous quantity it is, rather than as discrete buckets, are the way to go.

What are some better, probability-based, and more informative metrics to use for honestly evaluating a model’s performance?

  • Brier score
  • Log score
  • Cross-entropy
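For reference, the usual binary-classification forms of these scores can be written as follows. The notation is an assumption of this note rather than the article's: y_i in {0, 1} is the true label of sample i, p_i is the predicted probability of class 1, and N is the number of samples.

```latex
% Brier score: mean squared error of the predicted probabilities (lower is better)
\[
\mathrm{BS} = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i)^2
\]

% Log score: mean log-probability assigned to the true label (higher is better)
\[
\mathrm{LS} = \frac{1}{N} \sum_{i=1}^{N} \bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr]
\]

% Cross-entropy (log loss): the negated log score (lower is better)
\[
\mathrm{CE} = -\mathrm{LS}
\]
```

With hard 0/1 labels, the cross-entropy is the log score with its sign flipped, so the last two entries in the list rank models identically.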

In the end, accuracy is an important and permanent member of the metrics family. But for those who decide to use it: understand that accuracy's interpretability and simplicity come at a heavy cost.

Translated from: https://medium.com/analytics-vidhya/why-accuracy-is-troublesome-even-with-balanced-classes-590b405f5a06
