Machine learning models, nonlinear models
You’ve divided your data into training, development and test sets, with the correct percentage of samples in each block, and you’ve also made sure that all of these blocks (especially the development and test sets) come from the same distribution.
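Here is a minimal sketch of the splitting step. It assumes the samples are a plain Python list and uses hypothetical 80/10/10 proportions, because the text does not specify percentages. Shuffling once before slicing is one simple way to make the development and test sets come from the same distribution as each other, provided the pooled data already reflects the distribution you care about.

```python
import random

def split_dataset(samples, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle once, then slice into train/dev/test.

    The 80/10/10 default is an illustrative assumption, not a rule.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # one shuffle so dev and test share a distribution
    n_train = int(len(items) * train_frac)
    n_dev = int(len(items) * dev_frac)
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]  # whatever remains goes to the test set
    return train, dev, test

train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))  # 80 10 10
```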
You’ve done some exploratory data analysis, gathered insights from this data, and chosen the best features for the task at hand. You’ve also chosen an evaluation metric that is well suited to your problem. Using this metric, you will be able to iterate, changing the hyper-parameters and configuration of your models in the quest for the best possible performance.
After all this, you pre-process and prepare the data, and finally train a model (let’s say a Support Vector Machine). You wait patiently, and once it has finished training you set out to evaluate the results, which are the following:
Training set error: 8%
Development set error: 10%
How should we look at these results? What can we compare them against? How can we improve them? Is it possible to do it?
In this post we will answer all of these questions in an easy, accessible manner. This is not a debugging guide about setting breakpoints in your code or watching how training evolves. It is about knowing what to do once your model is built and trained: how to correctly assess its performance, and how you could improve it.
Let’s get to it!
What to compare our model against?
When we build our first model and get the initial round of results, it is always desirable to compare this model against some already established metric, to quickly assess how well it is doing. For this, we have two main strategies: Baseline models and Human-level performance.
Baseline models
A baseline model is a very simple model that generally yields acceptable results on a given task. The results given by the baseline are the ones you should try to improve on with your shiny new machine learning model.
In short, a baseline is a simple approach to solving a problem that gives a good-enough result, and that should be taken as the starting point for performance. If you build a model that does not surpass the baseline’s performance on some data, then you should probably rethink what you are doing.
Let’s see an example to get a better idea of how this works. In Natural Language Processing (NLP), one of the most common problems is Sentiment Analysis: detecting the mood, feeling or sentiment of a given sentence, which could be positive, neutral or negative. A very simple model that can do this is Naive Bayes: it is very transparent, fast to train, and generally gives acceptable results, although these are far from optimal.
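To make the baseline idea concrete, here is a minimal multinomial Naive Bayes sketch in plain Python. It assumes whitespace tokenisation, add-one (Laplace) smoothing and a hypothetical four-sentence, two-class toy corpus. A real project would more likely use a library implementation.

```python
import math
from collections import Counter

class NaiveBayesBaseline:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()  # naive whitespace tokenisation
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        label_counts = Counter(labels)
        # Log prior P(class) and total word count per class
        self.log_priors = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_score(self, tokens, c):
        score = self.log_priors[c]
        for t in tokens:
            if t in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[c][t]
                score += math.log((count + 1) / (self.totals[c] + len(self.vocab)))
        return score

    def predict(self, doc):
        tokens = doc.lower().split()
        return max(self.classes, key=lambda c: self._log_score(tokens, c))

# Hypothetical toy data, for illustration only
docs = ["great movie loved it", "wonderful acting great plot",
        "terrible movie hated it", "boring plot awful acting"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayesBaseline().fit(docs, labels)
print(model.predict("great acting"))  # pos
print(model.predict("awful movie"))   # neg
```

Its accuracy on a held-out set would then be the number that every later model has to beat.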
Imagine you gather some labelled data for sentiment analysis, pre-process it, train a Naive Bayes model and get 65% accuracy. Because we are taking Naive Bayes as the baseline model for this task, every further model we build should aim to beat this 65% accuracy. If we train a Logistic Regression and get 55% accuracy, then we should probably rethink what we are doing.
We might conclude that non-neural models are not a good fit for this task, train an initial Recurrent Neural Network, and get 70%. Now that we have beaten the baseline, we can keep improving this RNN to get better and better performance.
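Once a baseline score exists, checking every new candidate against it is simple bookkeeping. The sketch below reuses the illustrative accuracies from the example above: 65% for Naive Bayes, 55% for Logistic Regression and 70% for the RNN. The function name is hypothetical.

```python
def compare_to_baseline(baseline_acc, candidates):
    """Return {model_name: (accuracy, beats_baseline)} for each candidate."""
    return {name: (acc, acc > baseline_acc) for name, acc in candidates.items()}

results = compare_to_baseline(
    baseline_acc=0.65,  # Naive Bayes baseline accuracy from the example
    candidates={"logistic_regression": 0.55, "rnn": 0.70},
)
for name, (acc, beats) in results.items():
    verdict = "beats baseline" if beats else "below baseline: rethink"
    print(f"{name}: {acc:.0%} -> {verdict}")
```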
Human Level Performance
In recent years, it has become common for Machine Learning algorithms not only to produce excellent results in many fields, but to achieve even better results than human experts in those specific fields. Because of this, a useful reference for judging the performance of an algorithm on a certain task is Human Level Performance on that same task.
Let’s see an example so that you can quickly grasp how this works. Imagine that a cardiologist can look at the health parameters of patients and diagnose whether each patient has a certain disease, making only three errors for every one hundred patients.
Now, we build a Machine Learning model to look at these same parameters and diagnose the presence or absence of this disease. If our model makes 10 errors out of every 100 diagnoses, then there is a lot of room for improvement: the expert makes 7 fewer errors for every 100 patients, so their error rate is 7 percentage points lower. However, if our model makes only 1 failed prediction out of 100, it is surpassing human level performance, and is therefore doing quite well.
Human level performance: 3% error
Model test data performance: 10% error
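The arithmetic in this example can be written down directly. This sketch uses a hypothetical function name and expresses errors in percent, so the gap comes out in whole percentage points without floating-point noise.

```python
def gap_to_human(model_error_pct, human_error_pct):
    """Percentage-point gap between model error and human error.

    Positive: room for improvement. Zero or negative: the model matches or
    surpasses human level performance.
    """
    return model_error_pct - human_error_pct

print(gap_to_human(10, 3))  # 7  -> the expert is 7 points better
print(gap_to_human(1, 3))   # -2 -> the model surpasses the expert
```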
Alright, now that we understand these two references, let’s move on to analysing the results of our Machine Learning models, taking Human-level performance as the benchmark to compare against.
Comparing to Human level performance
Understanding how humans perform on a task can guide us towards how to reduce bias and variance. If you don’t know what Bias or Variance are, you can learn about them in the following post:
Although humans are awesome at certain tasks, as we have said, Machines can become even better than them and surpass human level performance. However, there is a certain threshold that neither humans nor Machine Learning models can surpass: Bayes Optimal error.
Bayes optimal error is the lowest error theoretically achievable on a certain task; no function, natural or artificial, can improve on it.
Imagine a data set composed of images of traffic lights, where some images are taken at such an angle, and are so blurry, that it is impossible, even for humans, to identify the correct light colour in all of them.
For this data set, Bayes optimal performance would be the maximum number of images that can actually be classified correctly, as some of them are impossible to classify for both humans and machines.
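For a small discrete toy problem, Bayes optimal error can be computed exactly. The best possible classifier predicts the most frequent true label for each input it can observe, so its unavoidable errors are the minority labels. The counts below are hypothetical and loosely modelled on the traffic-light example: 20 blurry images look identical, but half are actually red and half green.

```python
def bayes_error(counts):
    """counts maps an observable input to {true_label: number_of_examples}.

    The Bayes-optimal rule predicts the majority label for each input, so only
    the minority labels are misclassified, by any model, human or machine.
    """
    total = sum(sum(labels.values()) for labels in counts.values())
    unavoidable = sum(sum(labels.values()) - max(labels.values())
                      for labels in counts.values())
    return unavoidable / total

traffic_lights = {
    "clear_red":   {"red": 40},
    "clear_green": {"green": 40},
    "blurry":      {"red": 10, "green": 10},  # indistinguishable even for humans
}
print(bayes_error(traffic_lights))  # 0.1
```

Here at most 90 of the 100 images can be classified correctly, whoever does the classifying.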
For many tasks human level performance is close to Bayes optimal error, so we tend to use human level performance as a proxy or approximation of Bayes optimal error.
Let’s look at a more concrete example, with numbers, to fully grasp the relationship between Human level performance, Bayes Optimal error, and the results of our models.
Understanding Human level performance and Bayes Optimal error
Imagine a medical image diagnosis task where a typical doctor achieves a 1% error rate. If we take Human level performance as a proxy for Bayes error, we can say that Bayes error is lower than or equal to 1%.
It is important to note that Human level performance has to be defined depending on the context in which the Machine Learning system is going to be deployed.
Imagine now that we build a Machine learning model and get the following results on this diagnosis task:
Training set error: 7%
Test set error: 8%
Now, if our Human level performance (our proxy for Bayes error) is 1%, what do you think we should focus on improving: the difference between Bayes Optimal error (1%) and our training set error (7%), or the difference between training and test set error? We will call the first of these two differences Avoidable Bias (between human and training set error) and the second one Variance (between training and test set error).
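Writing both gaps out for this example, with E denoting an error rate:

```latex
\begin{aligned}
\text{Avoidable bias} &= E_{\text{train}} - E_{\text{human}} = 7\% - 1\% = 6\% \\
\text{Variance} &= E_{\text{test}} - E_{\text{train}} = 8\% - 7\% = 1\%
\end{aligned}
```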
Once we know where to optimise, how should we do it? Keep reading to find out!
Where and how to improve our Machine Learning models
Depending on the sizes of these two error differences (avoidable bias and variance), there are different strategies we ought to apply in order to reduce these errors and get the best possible results out of our models.
In the previous example, the difference between human level performance and training set error (6%) is a lot bigger than the difference between training and test set error (1%), so we will focus on reducing the avoidable bias. If the training set error were 2%, however, the avoidable bias would be 1% and the variance 6%, and we would focus on reducing variance.
If bias and variance were very similar, and there was room to improve both, then we would have to see which one is cheaper or easier to reduce.
Lastly, if human level performance, training error and test error were all similar and acceptable, we would leave our awesome model just as it is.
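The decision rules in the last few paragraphs can be sketched as a small helper. The two thresholds are hypothetical parameters chosen for illustration: what counts as an "acceptable" gap and what counts as "very similar". All errors are in percent.

```python
def diagnose(human_pct, train_pct, test_pct, acceptable=1, similar=1):
    """Suggest where to focus, using human error as a proxy for Bayes error."""
    avoidable_bias = train_pct - human_pct
    variance = test_pct - train_pct
    if avoidable_bias <= acceptable and variance <= acceptable:
        return "leave the model as it is"
    if abs(avoidable_bias - variance) <= similar:
        return "both: reduce whichever is cheaper or easier"
    if avoidable_bias > variance:
        return "reduce avoidable bias"
    return "reduce variance"

print(diagnose(1, 7, 8))  # reduce avoidable bias (6 vs 1)
print(diagnose(1, 2, 8))  # reduce variance (1 vs 6)
```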
How do we reduce each of these gaps? Let’s first take a look at how to reduce avoidable bias.
Improving model performance: how to reduce Avoidable Bias.
In our search for the best possible Machine Learning model, we must aim to fit the training set really well without over-fitting.
We will look at how to quantify and reduce over-fitting in just a bit, but the first thing we have to achieve is acceptable performance on our training set. That means making the gap between human level performance (our proxy for Bayes error) and training set error as small as possible.
For this there are various strategies we can adopt:
- If we trained a classic Machine Learning model, like a Decision Tree or a Linear or Logistic Regressor, we could try to train something more complex, like an SVM or a Boosting model.
- If after this we are still getting poor results, maybe our task needs a more complex or specific architecture, like a Recurrent or Convolutional Neural Network.
- When Artificial Neural Networks still don’t cut it, we can train these networks longer, make them deeper, or change the optimisation algorithm.
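As a rough illustration of "train longer", here is a one-feature logistic regression fitted by plain gradient descent in pure Python. It stands in for a neural network. The data, learning rate and epoch counts are hypothetical. The only point is that, on this convex problem, more epochs keep lowering the training loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, epochs, lr=0.1):
    """Full-batch gradient descent on the mean log loss, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def log_loss(xs, ys, w, b):
    """Mean negative log-likelihood on the training set."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= math.log(p if y == 1 else 1.0 - p)
    return total / len(xs)

# Hypothetical toy training set
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
for epochs in (0, 10, 100, 1000):
    w, b = train_logistic(xs, ys, epochs)
    print(epochs, round(log_loss(xs, ys, w, b), 4))
```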
After all this, if there is still a lot of room for improvement, we could try to get more data labelled by humans, to see whether there is some sort of issue with our initial data set.
Lastly, we can carry out manual error analysis: looking at specific examples where our algorithm is performing badly. Going back to an image classification example, this analysis might show that small dogs are getting classified as cats, which we could fix by getting more labelled images of small dogs. In our traffic light example, we could spot the issue with blurry images and add a pre-processing step that discards any images that don’t meet a certain quality threshold.
By using these tactics we can make the avoidable bias increasingly small. Now that we know how to do this, let’s take a look at how to reduce Variance.
Improving model performance: how to reduce Variance.
When our model has high variance, we say that it is over-fitting: it adapts too well to the training data, but generalises badly to data it has not seen before. To reduce this variance, there are various strategies we can adopt, which mostly differ from the ones we just saw for reducing bias. These strategies are:
- Get more labelled data: if our model isn’t generalising well in some cases, maybe it is because it has never seen those kinds of data instances during training, so getting more training data could be of great use for improving the model.
- Try data augmentation: if getting more data is not possible, then we could try data augmentation techniques. With images this is a pretty standard procedure, done by rotating, cropping, RGB shifting and other similar strategies.
- Use regularisation: there are techniques specifically conceived for reducing over-fitting, like L1 and L2 regularisation, or Dropout in the case of Artificial Neural Networks.
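Two of these strategies are easy to sketch without any framework. Here, images are assumed to be plain nested lists of pixel values, a hypothetical stand-in for real image arrays. Whether a flip or rotation preserves the label depends on the task; rotating a traffic light, for instance, changes the order of its lights. The L2 example assumes a one-feature linear model with no intercept. For that model, the ridge (L2-regularised least squares) solution has the closed form w = Σxy / (Σx² + λ).

```python
def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate_90_clockwise(img):
    """Rotate a rectangular grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def crop(img, top, left, height, width):
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]

def augment(img):
    """Original plus two variants, assuming these transforms keep the label."""
    return [img, flip_horizontal(img), rotate_90_clockwise(img)]

def ridge_slope(xs, ys, lam):
    """Minimiser of sum((y - w*x)^2) + lam * w^2 for a no-intercept 1-D model."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

img = [[1, 2], [3, 4]]
print(augment(img))  # [[[1, 2], [3, 4]], [[2, 1], [4, 3]], [[3, 1], [4, 2]]]
print(ridge_slope([1, 2, 3], [2, 4, 6], lam=0))   # 2.0 (plain least squares)
print(ridge_slope([1, 2, 3], [2, 4, 6], lam=14))  # 1.0 (the penalty shrinks the weight)
```

Larger λ pulls the weight further towards zero. This is the mechanism by which L2 regularisation trades a little training fit for less variance.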
After this, we would have also managed to reduce our variance! Awesome, now we have optimised our Machine Learning model to its full potential.
Conclusion and additional Resources
That is it! As always, I hope you enjoyed the post, and that I managed to help you understand how to debug and improve the performance of your Machine learning models.
If you liked this post then feel free to follow me on Twitter at @jaimezorno. Also, you can take a look at my other posts on Data Science and Machine Learning here. Have a good read!
If you want to learn more about Machine Learning and Artificial Intelligence follow me on Medium, and stay tuned for my next posts! Also, you can check out this repository for more resources on Machine Learning and AI!
Translated from: https://towardsdatascience.com/the-ultimate-guide-to-debugging-your-machine-learning-models-103dc0f9e421