Machine Learning: Training, Validation and Testing | Validation Before Testing

In my previous article, we discussed why we need to train and test our model, and we wrote code to split the given data into training and test sets.

Before moving on to validation, let us see why we need a validation step before testing on a given dataset. When we deal with a large amount of data, the data our model learned from may produce a biased result. If that happens, one of the following two cases can arise when we use the test set to check the accuracy of our model:

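  1. Underfitting: the model is too simple to capture the pattern in the data, so it performs poorly on the training data and on the test data.

  2. Overfitting: the model fits the training data, including its noise, too closely, so it performs well on the training data but poorly on the test data.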

Overfitting and underfitting

Image source: https://docs.aws.amazon.com/machine-learning/latest/dg/images/mlconcepts_image5.png
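The following minimal sketch makes the two cases concrete. It uses plain Python and only the standard library. The data is hypothetical and synthetic: y = 3x + 2 plus noise.

The three models are stand-ins chosen for illustration:

* always predicting the mean, which is too simple and underfits;
* a least-squares line, which matches how the data was generated;
* a 1-nearest-neighbour lookup that memorizes the training points, which is too flexible and overfits.

The symptom to look for is the gap between the error on the data the model was trained on and the error on held-out data.

```python
import random
from statistics import mean


def make_data(n, seed):
    # Hypothetical synthetic data: y = 3x + 2 plus Gaussian noise (sd = 2).
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + 2 + rng.gauss(0, 2) for x in xs]
    return xs, ys


def mse(model, xs, ys):
    # Mean squared error of a model (a function mapping x to a prediction).
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def fit_mean(xs, ys):
    # Too simple: always predicts the training mean and ignores x, so it underfits.
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    # Ordinary least squares for a straight line.
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    # Too flexible: 1-nearest-neighbour lookup reproduces every training point, noise included.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


x_train, y_train = make_data(200, seed=0)
x_val, y_val = make_data(100, seed=1)

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("1-NN", fit_memorizer)]:
    model = fit(x_train, y_train)
    results[name] = (mse(model, x_train, y_train), mse(model, x_val, y_val))
    print(f"{name:5s} train MSE = {results[name][0]:6.2f}   validation MSE = {results[name][1]:6.2f}")
```

What to expect from each model:

* The mean predictor has high error on both sets.
* The 1-nearest-neighbour model has zero training error but a noticeably higher validation error.
* The line stays low on both.

The exact numbers depend on the random seeds.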

So how do we deal with this problem? The answer is fairly simple: we use a third dataset, the validation set, to evaluate the model trained on the training set. Based on the validation results, we adjust hyperparameters such as the learning rate and batch size until we get a balanced result on the validation set. This, in turn, should improve the accuracy of our model when it estimates the target values of the test set.
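The tuning loop described above can be sketched as follows. This is an illustration, not the article's own method.

* It uses a k-nearest-neighbour regressor on hypothetical synthetic data (y = sin(x) plus noise) and treats k as the hyperparameter, because that is easy to write with the standard library.
* A learning rate or batch size would be tuned with the same pattern.
* Every candidate is scored on the validation set only.
* The test set is used once, at the end, with the chosen value.

```python
import math
import random
from statistics import mean


def make_data(n, seed):
    # Hypothetical synthetic data: y = sin(x) plus Gaussian noise (sd = 0.3).
    rng = random.Random(seed)
    xs = [rng.uniform(0, 6) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


def knn_predict(k, x_train, y_train, x):
    # Predict by averaging the targets of the k training points closest to x.
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)


def knn_mse(k, x_train, y_train, xs, ys):
    # Mean squared error of k-NN predictions on the points (xs, ys).
    return mean((knn_predict(k, x_train, y_train, x) - y) ** 2 for x, y in zip(xs, ys))


x_train, y_train = make_data(150, seed=0)
x_val, y_val = make_data(50, seed=1)
x_test, y_test = make_data(50, seed=2)

# 1. Score every candidate hyperparameter on the validation set only.
candidates = [1, 3, 5, 10, 25, 75]
val_scores = {k: knn_mse(k, x_train, y_train, x_val, y_val) for k in candidates}
best_k = min(val_scores, key=val_scores.get)

# 2. Use the test set once, with the chosen setting, to report final accuracy.
test_score = knn_mse(best_k, x_train, y_train, x_test, y_test)
print("validation MSE per k:", {k: round(s, 3) for k, s in val_scores.items()})
print("chosen k:", best_k, " test MSE:", round(test_score, 3))
```

Keeping the test set out of the selection loop matters. If k were picked by test error, the reported test accuracy would be optimistically biased.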

Splitting the data into training, validation and test sets

Image source: https://rpubs.com/charlydethibault/348566

Here, you can see that the validation set is simply a subset of the training dataset that we create. Remember that when we partition a dataset, the records are shuffled randomly first. This way, any ordering in the original file does not bias the results.
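The shuffle-then-partition idea can be written out by hand in a few lines. The sketch below shows one way to do it without scikit-learn; the helper name `shuffle_split` is hypothetical. Like the code later in this article, it takes the validation part as a fraction of what is left after the test part is removed.

```python
import random


def shuffle_split(records, test_frac, val_frac, seed=0):
    """Shuffle a copy of records, then cut off a test part and a validation part.

    val_frac is a fraction of what remains after the test part is removed.
    """
    shuffled = list(records)               # copy so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = round(len(shuffled) * test_frac)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = round(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


rows = list(range(100))  # stand-in for 100 dataset rows
train, val, test = shuffle_split(rows, test_frac=0.25, val_frac=0.25)
print(len(train), len(val), len(test))  # prints: 56 19 25
```

Seeding the generator plays the same role as `random_state` in `train_test_split`: the shuffle is random but repeatable.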

So, let us write some simple code to create a validation dataset in Python:

File: headbrain.csv

Here is the code:

# -*- coding: utf-8 -*-
"""
Created on Wed Aug  1 22:18:11 2018
@author: Raunak Goswami
"""
import pandas as pd
# sklearn.cross_validation has been removed from scikit-learn;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split

# reading the data
"""here the directory of my code and the headbrain.csv
file is the same; make sure both files are stored in the same folder
or directory"""
data = pd.read_csv('headbrain.csv')
# this will show the first five records of the whole data
print(data.head())
# this will create a variable x which has the feature values, i.e. head size (third column)
x = data.iloc[:, 2:3].values
# this will create a variable y which has the target values, i.e. brain weight (fourth column)
y = data.iloc[:, 3:4].values
# splitting the data into training and test
"""
the following statement will split x and y into 2 parts:
1. training variables named x_train and y_train
2. test variables named x_test and y_test
Since test_size is 1/4 of the total size, the test set gets 25% of the rows
and the training set the remaining 75% (a 3:1 ratio)
"""
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
# Here we split the training data again
# into training and validation sets.
# Observe that the size of the validation set is
# 1/4 of the training set, not of the whole dataset
x_training, x_validate, y_training, y_validate = train_test_split(x_train, y_train, test_size=0.25, random_state=0)
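It helps to check what fraction of the data each variable ends up with. The sketch below assumes scikit-learn's rule for a float `test_size` when `train_size` is not given: the test part is rounded up and the training part gets the rest. The helper `split_sizes` and the 200-row count are hypothetical, so substitute `len(data)` for your own file.

```python
from math import ceil


def split_sizes(n_samples, test_size):
    # Assumption: mirrors scikit-learn's rounding for a float test_size
    # (test part rounded up, training part gets the remainder).
    n_test = ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_rows = 200  # hypothetical row count; use len(data) for your own file
n_train, n_test = split_sizes(n_rows, 0.25)          # first split
n_training, n_validate = split_sizes(n_train, 0.25)  # second split, on the training part only
print(n_training, n_validate, n_test)  # prints: 112 38 50

# Share of the full dataset in each part, ignoring rounding:
# training 0.75 * 0.75 = 0.5625, validation 0.75 * 0.25 = 0.1875, test 0.25.
# For a 60/20/20 split instead, use test_size=0.2 first, then
# 0.2 / (1 - 0.2) = 0.25 of the remaining 80% for validation.
```

So with two successive `test_size=0.25` splits, roughly 56% of the rows go to training, 19% to validation and 25% to testing.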

After running this Python code in Spyder (the IDE that ships with the Anaconda distribution), check the Variable Explorer:

Variable explorer

In the image above, you can see that we have split the train variables into training variables and validation variables.

So, guys, that is it for today. I hope you liked this article. Have a great day ahead.

Original article: https://www.includehelp.com/ml-ai/validation-before-testing.aspx
