Background and challenges 📋
In modern deep learning, the dependence on manual annotation is one of the major limitations. To train a good model, we usually have to prepare a vast amount of labeled data. With a small number of classes and samples, we can take a model pre-trained on a labeled public dataset and fine-tune its last few layers on our own data. In real life, however, we quickly run into trouble when the dataset is considerably large (the products in a store, human faces, etc.), because a few trainable layers are not enough for the model to learn the task. Furthermore, the amount of unlabeled data out there (e.g. document text, images on the Internet) is practically unbounded. Labeling all of it for a task is almost impossible, yet not utilizing it is definitely a waste.
In this case, training a deep model from scratch on a new dataset is an option, but labeling the data takes a lot of time and effort, while a pre-trained deep model no longer seems helpful. This is why self-supervised learning was born. The idea behind it is simple, and it serves two main tasks:
Surrogate (pretext) task: the deep model learns generalizable representations from unlabeled data, without annotations, by self-generating a supervisory signal that exploits implicit information in the data.
Downstream task: the representations are then fine-tuned for supervised-learning tasks, e.g. classification and image retrieval, using a smaller amount of labeled data (how much labeled data you need depends on the performance your application requires).
Many different training approaches have been proposed to learn such representations:
Relative position [1]: the model must understand the spatial context of objects in order to predict the relative position between parts of an image.
Jigsaw puzzle [2]: the model must place 9 shuffled patches back in their original locations.
Colorization [3]: the model is trained to color a grayscale input image; precisely, the task is to map this image to a distribution over quantized color values.
Counting features [4]: the model learns a feature encoder from the feature-counting relationships of input images under Scaling and Tiling transformations.
SimCLR [5]: the model learns representations of visual inputs by maximizing agreement between differently augmented views of the same sample, via a contrastive loss in the latent space.
However, I would like to introduce one interesting approach that is able to recognize things the way a human does. A key factor in human learning is the acquisition of new knowledge by comparing related and different entities. Applying a similar mechanism in self-supervised machine learning, via the relational reasoning approach [6], therefore looks like a promising direction.
The relational reasoning paradigm is based on one key design principle: a relation network is used as a learnable function on the unlabeled dataset to quantify the relationships between views of the same object (intra-reasoning) and between different objects in different scenes (inter-reasoning). The feasibility of this mechanism for self-supervised learning was evaluated through performance on standard datasets (CIFAR-10, CIFAR-100, CIFAR-100-20, STL-10, tiny-ImageNet, SlimageNet), across learning schedules and backbones (both shallow and deep). The results reported in [6] show that the relational reasoning approach largely outperforms the best competitor in all conditions, by 14% accuracy on average, and the most recent state-of-the-art method by 3%.
Technique highlight 📄
Put simply, relational reasoning is a methodology that helps a learner understand the relations between different objects (ideas), rather than learning each object individually. This helps the learner distinguish and remember objects by their differences. A relational reasoning system [6] has two main components: a backbone and a relation head. The relation head is used in the pretext-task phase to support the underlying neural-network backbone in learning useful representations from the unlabeled dataset, and is then discarded. After pretext-task training, the backbone is used for downstream tasks such as classification or image retrieval.
Previous work focused on within-scene relations, meaning that all the compared elements belong to the same scene (e.g. balls in a basket); it trained on labeled datasets, and the relation head itself was the main goal [7].
The new approach focuses on relations between different views of the same object (intra-reasoning) and between different objects in different scenes (inter-reasoning); it applies relational reasoning to unlabeled data, and the relation head is a pretext task for learning useful representations in the underlying backbone.
Let's discuss the important points of each part of the relational reasoning system:
1. Mini-batch augmentation
As mentioned before, this system introduces intra-reasoning and inter-reasoning. Why do we need them? When labels are not given, it is impossible to create pairs of similar and dissimilar objects directly. To solve this problem, a bootstrapping technique is applied, which forms the intra-reasoning and inter-reasoning pairs, where:
Intra-reasoning consists of sampling random augmentations of the same object {A1; A2} (a positive pair), e.g. different views of the same basketball;
Inter-reasoning consists of coupling two random objects {A1; B1} (a negative pair), e.g. a basketball paired with a random ball.
Furthermore, random augmentation functions (e.g. geometric transformations, color distortion) are used to make between-scene reasoning more challenging. These augmentations force the learner (the backbone) to pay attention to correlations across a wider set of features (e.g. color, size, texture). For instance, in the pair {football, basketball}, color alone is a strong predictor of the class; with random changes to color and shape size, the learner can no longer discriminate the pair so easily, so it has to look at other features, which results in better representations.
2. Metric learning
The aim of metric learning is to use a distance metric to bring the representations of similar inputs (positives) closer together while pushing the representations of dissimilar inputs (negatives) apart. In relational reasoning, however, the mechanism is fundamentally different: the similarity function is not a fixed, handcrafted metric but is learned by the relation head, which scores pairs of representations directly.
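To make the contrast concrete, here is a toy PyTorch sketch (shapes and layer sizes are illustrative, not taken from the paper):

import torch
import torch.nn as nn

z1, z2 = torch.randn(8, 64), torch.randn(8, 64)  # two batches of representations

# Metric learning: similarity is a fixed, handcrafted function.
cosine_score = nn.functional.cosine_similarity(z1, z2)

# Relational reasoning: similarity is a learnable relation module
# applied to the aggregated (here: concatenated) pair.
relation_head = nn.Sequential(
    nn.Linear(128, 256), nn.BatchNorm1d(256), nn.LeakyReLU(),
    nn.Linear(256, 1))
relation_score = relation_head(torch.cat([z1, z2], dim=1))  # one raw logit per pair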
3. Loss function
The learning objective is a binary classification problem over the representation pairs. We can therefore use a binary cross-entropy loss, which corresponds to maximizing a Bernoulli log-likelihood, where the relation score y, induced through a sigmoid activation function, is a probabilistic estimate of whether a pair belongs together.
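In symbols, with t_i ∈ {0, 1} marking whether pair i is positive, r_i the raw relation score, and σ the sigmoid (this notation is mine, not copied verbatim from the paper):

\mathcal{L} = -\sum_{i=1}^{N} \big[\, t_i \log \sigma(r_i) + (1 - t_i) \log\big(1 - \sigma(r_i)\big) \,\big]

Maximizing the Bernoulli log-likelihood of the targets is exactly minimizing this binary cross-entropy.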
Finally, the paper [6] also reports results for relational reasoning on the standard datasets (CIFAR-10, CIFAR-100, CIFAR-100-20, STL-10, tiny-ImageNet, SlimageNet) with different backbones (shallow and deep) under the same learning schedule (number of epochs). For the full result tables and further information, you can check out the paper.
Experimental evaluation 📊
In this article, I want to reproduce the relational reasoning system on the public image dataset STL-10. This dataset comprises 10 classes (airplane, bird, automobile, cat, deer, dog, horse, monkey, ship, truck) of 96x96-pixel color images.
First of all, we need to import some important libraries.
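The original import gist is not embedded here; a minimal set of imports consistent with the snippets below would be:

import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision import models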
The STL-10 dataset consists of 13,000 labeled images: 5,000 for training and 8,000 for testing (500 and 800 per class, respectively). It also includes 100,000 unlabeled images drawn from a similar but broader distribution; for instance, it contains other types of animals (bears, rabbits, etc.) and vehicles (trains, buses, etc.) in addition to those in the labeled set.
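With torchvision, the three splits used in this article can be obtained as follows (the root path and variable names are mine):

# Download the unlabeled split for the pretext task and the
# labeled train/test splits for downstream evaluation.
unlabeled_set = torchvision.datasets.STL10('data', split='unlabeled', download=True)
train_set = torchvision.datasets.STL10('data', split='train', download=True)
test_set = torchvision.datasets.STL10('data', split='test', download=True)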
Then we create the relational reasoning class based on the author's suggestions.
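The author's class comes from a gist that is not embedded here. Below is a minimal sketch that follows the paper's description: the backbone encodes K views per image, pairs of features are aggregated by concatenation (with a rolled mini-batch providing negatives), and the relation head is trained with binary cross-entropy. The head width, optimizer, and learning rate are assumptions.

class RelationalReasoning(object):
    # Simplified trainer in the spirit of [6]; not the author's original code.
    def __init__(self, backbone, feature_size=64):
        self.backbone = backbone
        self.relation_head = torch.nn.Sequential(
            torch.nn.Linear(feature_size * 2, 256),
            torch.nn.BatchNorm1d(256),
            torch.nn.LeakyReLU(),
            torch.nn.Linear(256, 1))

    def aggregate(self, features, K):
        # Concatenate feature pairs: two views of the same image form a
        # positive (target 1); rolling the mini-batch pairs views of
        # different images, forming a negative (target 0).
        size = features.shape[0] // K
        pairs, targets, shift = [], [], 1
        for i in range(K):
            for j in range(i + 1, K):
                f_i = features[i * size:(i + 1) * size]
                f_j = features[j * size:(j + 1) * size]
                pairs.append(torch.cat([f_i, f_j], dim=1))                        # intra-reasoning
                targets.append(torch.ones(size))
                pairs.append(torch.cat([f_i, torch.roll(f_j, shift, 0)], dim=1))  # inter-reasoning
                targets.append(torch.zeros(size))
                shift = shift + 1 if shift < size - 1 else 1
        return torch.cat(pairs, dim=0), torch.cat(targets, dim=0)

    def train(self, tot_epochs, train_loader):
        device = next(self.backbone.parameters()).device
        self.relation_head.to(device)
        optimizer = torch.optim.Adam(
            list(self.backbone.parameters()) + list(self.relation_head.parameters()),
            lr=1e-3)
        bce = torch.nn.BCEWithLogitsLoss()  # applies the sigmoid internally
        for epoch in range(tot_epochs):
            for views in train_loader:  # a list of K augmented mini-batches
                x = torch.cat(views, dim=0).to(device)
                features = self.backbone(x)
                pairs, targets = self.aggregate(features, len(views))
                loss = bce(self.relation_head(pairs).squeeze(1), targets.to(device))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()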
To compare the performance of the relational reasoning methodology on shallow and deep models, we will create a shallow model (Conv4) and also use the structure of a deep model (ResNet-34).
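Conv4 itself is not defined in the snippets that survive here; a plausible definition with a 64-dimensional output, consistent with the Linear(64, 10) layer used later, is:

class Conv4(torch.nn.Module):
    # Hypothetical 4-block CNN backbone producing 64-d features;
    # the author's exact Conv4 may differ.
    def __init__(self):
        super().__init__()
        def block(c_in, c_out):
            return torch.nn.Sequential(
                torch.nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                torch.nn.BatchNorm2d(c_out),
                torch.nn.ReLU(inplace=True),
                torch.nn.MaxPool2d(2))
        self.features = torch.nn.Sequential(
            block(3, 8), block(8, 16), block(16, 32), block(32, 64))
        self.avgpool = torch.nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        return self.avgpool(self.features(x)).flatten(1)  # (batch, 64)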
# run one of the following, depending on the chosen backbone:
backbone = Conv4()  # shallow model
backbone = models.resnet34(pretrained=False)  # deep model
Some hyperparameters and augmentation strategies were set following the author's suggestions. We will train our backbone together with the relation head on the unlabeled STL-10 split.
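The settings gist is not embedded here; the values and transforms below are assumptions consistent with the text (K = 16 views, 10 epochs), not the author's exact configuration. A small wrapper dataset returns K independent augmentations per image:

K = 16             # number of augmentations per image
batch_size = 64    # assumed; not stated in the text
tot_epochs = 10
feature_size = 64  # 64 for Conv4; 1000 for ResNet-34's default output

# Random augmentations: geometric transformations + color distortion.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(96, scale=(0.3, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor()])

class MultiViewDataset(torch.utils.data.Dataset):
    # Wraps a dataset so each item becomes a list of K augmented views.
    def __init__(self, dataset, transform, K):
        self.dataset, self.transform, self.K = dataset, transform, K
    def __len__(self):
        return len(self.dataset)
    def __getitem__(self, idx):
        img, _ = self.dataset[idx]  # the label is ignored in the pretext task
        return [self.transform(img) for _ in range(self.K)]

train_loader = DataLoader(MultiViewDataset(unlabeled_set, train_transform, K),
                          batch_size=batch_size, shuffle=True)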
Up to now, we have created everything necessary to train our model. We now train the backbone and relation head for 10 epochs with 16 augmentations per image (K = 16); on a single Tesla P100-PCIE-16GB GPU, this took 4 hours for the shallow model (Conv4) and 6 hours for the deep model (ResNet-34). (You can freely change the number of epochs and the other hyperparameters to obtain better results.)
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
backbone.to(device)
model = RelationalReasoning(backbone, feature_size)
model.train(tot_epochs=tot_epochs, train_loader=train_loader)
torch.save(model.backbone.state_dict(), 'model.tar')
After training the backbone model, we discard the relation head and use only the backbone for the downstream tasks. We fine-tune the backbone on the labeled STL-10 training split (500 images per class) and test the final model on the test split (800 images per class). The training and test datasets are loaded into a DataLoader without augmentations.
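Concretely (reusing the splits loaded earlier; ToTensor only, no augmentation):

eval_transform = transforms.ToTensor()
train_set.transform = eval_transform  # labeled train split (fine-tuning)
test_set.transform = eval_transform   # test split (evaluation)
train_loader_lineval = DataLoader(train_set, batch_size=128, shuffle=True)
test_loader_lineval = DataLoader(test_set, batch_size=128, shuffle=False)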
We load the pretrained backbone model and use a simple linear model to map the output features to the number of classes in the dataset.
# linear model
linear_layer = torch.nn.Linear(64, 10)    # if the backbone is Conv4
linear_layer = torch.nn.Linear(1000, 10)  # if the backbone is ResNet-34

# define a raw backbone model
backbone_lineval = Conv4()                            # Conv4
backbone_lineval = models.resnet34(pretrained=False)  # ResNet-34

# load the pretrained weights
checkpoint = torch.load('model.tar')
backbone_lineval.load_state_dict(checkpoint)
This time only the linear model is trained; the backbone model is frozen. First, let's look at the result of the fine-tuned Conv4.
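The fine-tuning gist is not embedded here; a minimal linear-evaluation loop under the assumptions above (frozen backbone, Adam optimizer, cross-entropy; epoch count and learning rate are guesses) could look like this:

# Freeze the backbone; train only the linear layer.
backbone_lineval.to(device).eval()
for param in backbone_lineval.parameters():
    param.requires_grad = False

linear_layer.to(device)
optimizer = torch.optim.Adam(linear_layer.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(10):
    for images, labels in train_loader_lineval:
        images, labels = images.to(device), labels.to(device)
        with torch.no_grad():
            features = backbone_lineval(images)  # frozen features
        loss = criterion(linear_layer(features), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()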
And then we check it on the test set.
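A matching evaluation sketch:

# Accuracy of the frozen backbone + linear layer on the test split.
correct, total = 0, 0
with torch.no_grad():
    for images, labels in test_loader_lineval:
        images, labels = images.to(device), labels.to(device)
        logits = linear_layer(backbone_lineval(images))
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {100 * correct / total:.2f}%")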
Conv4 obtained 49.98% accuracy on the test set, which means the backbone model was able to learn useful features from the unlabeled dataset; we only needed to fine-tune for a few epochs to achieve a good result. Now let's check the performance of the deep model.
Then we evaluate it on the test dataset.
That is much better: we obtain 55.38% accuracy on the test set. The main goal of this article is to reproduce and evaluate the relational reasoning methodology for teaching a model to distinguish objects without labels, so these results are very promising. If you are not satisfied, you can experiment freely by changing hyperparameters such as the number of augmentations, the number of epochs, or the model structure.
Final Thoughts 📕
Self-supervised relational reasoning is effective in both quantitative and qualitative terms, and with backbones of different sizes, from shallow to deep. Representations learned through comparison can be easily transferred from one domain to another; they are fine-grained and compact, which may be due to the correlation between accuracy and the number of augmentations. In relational reasoning, the number of augmentations plays a primary role in the quality of the resulting object clusters, according to the author's experiments [6]. Self-supervised learning has strong potential to become the future of machine learning in many respects.
You can contact me if you want to discuss further. Here is my LinkedIn.
Enjoy!!! 👦🏻
Translated from: https://towardsdatascience.com/train-without-labeling-data-using-self-supervised-learning-by-relational-reasoning-b0298ad818f9