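Introduction
PASSL is a vision library built on PaddlePaddle for state-of-the-art self-supervised learning research in computer vision. It aims to shorten the research cycle for self-supervised learning, from designing a new self-supervised task to evaluating the learned representations.
Key features of PASSL: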
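- Implementations of state-of-the-art self-supervised algorithms
  PASSL implements a range of state-of-the-art self-supervised learning algorithms, including but not limited to SimCLR, MoCo (v1), MoCo (v2), MoCo-BYOL, CLIP, BYOL, and BEiT. Supervised classification training is also supported.
- Modular design
  New tasks are easy to build, and existing components from other tasks (Trainer, models and heads, data transforms, etc.) are easy to reuse.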
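One common way to get this kind of modularity is a registry that maps names to component classes, so a config can pick a head or backbone by name. The following is a minimal sketch in plain Python and is not PASSL's actual API: `Registry`, `HEADS`, and `LinearHead` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict


class Registry:
    """Maps string names to component classes so configs can refer to them by name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._components: Dict[str, Callable] = {}

    def register(self, cls: Callable) -> Callable:
        # Used as a class decorator; the class name becomes the lookup key.
        if cls.__name__ in self._components:
            raise KeyError(f"{cls.__name__} is already registered in {self.name}")
        self._components[cls.__name__] = cls
        return cls

    def build(self, cfg: dict):
        # cfg looks like {"name": "LinearHead", "in_dim": 2048, "num_classes": 1000}.
        kwargs = dict(cfg)  # copy so the caller's config is not mutated
        name = kwargs.pop("name")
        return self._components[name](**kwargs)


# Hypothetical registry for heads; a real project would have one per component type.
HEADS = Registry("heads")


@HEADS.register
class LinearHead:
    """Placeholder head that only stores its configuration."""

    def __init__(self, in_dim: int, num_classes: int) -> None:
        self.in_dim = in_dim
        self.num_classes = num_classes


# Build a component purely from a config dictionary.
head = HEADS.build({"name": "LinearHead", "in_dim": 2048, "num_classes": 1000})
print(type(head).__name__, head.in_dim, head.num_classes)
```

With this pattern, a new task can reuse an existing head by naming it in its config instead of importing and wiring it by hand.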
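🛠️ The ultimate goal of PASSL is to use self-supervised learning to provide better-suited pretrained weights for downstream tasks while substantially reducing data annotation costs.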
Model Zoo
- Self-Supervised Learning Models
PASSL implements a range of self-supervised learning algorithms. For detailed usage, see Document.
Method | Epochs | Official results | PASSL results | Backbone | Model | Document |
---|---|---|---|---|---|---|
MoCo | 200 | 60.6 | 60.64 | ResNet-50 | download | Train MoCo |
SimCLR | 100 | 64.5 | 65.3 | ResNet-50 | download | Train SimCLR |
MoCo v2 | 200 | 67.7 | 67.72 | ResNet-50 | download | Train MoCo |
MoCo-BYOL | 300 | 71.56 | 72.10 | ResNet-50 | download | Train MoCo-BYOL |
BYOL | 300 | 72.50 | 71.62 | ResNet-50 | download | Train BYOL |
PixPro | 100 | 55.1(fp16) | 57.2(fp32) | ResNet-50 | download | Train PixPro |
SimSiam | 100 | 68.3 | 68.4 | ResNet-50 | download | Train SimSiam |
DenseCL | 200 | 63.62 | 63.37 | ResNet-50 | download | Train PixPro |
SwAV | 100 | 72.1 | 72.4 | ResNet-50 | download | Train SwAV |
Benchmark: linear image classification on ImageNet-1K.
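To make the benchmark concrete: in linear evaluation, the pretrained backbone is frozen and only a linear classifier is trained on its features. The sketch below assumes the reported figures are top-1 accuracy in percent, as is conventional for this benchmark. It uses toy 2-D features and hand-picked weights rather than real ImageNet-1K data, and the function names are illustrative, not part of PASSL.

```python
from typing import List, Sequence


def linear_logits(
    feature: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    # One logit per class: dot(w_c, feature) + b_c.
    return [sum(w * x for w, x in zip(row, feature)) + b for row, b in zip(weights, bias)]


def top1_accuracy(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> float:
    # Percentage of samples whose highest-scoring class matches the label.
    correct = 0
    for feature, label in zip(features, labels):
        logits = linear_logits(feature, weights, bias)
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


# Toy stand-ins for frozen backbone features of four images (two classes).
features = [[1.0, 0.0], [0.9, 0.2], [0.1, 1.0], [0.8, 0.9]]
labels = [0, 0, 1, 0]

# Toy linear classifier; in practice these weights are learned on the frozen features.
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]

accuracy = top1_accuracy(features, labels, weights, bias)
print(f"top-1 accuracy: {accuracy:.1f}%")
```

Because the backbone stays fixed, the resulting accuracy reflects the quality of the learned representation rather than the capacity of the classifier.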
- Classification Models
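PASSL implements influential image classification algorithms such as Vision Transformers and provides the corresponding pretrained weights, with the aim of supporting the development of and research on self-supervised, multimodal, and large-model algorithms. For more usage details, see Classification_Models_Guide.md.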
Model | Detail | Tutorial |
---|---|---|
ViT | / | PaddleEdu |
Swin Transformer | / | PaddleEdu |
CaiT | config | PaddleFleet |
T2T-ViT | config | PaddleFleet |
CvT | config | PaddleFleet |
BEiT | config | unofficial |
MLP-Mixer | config | PaddleFleet |
ConvNeXt | config | PaddleFleet |
🔥 PASSL provides detailed algorithm walkthroughs; see Tutorial for details.
Installation
See INSTALL.md for installation instructions.
Getting Started
See GETTING_STARTED.md for the basic usage of PASSL.
Awesome SSL
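Self-supervised learning (SSL) is a rapidly developing field. The influential papers listed here are provided for research use. PASSL will strive to implement self-supervised algorithms with application potential.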
- Masked Feature Prediction for Self-Supervised Visual Pre-Training by Chen Wei, Haoqi Fan, Saining Xie, Chao-Yuan Wu, Alan Yuille, Christoph Feichtenhofer.
- Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
- Corrupted Image Modeling for Self-Supervised Visual Pre-Training by Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei.
- Are Large-scale Datasets Necessary for Self-Supervised Pre-training? by Alaaeldin El-Nouby, Gautier Izacard, Hugo Touvron, Ivan Laptev, Hervé Jegou, Edouard Grave.
- PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers by Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu.
- SimMIM: A Simple Framework for Masked Image Modeling by Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu.