1 Title
DiffusionDet: Diffusion Model for Object Detection (Shoufa Chen, Peize Sun, Yibing Song, Ping Luo) [ICCV 2023]
2 Conclusion
This study proposes DiffusionDet, a new framework that formulates object detection as a denoising diffusion process from noisy boxes to object boxes. During training, object boxes diffuse from ground-truth boxes to a random distribution, and the model learns to reverse this noising process. At inference, the model progressively refines a set of randomly generated boxes into the output results.
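The training-time noising step can be illustrated with the standard DDPM forward process, z_t ~ N(sqrt(a_t) z_0, (1 - a_t) I), applied to box coordinates. What follows is only a minimal sketch: the cosine schedule, the normalized (cx, cy, w, h) box format, and the helper names `cosine_alpha_bar` and `diffuse_boxes` are assumptions made for illustration, and other details of the paper's training procedure are not reproduced.

```python
import math
import random


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal level a_t under a cosine schedule (an assumption here)."""
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def diffuse_boxes(gt_boxes, t, num_steps, rng):
    """Sample z_t ~ N(sqrt(a_t) * z_0, (1 - a_t) * I) for every box coordinate."""
    a = cosine_alpha_bar(t, num_steps)
    return [
        [math.sqrt(a) * c + math.sqrt(1 - a) * rng.gauss(0.0, 1.0) for c in box]
        for box in gt_boxes
    ]


rng = random.Random(0)
gt = [[0.5, 0.5, 0.2, 0.3]]  # one hypothetical box as normalized (cx, cy, w, h)
slightly_noisy = diffuse_boxes(gt, t=1, num_steps=1000, rng=rng)
mostly_noise = diffuse_boxes(gt, t=999, num_steps=1000, rng=rng)
```

At small t the sample stays close to the ground-truth box; near the final step almost no signal remains, which is why inference can start from purely random boxes.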
3 Good Sentences
1. This noise-to-box approach requires neither heuristic object priors nor learnable queries, further simplifying the object candidates and pushing the development of the detection pipeline forward. (The advantage of the diffusion model when applied to object detection)
2. However, despite significant interest in this idea, there are no previous solutions that successfully adapt generative diffusion models for object detection, the progress of which remarkably lags behind that of segmentation. We argue that this may be because segmentation tasks are processed in an image-to-image style, which is more conceptually similar to the image generation tasks, while object detection is a set prediction problem which requires assigning object candidates to ground truth objects. (Why this study chose object detection as its research target)
3. As shown in Figure 3a, the performance of DiffusionDet increases steadily with the number of boxes used for evaluation. (A characteristic of DiffusionDet)
(a) Standard DDPM diffusion process. (b) Denoising process. (c) Schematic of the denoising process for object detection.
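The DiffusionDet framework is shown in the figure. Because a diffusion model generates data samples iteratively, the model has to be run multiple times at inference. Applying the full model to the raw image at every iteration step, however, would be computationally intractable. The paper therefore splits the model into two parts, an image encoder and a detection decoder. The encoder runs only once to extract a deep feature representation from the raw input image x. The decoder is conditioned on these deep features rather than on the raw image, and progressively refines the box predictions starting from the noisy boxes z_t. (The idea is similar to latent diffusion, except that latent diffusion uses a VAE to extract features.)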
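The encoder/decoder split can be sketched as a control-flow pattern: features are computed once and cached, and only the decoder sits inside the refinement loop. The class `ToyDetector`, its methods, and the arithmetic inside them are hypothetical stand-ins used to count calls; they do not reflect the real network.

```python
import random


class ToyDetector:
    """Hypothetical stand-in for the encoder/decoder split; not the real model."""

    def __init__(self):
        self.encoder_calls = 0
        self.decoder_calls = 0

    def encode(self, image):
        # Heavy step in a real detector: run once per image.
        self.encoder_calls += 1
        return sum(image) / len(image)  # dummy scalar "feature"

    def decode(self, features, boxes, t):
        # Lighter step: refine boxes conditioned on cached features, not the image.
        self.decoder_calls += 1
        return [[0.5 * (c + features) for c in box] for box in boxes]


def detect(model, image, num_boxes, num_steps, rng):
    features = model.encode(image)  # computed once, reused at every step
    boxes = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(num_boxes)]
    for t in reversed(range(num_steps)):  # iterative refinement from noise
        boxes = model.decode(features, boxes, t)
    return boxes


model = ToyDetector()
out = detect(model, image=[0.1, 0.4, 0.7], num_boxes=5, num_steps=4,
             rng=random.Random(0))
print(model.encoder_calls, model.decoder_calls)  # prints: 1 4
```

The call counts make the point of the design: the per-step cost depends only on the decoder, so adding refinement steps does not repeat the image encoding.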
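The main property of the diffusion approach is that a single training run covers all inference settings. Once the model is trained, the number of boxes and the number of sampling steps can both be changed at inference, as shown in the figure. DiffusionDet can reach higher accuracy by using more boxes or more refinement steps, at the cost of higher latency. A single DiffusionDet can therefore be deployed in multiple scenarios and achieve a desired speed-accuracy trade-off without retraining the network.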
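The train-once property amounts to treating the box count and the step count as inference-time arguments rather than as parts of the trained parameters. The toy below is a sketch under that reading: `WEIGHTS`, `run_inference`, and the update rule are hypothetical, and the returned count of box updates is only a rough proxy for latency, not a measurement of accuracy or speed.

```python
import random

# Stands in for trained parameters; nothing below modifies it.
WEIGHTS = {"pull": 0.5}


def run_inference(num_boxes, num_steps, seed=0):
    """Refine random boxes with fixed weights; return boxes and a cost proxy."""
    rng = random.Random(seed)
    boxes = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(num_boxes)]
    box_updates = 0
    for _ in range(num_steps):
        # Toy update: move each coordinate toward 0.5 using the fixed weight.
        boxes = [[c + WEIGHTS["pull"] * (0.5 - c) for c in box] for box in boxes]
        box_updates += num_boxes
    return boxes, box_updates


# The same frozen weights serve every setting; only the arguments change.
for num_boxes, num_steps in [(100, 1), (300, 4), (500, 8)]:
    boxes, cost = run_inference(num_boxes, num_steps)
    print(num_boxes, num_steps, cost)
```

The work grows with the product of boxes and steps, which is the latency side of the trade-off described above; choosing a setting is a deployment decision, not a training one.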