[Paper Close Reading] Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

Paper link: [2304.08876] Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection (arxiv.org)

Paper code: https://github.com/ChaselTsui/mmrotate-dcfl

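The English was typed entirely by hand! It is a summary and paraphrase of the original paper. Spelling and grammar mistakes may be unavoidable; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.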

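1. TL;DR

1.1. Thoughts

（1）Why am I, a brain science student, reading this? May there be no exploitative grunt work in this world

（2）I noticed it as soon as I started writing the subheadings: the paper is divided very finely, favorability++

1.2. Paper summary figure

2. Section-by-section close reading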

2.1. Abstract

①The extreme geometric shapes (tiny) and limited features (few pixels) of oriented tiny objects cause severe mismatch (inaccurate positional prior?) and imbalance (inaccurate positive sample features?) issues

②They propose a dynamic prior together with a coarse-to-fine assigner, called DCFL

posterior adj. situated at or toward the back; n. buttocks

2.2. Introduction

①Oriented bounding boxes greatly reduce redundant background area, especially in aerial images

②Comparison figure:

where M* denotes the matching function;

green, blue and red boxes are true positive, false positive and false negative predictions, respectively;

the left set of panels is static and the right set is dynamic

③Figure of mismatch and imbalance issues:

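each point in the left figure denotes a prior location (so the prior places that many points... and why are they laid out so neatly, is this some kind of one-shot thing?)

Does the pie chart describe the case where every box sits at one given angle, so that when no box is rotated the average number of positive samples is 5.2? Or does it mean the boxes rotate freely and it reports how many positive samples the boxes at each particular angle receive? This pie chart makes no cross comparison; it only compares within itself.

The bar chart shows the average number of positive samples for different anchor box sizes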

④They introduce a dynamic Prior Capturing Block (PCB) as their prior method. Building on it, they use the Cross-FPN-layer Coarse Positive Sample (CPS) to assign labels. After that, they re-rank these candidates by the predictions (posterior) and represent the gt with a finer Dynamic Gaussian Mixture Model (DGMM)

eradicate vt. to root out; to wipe out; to put an end to; n. (eradicator) one who eradicates; ink remover

2.3. Related Work

2.3.1. Oriented Object Detection

（1）Prior for Oriented Objects

（2）Label Assignment

2.3.2. Tiny Object Detection

（1）Multi-scale Learning

（2）Label Assignment

（3）Context Information

（4）Feature Enhancement

2.4. Method

（1）Overview

①Take a set of dense priors $P\in\mathbb{R}^{W\times H\times C}$, where $W$ denotes the width, $H$ the height and $C$ the number of shape information entries (what is this? are these those points?). A Deep Neural Network (DNN) maps it to $D$:

$D=\mathrm{DNN}_{h}(P)$

where $\mathrm{DNN}_{h}$ represents the detection head (detection head... as an outsider I don't quite get it, but it feels like it is just a function?);

one part $D_{cls}\in\mathbb{R}^{W\times H\times A}$ of $D$ holds the classification scores, where $A$ is the number of classes (do the locations that are more strongly considered positive get larger values in that class's $W\times H$ map?);

the other part $D_{reg}\in\mathbb{R}^{W\times H\times B}$ of $D$ holds the box regression outputs, where $B$ is the number of box parameters (what is that? box parameter? what counts as a box parameter?)
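To make the shapes in ① concrete, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration, not the paper's code: `make_prior`, `toy_head` and `shape` are hypothetical helpers, the prior is taken to be one $(x, y)$ point per location (so $C=2$), the head is a fixed random linear map instead of a trained network, and $B=5$ assumes the common oriented-box parameterization $(c_x, c_y, w, h, \theta)$.

```python
import random

W, H = 4, 3   # feature-map width and height (toy values)
A = 3         # number of classes (toy value)
B = 5         # box parameters, ASSUMED to be (cx, cy, w, h, angle)


def make_prior(width, height):
    """Dense prior P: one (x, y) point per location, shape W x H x C with C = 2 (assumed)."""
    return [[[float(x), float(y)] for y in range(height)] for x in range(width)]


def toy_head(P, num_classes, num_box_params, seed=0):
    """Hypothetical stand-in for DNN_h: the same random linear map at every location."""
    rng = random.Random(seed)
    c = len(P[0][0])
    w_cls = [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(num_classes)]
    w_reg = [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(num_box_params)]

    def linear(weights, v):
        return [sum(wi * vi for wi, vi in zip(row, v)) for row in weights]

    D_cls = [[linear(w_cls, p) for p in column] for column in P]
    D_reg = [[linear(w_reg, p) for p in column] for column in P]
    return D_cls, D_reg


def shape(t):
    """Shape of a 3-level nested list."""
    return (len(t), len(t[0]), len(t[0][0]))


P = make_prior(W, H)
D_cls, D_reg = toy_head(P, A, B)
print(shape(P), shape(D_cls), shape(D_reg))  # (4, 3, 2) (4, 3, 3) (4, 3, 5)
```

The only point here is that the head keeps the $W\times H$ grid and changes the last dimension: $C$ values in, $A$ class scores and $B$ box parameters out for every prior location.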

②In static methods, the positive labels assigned to $P$ are $G=\mathcal{M}_{s}(P,GT)$

③In dynamic methods, the positive label set $G$ also integrates posterior information: $G={\mathcal M}_{d}(P,D,GT)$
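The difference between ② and ③ is only in what the matching function is allowed to look at. A rough sketch, under loud assumptions: the gt is simplified to an axis-aligned box, "prior point lies inside the box" stands in for the static rule, and "keep the top-k by predicted score" stands in for the dynamic rule. `static_assign` and `dynamic_assign` are hypothetical names, and neither is the assigner used in the paper.

```python
def static_assign(priors, gt_box):
    """M_s(P, GT): positives are decided from the prior locations and the gt only."""
    x0, y0, x1, y1 = gt_box
    return [i for i, (x, y) in enumerate(priors)
            if x0 <= x <= x1 and y0 <= y <= y1]


def dynamic_assign(priors, scores, gt_box, k=2):
    """M_d(P, D, GT): the same candidates, but re-ranked by the predicted scores D."""
    candidates = static_assign(priors, gt_box)
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# 4 x 4 grid of prior points; index i = 4 * x + y
priors = [(x, y) for x in range(4) for y in range(4)]
gt_box = (1, 1, 2, 2)  # toy axis-aligned gt (x0, y0, x1, y1)

# toy predicted scores, one per prior
scores = [0.1] * len(priors)
scores[5], scores[6], scores[9] = 0.9, 0.2, 0.6

static_pos = static_assign(priors, gt_box)
dynamic_pos = dynamic_assign(priors, scores, gt_box, k=2)
print(static_pos)   # [5, 6, 9, 10]
print(dynamic_pos)  # [5, 9]
```

The static result never changes during training because it ignores `scores`, while the dynamic result shifts as the network's predictions change.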

④The loss function:

$\mathcal{L}=\sum_{i=1}^{N_{pos}}\mathcal{L}_{pos}(D_{i},G_{i})+\sum_{j=1}^{N_{neg}}\mathcal{L}_{neg}(D_{j},y_{j})$

where $N_{pos}$ and $N_{neg}$ are the numbers of positive and negative samples, and $y_j$ denotes the negative labels
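The structure of the loss in ④ (one sum over positives, one sum over negatives) can be sketched as below. The concrete terms are assumptions chosen for readability: binary cross-entropy plus an L1 box distance stands in for $\mathcal{L}_{pos}$, plain binary cross-entropy for $\mathcal{L}_{neg}$, and `bce`, `l1` and `total_loss` are hypothetical helper names. The paper's actual loss terms may well differ.

```python
import math


def bce(p, target):
    """Binary cross-entropy for a single probability p."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def l1(pred_box, gt_box):
    """Sum of absolute differences between two box parameter tuples."""
    return sum(abs(a - b) for a, b in zip(pred_box, gt_box))


def total_loss(pos, neg):
    """
    pos: list of (score, pred_box, gt_box), i.e. pairs (D_i, G_i)
    neg: list of (score, y) with y = 0,      i.e. pairs (D_j, y_j)
    """
    loss_pos = sum(bce(s, 1.0) + l1(pb, gb) for s, pb, gb in pos)
    loss_neg = sum(bce(s, y) for s, y in neg)
    return loss_pos + loss_neg


# one positive with a perfect box, one negative; both scored 0.5
pos = [(0.5, (0.0, 0.0, 1.0, 1.0, 0.0), (0.0, 0.0, 1.0, 1.0, 0.0))]
neg = [(0.5, 0.0)]
loss = total_loss(pos, neg)
print(round(loss, 4))  # 1.3863, i.e. 2 * ln(2)
```

Since which samples land in `pos` and `neg` is decided by the assigner, changing $\mathcal{M}$ changes the loss even when the predictions stay the same.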

⑤Modelling $D$, ${\mathcal M}_{d}$ and $GT$ dynamically:

$\tilde{D}=\mathrm{DNN}_{h}(\underbrace{\mathrm{DNN}_{p}(P)}_{\text{Dynamic Prior }\tilde{P}})$

$\tilde{G}=\mathcal{M}_{d}(\mathcal{M}_{s}(\tilde{P},GT),\tilde{GT})$

$\mathcal{L}=\sum_{i=1}^{\tilde{N}_{pos}}\mathcal{L}_{pos}(\tilde{D}_{i},\tilde{G}_{i})+\sum_{j=1}^{\tilde{N}_{neg}}\mathcal{L}_{neg}(\tilde{D}_{j},y_{j})$
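The composition $\mathcal{M}_{d}(\mathcal{M}_{s}(\tilde{P},GT),\tilde{GT})$ reads as "coarse selection on the dynamic prior first, finer filtering against a richer gt representation second". Here is a toy sketch of that order of operations only. All of it is assumed for illustration: `dnn_p` just adds given offsets to the prior points, the coarse step keeps the k points nearest to the gt center, and the finer gt is a single isotropic Gaussian with a score threshold. None of these choices is claimed to match the paper's PCB, CPS or DGMM.

```python
import math


def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def dnn_p(priors, offsets):
    """Hypothetical stand-in for DNN_p: shift every prior point by an offset."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(priors, offsets)]


def m_s(priors, gt_center, k):
    """Coarse step (assumed rule): indices of the k priors nearest to the gt center."""
    order = sorted(range(len(priors)), key=lambda i: sq_dist(priors[i], gt_center))
    return order[:k]


def gaussian_gt(gt_center, sigma):
    """Hypothetical finer gt representation: an isotropic Gaussian score function."""
    def score(point):
        return math.exp(-sq_dist(point, gt_center) / (2.0 * sigma ** 2))
    return score


def m_d(candidates, priors, gt_score, threshold):
    """Fine step (assumed rule): keep coarse candidates whose gt score passes a threshold."""
    return [i for i in candidates if gt_score(priors[i]) >= threshold]


P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
offsets = [(0.5, 0.0), (0.5, 0.0), (0.0, 0.0), (0.5, 0.0)]  # toy values
gt_center = (2.0, 0.0)

P_dyn = dnn_p(P, offsets)                      # dynamic prior
coarse = m_s(P_dyn, gt_center, k=3)            # M_s on the dynamic prior and GT
fine = m_d(coarse, P_dyn, gaussian_gt(gt_center, sigma=1.0), threshold=0.5)
print(coarse)  # [2, 1, 0]
print(fine)    # [2, 1]
```

The sketch covers only the second equation; the first one, $\tilde{D}=\mathrm{DNN}_{h}(\tilde{P})$, would feed `P_dyn` into a detection head in the same way the static prior is fed into it in the overview.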