动手学无人驾驶（3）：基于激光雷达3D多目标追踪

上一篇博客介绍了无人驾驶中的车辆检测算法（YOLO模型），该检测是基于图像进行的2D目标检测。在无人驾驶环境感知传感器中还有另一种重要的传感器：激光雷达。今天就介绍一篇无人驾驶中基于激光雷达目标检测的3D多目标追踪论文，使用的数据集也是KITTI数据集，使用的目标检测算法是PointRCNN。该3D多目标追踪系统不需要使用GPU且取得了最快的处理速度（214.7FPS）。

论文地址：A Baseline for 3D Multi-Object Tracking

Github地址：https://github.com/xinshuoweng/AB3DMOT

大家有时间可以看看这个视频，这是论文作者此前分享的一个3D多目标追踪报告，视频地址：https://www.bilibili.com/video/BV1DD4y1Q7JG/

3D Multi-Object Tracking: A Baseline and New Evaluation Metrics

文章目录

- - 1.概述
  - 2.方法
  - - A. 3D 目标检测
    - B. 3D 卡尔曼滤波---状态预测
    - C. 数据关联
    - D. 3D卡尔曼滤波-状态更新
    - E. Birth and Death内存
  - 3. 测试结果

1.概述

无论是在自动驾驶或辅助机器人等实时应用中，3D多目标追踪（MOT）都是非常重要的技术环节。然而，在近期的一些3D多目标追踪研究当中更多的都是关注于追踪系统的准确率而忽略了其计算成本和复杂性。

在本文中作者提出了一种简单而又准确的实时3D MOT基线系统。本文使用了现成的3D目标检测器从LiDAR点云数据中获取对应目标3D bounding box。然后，将3D卡尔曼滤波器和匈牙利算法结合起来进行目标状态估计和数据关联。为了评估提出的基线系统，论文在原有的KITTI官方2D MOT评价指标上进行了扩展，提出了三个新的3D MOT评价指标，本文主要贡献有：

设计了一种简单而又准确的实时3D MOT系统；
对KITTI官方2D MOT评价指标进行扩展，提出了新的3D MOT评价指标；
在KITTI评价工具上获得了最优的3D MOT性能，在2D MOT评价工具上获得了最快的处理速度（如下图所示）。

在这里插入图片描述

2.方法

3D MOT的目的是对一段连续数据中的3D 检测目标进行关联。在本系统中，需要使用当前帧中的检测结果以及上一帧中的关联轨迹。系统结构如下图所示，由5个部分组成：

（A）3D检测模块用于对LiDAR点云数据进行目标检测；

（B）3D卡尔曼滤波模块用于预测关联轨迹当前帧的状态；

（C）数据关联模块用于对检测模块输出结果和卡尔曼滤波模块输出的轨迹进行匹配关联；

（D）3D卡尔曼滤波根据检测结果更新匹配轨迹的状态；

（E）birth和death内存模块用于生成新的轨迹和删除未匹配的目标轨迹。

在这里插入图片描述
除了3D目标检测块以外，3D MOT系统不需要进行训练，可直接使用。

A. 3D 目标检测

在这里，直接使用在KITTI 3D数据及上训练好的3D 检测模型（即：PointRCNN）作为目标检测模块。

下面是一些术语规定，假设当前为 $t$ 时刻，3D检测模块输出是： $D_{t}=\{{D_{t}^{1}, D_{t}^{2}, ...,D_{t}^{n_{t}}}\}$ ，( $n_t$ 为 $t$ 时刻中检测目标数量)。每一个检测目标向量表示形式为： $z,\theta, l, w, h, s)$ ，即目标中心坐标 $(x, y, z)$ ，目标尺寸 $(l, w, h)$ ，偏航角 $θ\theta$ 以及置信度 $s$

B. 3D 卡尔曼滤波—状态预测

为了预测目标从上一帧到当前帧的轨迹状态，这里选择使用的运动模型为恒速运动模型，忽略了相邻帧内的目标位置移动。

具体地，我们使用11维向量来表示目标轨迹的状态 $\theta, l, w, h, s, v_{x}, v_{y,} v_{z})$ ，附加的3个变量表示目标在3个维度上的运动速度。

在每一帧，所有t-1帧的关联轨迹为 $Tt−1={Tt−11,Tt−12,⋯,Tt−1mt−1}T_{t-1}=\left\{T_{t-1}^{1}, T_{t-1}^{2}, \cdots, T_{t-1}^{m_{t-1}}\right\}$ （ $m_{t-1}$ 是t-1时刻的轨迹数量）将会被传播到下一帧 $t$ ，用 $T_{est}$ 表示。由恒速运动模型可以推出：
$xest=x+vxyest=y+vyzest=z+vzx_{est}=x+v_x \quad y_{est}=y+v_y \quad z_{est}=z+v_z$
因此， $T_{t-1}$ 中的每个轨迹 $Tt−1iT^i_{t-1}$ 传播到t帧中的预测状态将是 $Testi=(xest,yest,zest,θ,l,w,h,s,vx,vy,vz)T_{\mathrm{est}}^{i}=\left(x_{\mathrm{est}}, y_{\mathrm{est}}, z_{\mathrm{est}}, \theta, l, w, h, s, v_{x}, v_{y}, v_{z}\right)$ ，轨迹数据将会用于之后的关联模块。

3D 卡尔曼滤波的代码如下：

class KalmanBoxTracker(object):"""This class represents the internel state of individual tracked objects observed as bbox."""count = 0def __init__(self, bbox3D, info):"""Initialises a tracker using initial bounding box."""# define constant velocity modelself.kf = KalmanFilter(dim_x=10, dim_z=7)       self.kf.F = np.array([[1,0,0,0,0,0,0,1,0,0],      # state transition matrix[0,1,0,0,0,0,0,0,1,0],[0,0,1,0,0,0,0,0,0,1],[0,0,0,1,0,0,0,0,0,0],  [0,0,0,0,1,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,0,1,0,0,0],[0,0,0,0,0,0,0,1,0,0],[0,0,0,0,0,0,0,0,1,0],[0,0,0,0,0,0,0,0,0,1]])     self.kf.H = np.array([[1,0,0,0,0,0,0,0,0,0],      # measurement function,[0,1,0,0,0,0,0,0,0,0],[0,0,1,0,0,0,0,0,0,0],[0,0,0,1,0,0,0,0,0,0],[0,0,0,0,1,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,0,1,0,0,0]])# # with angular velocity# self.kf = KalmanFilter(dim_x=11, dim_z=7)       # self.kf.F = np.array([[1,0,0,0,0,0,0,1,0,0,0],      # state transition matrix#                       [0,1,0,0,0,0,0,0,1,0,0],#                       [0,0,1,0,0,0,0,0,0,1,0],#                       [0,0,0,1,0,0,0,0,0,0,1],  #                       [0,0,0,0,1,0,0,0,0,0,0],#                       [0,0,0,0,0,1,0,0,0,0,0],#                       [0,0,0,0,0,0,1,0,0,0,0],#                       [0,0,0,0,0,0,0,1,0,0,0],#                       [0,0,0,0,0,0,0,0,1,0,0],#                       [0,0,0,0,0,0,0,0,0,1,0],#                       [0,0,0,0,0,0,0,0,0,0,1]])     # self.kf.H = np.array([[1,0,0,0,0,0,0,0,0,0,0],      # measurement function,#                       [0,1,0,0,0,0,0,0,0,0,0],#                       [0,0,1,0,0,0,0,0,0,0,0],#                       [0,0,0,1,0,0,0,0,0,0,0],#                       [0,0,0,0,1,0,0,0,0,0,0],#                       [0,0,0,0,0,1,0,0,0,0,0],#                       [0,0,0,0,0,0,1,0,0,0,0]])# self.kf.R[0:,0:] *= 10.   # measurement uncertainty# state uncertainty, give high uncertainty to the unobservable initial velocities, covariance matrixself.kf.P[7:, 7:] *= 1000. 	                     self.kf.P *= 10.# self.kf.Q[-1,-1] *= 0.01    # process uncertaintyself.kf.Q[7:, 7:] *= 0.01self.kf.x[:7] = bbox3D.reshape((7, 1))self.time_since_update = 0self.id = KalmanBoxTracker.countKalmanBoxTracker.count += 1self.history = []self.hits = 1           # number of total hits including the first detectionself.hit_streak = 1     # number of continuing hit considering the first detectionself.first_continuing_hit = 1self.still_first = Trueself.age = 0self.info = info        # other info associateddef update(self, bbox3D, info): """ Updates the state vector with observed bbox."""self.time_since_update = 0self.history = []self.hits += 1self.hit_streak += 1          # number of continuing hitif self.still_first:self.first_continuing_hit += 1      # number of continuing hit in the fist time######################### orientation correctionif self.kf.x[3] >= np.pi: self.kf.x[3] -= np.pi * 2    # make the theta still in the rangeif self.kf.x[3] < -np.pi: self.kf.x[3] += np.pi * 2new_theta = bbox3D[3]if new_theta >= np.pi: new_theta -= np.pi * 2    # make the theta still in the rangeif new_theta < -np.pi: new_theta += np.pi * 2bbox3D[3] = new_thetapredicted_theta = self.kf.x[3]if abs(new_theta - predicted_theta) > np.pi / 2.0 and abs(new_theta - predicted_theta) < np.pi * 3 / 2.0:     			# if the angle of two theta is not acute angleself.kf.x[3] += np.pi       if self.kf.x[3] > np.pi: self.kf.x[3] -= np.pi * 2    # make the theta still in the rangeif self.kf.x[3] < -np.pi: self.kf.x[3] += np.pi * 2# now the angle is acute: < 90 or > 270, convert the case of > 270 to < 90if abs(new_theta - self.kf.x[3]) >= np.pi * 3 / 2.0:if new_theta > 0: self.kf.x[3] += np.pi * 2else: self.kf.x[3] -= np.pi * 2#########################     # flipself.kf.update(bbox3D)if self.kf.x[3] >= np.pi: self.kf.x[3] -= np.pi * 2    # make the theta still in the rageif self.kf.x[3] < -np.pi: self.kf.x[3] += np.pi * 2self.info = infodef predict(self):       """Advances the state vector and returns the predicted bounding box estimate."""self.kf.predict()      if self.kf.x[3] >= np.pi: self.kf.x[3] -= np.pi * 2if self.kf.x[3] < -np.pi: self.kf.x[3] += np.pi * 2self.age += 1if (self.time_since_update > 0):self.hit_streak = 0self.still_first = Falseself.time_since_update += 1self.history.append(self.kf.x)return self.history[-1]def get_state(self):"""Returns the current bounding box estimate."""return self.kf.x[:7].reshape((7, ))

C. 数据关联

为了将检测结果 $D_t$ 与预测轨迹 $T_{est}$ 进行匹配，我们将使用匈牙利算法。

维度为 $m_{t-1}, n_t)$ 的矩阵将用来计算每对 $D_t$ 和 $T_{est}$ 的 $3 D I o U$ 或目标中心距离。然后，使用匈牙利算法来处理二分图匹配问题。除此之外，当 $3 D I o U$ 小于 $IoU_{min}$ 时或目标中心距离大于 $dist_{max}$ 时，将拒绝进行匹配。数据关联模块的输出将分为4类：

$Tmatch={Tmatch1,Tmatch2,..,Tmatchwt}Dmatch={Dmatch1,Dmatch2,..,Dmatchwt}Tunmatch={Tunmatch1,Tunmatch2,..,Tunmatchmt−1−wt}Dunmatch={Dunmatch1,Dunmatch2,..,Dunmatchnt−wt}T_{match}=\{{T^1_{match},T^2_{match},..,T^{wt}_{match}}\}\\ D_{match}=\{{D^1_{match},D^2_{match},..,D^{wt}_{match}}\}\\ T_{unmatch}=\{{T^1_{unmatch},T^2_{unmatch},..,T^{m_{t-1}-wt}_{unmatch}}\}\\ D_{unmatch}=\{{D^1_{unmatch},D^2_{unmatch},..,D^{nt-wt}_{unmatch}}\}$
这里 $w_t$ 是匹配目标数量。

def associate_detections_to_trackers(detections, trackers, iou_threshold=0.01):   """Assigns detections to tracked object (both represented as bounding boxes)detections:  N x 8 x 3trackers:    M x 8 x 3Returns 3 lists of matches, unmatched_detections and unmatched_trackers"""if (len(trackers)==0): return np.empty((0, 2), dtype=int), np.arange(len(detections)), np.empty((0, 8, 3), dtype=int)    iou_matrix = np.zeros((len(detections), len(trackers)), dtype=np.float32)for d, det in enumerate(detections):for t, trk in enumerate(trackers):iou_matrix[d, t] = iou3d(det, trk)[0]             # det: 8 x 3, trk: 8 x 3# matched_indices = linear_assignment(-iou_matrix)      # hougarian algorithm, compatible to linear_assignment in sklearn.utilsrow_ind, col_ind = linear_sum_assignment(-iou_matrix)      # hougarian algorithmmatched_indices = np.stack((row_ind, col_ind), axis=1)unmatched_detections = []for d, det in enumerate(detections):if (d not in matched_indices[:, 0]): unmatched_detections.append(d)unmatched_trackers = []for t, trk in enumerate(trackers):if (t not in matched_indices[:, 1]): unmatched_trackers.append(t)#filter out matched with low IOUmatches = []for m in matched_indices:if (iou_matrix[m[0], m[1]] < iou_threshold):unmatched_detections.append(m[0])unmatched_trackers.append(m[1])else: matches.append(m.reshape(1, 2))if (len(matches) == 0): matches = np.empty((0, 2),dtype=int)else: matches = np.concatenate(matches, axis=0)return matches, np.array(unmatched_detections), np.array(unmatched_trackers)

D. 3D卡尔曼滤波-状态更新

由于 $T_{match}$ 的不确定性，我们根据检测结果 $D_{match}$ 对 $T_{match}$ 整个状态空间的每个轨迹进行更新，最后获得 $t$ 帧时刻的关联轨迹 $Tt={Tt1,Tt2,⋯,Ttwt}T_{t}=\left\{T_{t}^{1}, T_{t}^{2}, \cdots, T_{t}^{w_{t}}\right\}$ 。

根据贝叶斯规则，更新后的每一条轨迹 $Ttk=(x′,y′,z′,θ′,l′,w′,h′,s′,vx′,vy,′vz′)T_{t}^k=(x', y', z', \theta', l', w', h',s', v_{x}', v_{y,}' v_{z}')$ 是状态空间 $T_{match}^k$ 和 $D_{match}^k$ 的加权和，这的权重是由轨迹和检测的不确定性所决定的。

除此之外，我们发现原始的加权和对确定目标方向效果一般。对于匹配目标 $T_{t}^k$ ，其检测结果上的方向和运动轨迹方向可能完全相反，相差 $π\pi$ 个相位。但事实上，目标都是平缓移动，不可能在一帧中改变移动方向。从结果上来看， $T_t^k$ 目标的加权轨迹方向应该位于 $D_{match}^k$ 和 $T_{match}^k$ 之间。这往往将会导致比较低的 $3 D I o U$ 值。为此，我们提出一个方向修正技术，当两者相位相差超过 $π/2\pi/2$ 时，我们在 $T_{match}^k$ 上添加 $π\pi$ 个相位，以便 $T_{match}^k$ 和 $D_{match}^k$ 方向能大致一致。

E. Birth and Death内存

因为现存的目标可能会消失以及新的目标可能会进入画面，因此需要一个新的模块来管理出现和消失的轨迹。一方面，我们认为所有不匹配的检测可能会进入帧中，为了避免假阳性追踪，当在 $Bir_{min}$ 中持续检测到 $D_{unmatch}^p$ 时，将会生成一条新的轨迹 $T_{new}^p$ 。一旦新的轨迹产生，我们将初始化 $T_{new}^p$ 作为最新的测量 $D_{unmatch}^p$ ，三个方向的速度初始化为0。