Reading Notes: Facial Micro-Expression Recognition Based on Deep Local-Holistic Network

Work by Prof. Wang's team at the Chinese Academy of Sciences, targeting micro-expression recognition.

Abstract:

To improve the efficiency of micro-expression feature extraction, and inspired by the psychological study of attentional resource allocation in micro-expression cognition, we propose a deep local-holistic network method for micro-expression recognition.

The first is a Hierarchical Convolutional Recurrent Neural Network (HCRNN), which extracts local and abundant spatio-temporal micro-expression features.

The second is a Robust Principal-Component-Analysis-based Recurrent Neural Network (RPRNN), which extracts global and sparse features with micro-expression-specific representations.

The extracted effective features are used for micro-expression recognition through the fusion of the two sub-networks.

1. Introduction

To help people recognize micro-expressions, Ekman et al. developed the Facial Action Coding System (FACS) [11] and defined the muscle activity of facial expressions as action units (AUs). They also developed the Micro-Expression Training Tool (METT).

In addition, since the collection and labeling of micro-expressions are time-consuming and laborious, the total number of published micro-expression samples is only about 1000. Therefore, micro-expression recognition is a typical small sample size (SSS) problem.

The architecture of the proposed method mainly includes two sub-networks: (1) a hierarchical convolutional recurrent network (HCRNN), which learns local and abundant features from the original frames of micro-expression video clips, and (2) a robust principal component analysis recurrent network (RPRNN), which extracts sparse information from the original frames by RPCA and then feeds that sparse information to a deep learning model to extract holistic and sparse features.

2. Related Work

2.1 Micro-Expression Recognition

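In the early stages of research, most methods adopted handcrafted features to identify micro-expressions.

These methods include dividing the face into specific regions and recognizing the motion in each region with 3D histogram of oriented gradients (3D HOG) descriptors, using LBP-TOP to extract dynamic and appearance features of micro-expressions, and using robust principal component analysis (RPCA) to extract sparse micro-expression information together with local spatiotemporal directional features.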

However, the small number of micro-expression samples and the subtle, brief nature of micro-expressions limit how well deep learning can be combined with micro-expression recognition. Learning micro-expression features effectively is therefore a key research question for further performance gains.

2.2 Deep Convolutional Networks

The CNN is a classic and widely used structure with three prominent characteristics: local receptive fields, shared weights, and spatial or temporal subsampling.
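To make the three properties concrete, here is a minimal NumPy sketch, not the paper's network. One 3×3 kernel slides over a gray-scale image (local receptive fields). The same kernel weights are reused at every position (shared weights). 2×2 max pooling then subsamples the feature map (spatial subsampling). The image size, kernel values, and function names are illustrative assumptions.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation, the operation inside a CNN conv layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Local receptive field: each output only sees a kh x kw patch.
            patch = image[i:i + kh, j:j + kw]
            # Shared weights: the same kernel is applied at every position.
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool2x2(fmap: np.ndarray) -> np.ndarray:
    """Spatial subsampling: keep the max of each non-overlapping 2x2 block."""
    h2, w2 = fmap.shape[0] // 2, fmap.shape[1] // 2
    trimmed = fmap[:h2 * 2, :w2 * 2]
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((8, 8))                      # stand-in for a gray-scale ROI
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)    # simple vertical-edge detector
fmap = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU activation
pooled = max_pool2x2(fmap)
print(fmap.shape, pooled.shape)  # (6, 6) (3, 3)
```

An 8×8 input gives a 6×6 feature map after the 3×3 "valid" convolution and a 3×3 map after pooling. The layer has only 9 kernel weights, however large the image is.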

2.3 Recurrent Neural Networks

A recurrent neural network (RNN) can process sequential data by using its hidden states to map an input sequence to a corresponding output sequence.
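Here is a minimal sketch of that mapping. It assumes a plain Elman-style RNN with tanh activation, since the notes do not fix the cell type at this point. The hidden state h_t carries information from earlier frames, and the network emits one output per input step. Sizes and weights are arbitrary.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h, W_hy, b_y):
    """Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), y_t = W_hy h_t + b_y."""
    h = np.zeros(W_hh.shape[0])          # initial hidden state
    hs, ys = [], []
    for x_t in x_seq:                    # one step per frame
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hs.append(h)
        ys.append(W_hy @ h + b_y)        # one output per input step
    return np.stack(hs), np.stack(ys)

rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 5, 4, 3, 2
x_seq = rng.standard_normal((T, d_in))
params = (rng.standard_normal((d_h, d_in)) * 0.5,
          rng.standard_normal((d_h, d_h)) * 0.5,
          np.zeros(d_h),
          rng.standard_normal((d_out, d_h)) * 0.5,
          np.zeros(d_out))
hs, ys = rnn_forward(x_seq, *params)
print(hs.shape, ys.shape)  # (5, 3) (5, 2)
```

Because h is fed back at every step, perturbing the first frame also changes the last output. This is the property that lets an RNN model the movement pattern across a clip.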

Since micro-expressions are very subtle, it is hard to distinguish them from neutral faces using a single frame alone. The movement pattern over the temporal sequence is an essential feature of micro-expressions. Therefore, we extract temporal features from micro-expression sequences using BRNN and BLSTM to improve classification performance.

2.4 RPCA

Because micro-expressions have short duration and low intensity, micro-expression data are sparse in both the spatial and temporal domains. In 2014, Wang et al. [24] proposed, for micro-expression recognition, treating E as the desired subtle motion information of micro-expressions and A as noise. Here A and E are the low-rank and sparse parts of the RPCA decomposition D = A + E. Inspired by this idea, we adopt RPCA to obtain sparse information from micro-expression frames, and then feed the extracted information to RPRNN, which learns sparse and holistic micro-expression features.
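For reference, the decomposition this idea rests on can be written as follows. It is the standard RPCA formulation. Treating each column of D as one vectorized frame of the clip (m pixels, n frames) is an assumption made here for illustration, and λ > 0 weights the sparsity term. The rank and ℓ0 terms make this problem non-convex, which is why section 3.2.1 switches to convex surrogates.

```latex
D = A + E, \qquad D \in \mathbb{R}^{m \times n}

\min_{A,\,E}\ \operatorname{rank}(A) + \lambda \lVert E \rVert_0
\quad \text{s.t.} \quad D = A + E
```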

3. Proposed Model

Overall structure of the model

3.1 HCRNN for Local Feature Extraction

The HCRNN model consists of the CNN module and the BRNN module.

3.1.1 CNN Model

Based on the physical structure of the face, only four facial regions of interest (ROIs), i.e., eyebrows, eyes, nose, and mouth, are used for local micro-expression feature extraction (Figure 4a).

As shown in the HCRNN block of Figure 3, the CNN module consists of four HCNNs. The input to each branch is the gray-scale ROI images, and each branch network contains four convolutional layers. All four HCNNs have the same structure, as listed in Table 2.
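The following NumPy sketch illustrates the four-branch layout under some stated assumptions. The ROI boxes in the hypothetical ROI_BOXES dict are made-up coordinates on a 64×64 aligned face, because the notes do not say how the ROIs are located. Each branch is also simplified to a single convolution layer plus global average pooling, instead of the four convolutional layers listed in Table 2. What the sketch does show is the structure described above: four branches with identical architecture but their own weights, each fed one gray-scale ROI, with the branch outputs concatenated into one per-frame feature vector.

```python
import numpy as np

# HYPOTHETICAL ROI boxes (row0, row1, col0, col1) on a 64x64 aligned face.
# These numbers are invented for illustration only.
ROI_BOXES = {
    "eyebrows": (8, 20, 8, 56),
    "eyes":     (16, 30, 8, 56),
    "nose":     (26, 44, 20, 44),
    "mouth":    (42, 58, 16, 48),
}

def crop_rois(face: np.ndarray) -> dict:
    """Cut the four ROIs out of one gray-scale face frame."""
    return {name: face[r0:r1, c0:c1] for name, (r0, r1, c0, c1) in ROI_BOXES.items()}

def make_branch(rng, n_filters=4, k=3):
    """One branch: same architecture for every ROI, but its own (unshared) kernels.
    Simplified to ONE conv layer; the paper's HCNNs have four."""
    kernels = rng.standard_normal((n_filters, k, k))
    def branch(roi):
        # All k x k windows of the ROI: shape (h-k+1, w-k+1, k, k).
        windows = np.lib.stride_tricks.sliding_window_view(roi, (k, k))
        maps = np.einsum("ijab,fab->fij", windows, kernels)  # conv responses
        maps = np.maximum(maps, 0.0)                          # ReLU
        return maps.mean(axis=(1, 2))                         # global average pool
    return branch

rng = np.random.default_rng(0)
branches = {name: make_branch(rng) for name in ROI_BOXES}  # four HCNN-like branches
face = rng.random((64, 64))                               # one gray-scale frame
rois = crop_rois(face)
feature = np.concatenate([branches[n](rois[n]) for n in ROI_BOXES])
print(feature.shape)  # (16,)
```

With four branches of four filters each, every frame becomes a 16-dimensional vector. A sequence of such vectors is what a recurrent module would consume next.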

3.1.2 BRNN Model

In a micro-expression sequence, both past and future context are usually useful for prediction. Thus, a BRNN module [46] is adopted to process the temporal variation in micro-expressions.
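Here is a minimal sketch of the bidirectional idea, assuming simple tanh RNN cells in both directions (all sizes are toy values). One RNN reads the frames forward and a second reads them backward. Their hidden states are re-aligned in time and concatenated, so every time step sees both past and future context.

```python
import numpy as np

def rnn_states(x_seq, W_xh, W_hh, b_h):
    """Hidden states of a simple tanh RNN, one per time step."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

def brnn_states(x_seq, fwd_params, bwd_params):
    """BRNN: forward pass (past context) + backward pass (future context)."""
    h_fwd = rnn_states(x_seq, *fwd_params)
    h_bwd = rnn_states(x_seq[::-1], *bwd_params)[::-1]  # re-align to original time order
    return np.concatenate([h_fwd, h_bwd], axis=1)

def init_rnn(rng, d_in, d_h):
    return (rng.standard_normal((d_h, d_in)) / np.sqrt(d_in),
            rng.standard_normal((d_h, d_h)) * 0.5,
            np.zeros(d_h))

rng = np.random.default_rng(1)
T, d_in, d_h = 6, 16, 8
x_seq = rng.standard_normal((T, d_in))     # e.g. per-frame CNN features
fwd, bwd = init_rnn(rng, d_in, d_h), init_rnn(rng, d_in, d_h)
H = brnn_states(x_seq, fwd, bwd)
print(H.shape)  # (6, 16)
```

If only the last frame changes, the forward half of H[0] stays the same but its backward half changes. That backward half carries the future context a one-directional RNN would miss.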

We classify micro-expressions with an FC layer in L12 of HCRNN and obtain probabilistic outputs from the softmax layer in L13 of HCRNN.
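Here is a small sketch of this final classification stage. It assumes a 3-class problem and a 32-dimensional input feature, both arbitrary choices. The FC layer maps the features to one score per class, and softmax turns the scores into probabilities that sum to 1.

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fc_softmax(features, W, b):
    """FC layer (one score per class) followed by softmax (class probabilities)."""
    return softmax(W @ features + b)

rng = np.random.default_rng(0)
features = rng.standard_normal(32)                 # e.g. the final BRNN output
W, b = rng.standard_normal((3, 32)) * 0.1, np.zeros(3)
probs = fc_softmax(features, W, b)
print(probs.round(3), int(np.argmax(probs)))       # probabilities and predicted class
```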

3.2 RPRNN for Holistic Feature Extraction

3.2.1 Extracting Sparse Micro-Expressions with RPCA

Due to the short duration and low intensity of micro-expression movement, micro-expressions can be considered sparse data. RPCA is therefore adopted to obtain the sparse micro-expression information.

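Wright et al. adopted the ℓ1-norm as a convex surrogate for the highly non-convex ℓ0-norm, and the nuclear norm (the sum of singular values) as a surrogate for the non-convex rank of the low-rank matrix.

In other words, to make the problem tractable, both non-convex terms are replaced by convex surrogates: the ℓ1-norm stands in for the ℓ0-norm, and the nuclear norm stands in for the rank function.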
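With these surrogates, the problem becomes: minimize ||A||_* + λ||E||_1 over A and E, subject to D = A + E. One common way to solve it is the inexact augmented Lagrange multiplier (IALM) method of Lin, Chen and Ma. The sketch below implements that solver in NumPy on a synthetic matrix. The notes do not say which solver the authors used. The toy sizes, the 5% corruption rate, and λ = 1/sqrt(max(m, n)) are assumptions. In the micro-expression setting, E_hat is the sparse part that would go on to RPRNN.

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding: proximal operator of tau * l1-norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca_ialm(D, lam=None, tol=1e-7, max_iter=500):
    """Solve min ||A||_* + lam*||E||_1 s.t. D = A + E with inexact ALM."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm2 = np.linalg.norm(D, 2)                    # largest singular value
    Y = D / max(norm2, np.abs(D).max() / lam)       # dual variable initialization
    mu, rho = 1.25 / norm2, 1.5
    mu_bar = mu * 1e7
    E = np.zeros_like(D)
    d_norm = np.linalg.norm(D, "fro")
    for _ in range(max_iter):
        A = svt(D - E + Y / mu, 1.0 / mu)           # low-rank update
        E = shrink(D - A + Y / mu, lam / mu)        # sparse update
        Z = D - A - E                               # constraint residual
        Y = Y + mu * Z                              # dual ascent
        mu = min(mu * rho, mu_bar)                  # increase the penalty
        if np.linalg.norm(Z, "fro") / d_norm < tol:
            break
    return A, E

rng = np.random.default_rng(0)
m, n = 400, 20                                      # 400 "pixels" x 20 "frames" (toy)
A_true = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))  # rank-2 part
E_true = np.zeros((m, n))
mask = rng.random((m, n)) < 0.05                    # 5% sparse "motion" entries
E_true[mask] = rng.choice([-5.0, 5.0], size=mask.sum())
D = A_true + E_true
A_hat, E_hat = rpca_ialm(D)
print(np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true))  # relative error
```

Each iteration alternates two proximal steps, singular value thresholding for the nuclear-norm term and soft-thresholding for the ℓ1 term. It then updates the dual variable Y and increases the penalty μ.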

3.2.2 RPRNN Model Structure

The obtained sparse micro-expression images are fed into RPRNN to extract holistic features.

To learn high-level micro-expression representations, a deep BLSTM network is built by stacking multiple LSTM hidden layers.
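To show what stacking means here, below is a NumPy sketch of a deep BLSTM forward pass with standard LSTM gates. The layer count, hidden size, and input size are toy assumptions, not the paper's configuration. Each BLSTM layer runs one LSTM forward and one backward over the sequence and concatenates their states. The next layer takes that concatenated sequence as its input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(rng, d_in, d_h):
    # One stacked weight matrix for the four gates (input, forget, cell, output).
    return rng.standard_normal((4 * d_h, d_in + d_h)) * 0.1, np.zeros(4 * d_h)

def lstm_layer(x_seq, params):
    W, b = params
    d_h = W.shape[0] // 4
    h, c = np.zeros(d_h), np.zeros(d_h)
    out = []
    for x_t in x_seq:
        z = W @ np.concatenate([x_t, h]) + b
        i, f = sigmoid(z[:d_h]), sigmoid(z[d_h:2 * d_h])
        g, o = np.tanh(z[2 * d_h:3 * d_h]), sigmoid(z[3 * d_h:])
        c = f * c + i * g          # cell state update
        h = o * np.tanh(c)         # hidden state output
        out.append(h)
    return np.stack(out)

def blstm_layer(x_seq, fwd, bwd):
    h_f = lstm_layer(x_seq, fwd)
    h_b = lstm_layer(x_seq[::-1], bwd)[::-1]     # backward pass, re-aligned in time
    return np.concatenate([h_f, h_b], axis=1)

def deep_blstm(x_seq, layers):
    """Stacked BLSTM: each layer consumes the previous layer's output sequence."""
    h = x_seq
    for fwd, bwd in layers:
        h = blstm_layer(h, fwd, bwd)
    return h

rng = np.random.default_rng(0)
T, d_in, d_h, n_layers = 10, 64, 16, 3           # toy sizes, not the paper's
layers, d = [], d_in
for _ in range(n_layers):
    layers.append((init_lstm(rng, d, d_h), init_lstm(rng, d, d_h)))
    d = 2 * d_h                                  # next layer sees fwd + bwd states
x_seq = rng.standard_normal((T, d_in))           # e.g. flattened sparse RPCA frames
H = deep_blstm(x_seq, layers)
print(H.shape)  # (10, 32)
```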

To avoid overfitting, we combine the cross-entropy loss function with L2 regularization. In the resulting loss, θ_index denotes the weight values.
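Here is a minimal sketch of such a loss. It assumes the common form: mean cross-entropy plus λ·Σθ². Some formulations use λ/2, and these notes do not give the paper's exact constant. The weights list plays the role of the θ_index values. With the toy numbers below, the cross-entropy term is (−ln 0.7 − ln 0.8)/2 ≈ 0.2899 and the L2 term is 0.01 × 13 = 0.13.

```python
import numpy as np

def regularized_ce_loss(probs, labels, weights, lam=1e-4):
    """Mean cross-entropy + lam * sum of squared weights (L2 penalty)."""
    n = probs.shape[0]
    eps = 1e-12                                    # guard against log(0)
    ce = -np.mean(np.log(probs[np.arange(n), labels] + eps))
    l2 = sum(np.sum(w ** 2) for w in weights)      # sum over all weight arrays
    return ce + lam * l2

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])                # softmax outputs for 2 samples
labels = np.array([0, 1])                          # true classes
weights = [np.ones((2, 2)), np.array([3.0])]       # sum of squares = 4 + 9 = 13
loss = regularized_ce_loss(probs, labels, weights, lam=0.01)
print(round(loss, 4))  # 0.4199
```

Larger λ pushes the weights toward zero, trading a little training fit for less overfitting. This matters for a small-sample problem like this one.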

3.3 Model Fusion

This step simply fuses the outputs of the two sub-models. The method is as follows:
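The fusion formula itself did not survive in these notes. As an illustration only, assume score-level (late) fusion, where the two softmax outputs are combined by a weighted average with a hypothetical weight alpha. The step would then look like the sketch below. The paper's actual fusion rule may differ, and the probability values are made up.

```python
import numpy as np

def late_fusion(p_hcrnn, p_rprnn, alpha=0.5):
    """HYPOTHETICAL fusion rule: weighted average of the two sub-networks'
    class probabilities, then argmax for the predicted class."""
    p = alpha * np.asarray(p_hcrnn) + (1.0 - alpha) * np.asarray(p_rprnn)
    return p, int(np.argmax(p))

p_local = [0.5, 0.3, 0.2]       # HCRNN softmax output (made-up numbers)
p_holistic = [0.2, 0.7, 0.1]    # RPRNN softmax output (made-up numbers)
fused, label = late_fusion(p_local, p_holistic, alpha=0.5)
print(fused, label)
```

Both inputs sum to 1 and alpha lies in [0, 1], so the fused vector is still a probability distribution. Setting alpha = 1 or 0 recovers a single sub-network.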

4. Experiments

They ran comparison experiments and ablation studies. Nothing much to add here: of course the proposed method comes out on top.

5. Conclusion and Future Work

The Deep Local-Holistic Network, which fuses HCRNN and RPRNN, captures local-holistic and sparse-abundant micro-expression information and improves micro-expression recognition performance.

In future work, we will further investigate unsupervised learning as well as data augmentation methods to improve the performance of micro-expression recognition.