Learning Affective Video Features for Facial Expression Recognition via Hybrid Deep Learning

Abstract

One key challenge of facial expression recognition (FER) in video sequences is extracting discriminative spatiotemporal features from the facial expression images in a sequence. In this paper, we propose a new FER method for video sequences based on a hybrid deep learning model. The proposed method first employs two individual deep convolutional neural networks (CNNs), a spatial CNN processing static facial images and a temporal CNN processing optical flow images, to separately learn high-level spatial and temporal features on the divided video segments. These two CNNs are fine-tuned from a pre-trained CNN model on the target video facial expression datasets. The obtained segment-level spatial and temporal features are then integrated in a deep fusion network built with a deep belief network (DBN) model, which jointly learns discriminative spatiotemporal features. Finally, average pooling is performed on the learned segment-level DBN features of a video sequence to produce a fixed-length global video feature representation. Based on these global video features, a linear support vector machine (SVM) performs facial expression classification. Extensive experiments on three public video-based facial expression datasets (BAUM-1s, RML, and MMI) show the effectiveness of the proposed method, which outperforms state-of-the-art approaches.
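The fusion-and-pooling stage of the pipeline can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the authors' implementation: the feature dimensions are hypothetical, the random arrays stand in for the outputs of the trained spatial and temporal CNNs, and a single sigmoid transform with random weights stands in for the trained DBN fusion network.

```python
import numpy as np

# Hypothetical dimensions, for illustration only (not taken from the paper).
N_SEGMENTS = 5       # number of segments the video is divided into
D_SPATIAL = 128      # spatial CNN feature size per segment
D_TEMPORAL = 128     # temporal CNN feature size per segment
D_FUSED = 64         # fused feature size produced by the DBN stand-in

rng = np.random.default_rng(0)

# Random stand-ins for segment-level features from the two CNN streams.
spatial = rng.standard_normal((N_SEGMENTS, D_SPATIAL))
temporal = rng.standard_normal((N_SEGMENTS, D_TEMPORAL))

def fuse_segments(spatial, temporal, weights):
    """Stand-in for the DBN fusion network: concatenate the two streams
    and apply a single transform with a sigmoid nonlinearity."""
    joint = np.concatenate([spatial, temporal], axis=1)  # (segments, D_SPATIAL + D_TEMPORAL)
    return 1.0 / (1.0 + np.exp(-joint @ weights))        # (segments, D_FUSED)

# Randomly initialized weights stand in for the trained DBN parameters.
W = rng.standard_normal((D_SPATIAL + D_TEMPORAL, D_FUSED)) * 0.1

segment_features = fuse_segments(spatial, temporal, W)

# Average pooling over segments yields a fixed-length video descriptor,
# which the paper then feeds to a linear SVM for classification.
video_feature = segment_features.mean(axis=0)
print(video_feature.shape)  # (64,)
```

The key point the sketch captures is that per-segment fusion happens before pooling, so the video-level descriptor has a fixed length regardless of how many segments the video contains.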



Citation (APA)

Zhang, S., Pan, X., Cui, Y., Zhao, X., & Liu, L. (2019). Learning Affective Video Features for Facial Expression Recognition via Hybrid Deep Learning. IEEE Access, 7, 32297–32304. https://doi.org/10.1109/ACCESS.2019.2901521


Readers' Seniority

PhD / Postgrad / Masters / Doc: 40 (63%)
Lecturer / Postdoc: 13 (20%)
Researcher: 9 (14%)
Professor / Associate Prof.: 2 (3%)

Readers' Discipline

Computer Science: 43 (65%)
Engineering: 19 (29%)
Chemistry: 2 (3%)
Medicine and Dentistry: 2 (3%)
