This paper proposes a novel network architecture for human action recognition. First, a pre-trained spatio-temporal feature extractor extracts spatio-temporal features from videos. Then, features from several levels are concatenated via 3D-convolution skip-connections, and a batch normalization layer normalizes the concatenated features. We then feed the normalized features into an RNN to model temporal dependencies, which enables the network to handle long-term information. In addition, each video is divided into three parts, and each part is split into non-overlapping 16-frame clips for data augmentation. Finally, the proposed method is evaluated on the UCF101 dataset and compared with existing state-of-the-art methods. Experimental results demonstrate that our method achieves the highest recognition accuracy.
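The pipeline described above (multi-level feature concatenation, batch normalization, then an RNN over time) can be sketched in a minimal, framework-free form. This is only an illustration under stated assumptions: the feature dimensions, the two-level split, and the RNN weights are all hypothetical stand-ins, not the paper's actual extractor or trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: T time steps, feature sizes at two network levels.
T, D1, D2 = 8, 64, 32

def split_into_clips(frames, clip_len=16):
    """Split a video (sequence of frames) into non-overlapping 16-frame clips."""
    n = len(frames) // clip_len
    return [frames[i * clip_len:(i + 1) * clip_len] for i in range(n)]

# Stand-ins for multi-level spatio-temporal features that would normally
# come from a pre-trained 3D-CNN feature extractor.
feat_low = rng.normal(size=(T, D1))
feat_high = rng.normal(size=(T, D2))

# Skip-connection: concatenate features from several levels.
feats = np.concatenate([feat_low, feat_high], axis=1)  # shape (T, D1 + D2)

# Batch normalization of the concatenated features (per-dimension).
mu, var = feats.mean(axis=0), feats.var(axis=0)
feats_bn = (feats - mu) / np.sqrt(var + 1e-5)

# A plain (Elman) RNN over the normalized features to model temporal
# dependencies; weights here are random stand-ins, not trained parameters.
H = 16
Wx = rng.normal(scale=0.1, size=(D1 + D2, H))
Wh = rng.normal(scale=0.1, size=(H, H))
h = np.zeros(H)
for t in range(T):
    h = np.tanh(feats_bn[t] @ Wx + h @ Wh)
# h is the final hidden state summarizing the clip; a classifier head
# would map it to action scores.
```

The key design points the sketch mirrors are that concatenation (rather than summation) preserves features from each level, and that normalizing before the RNN keeps the recurrent inputs on a comparable scale.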
CITATION STYLE
Song, J., Yang, Z., Zhang, Q., Fang, T., Hu, G., Han, J., & Chen, C. (2018). Human action recognition with 3D convolution skip-connections and RNNs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11301 LNCS, pp. 319–331). Springer Verlag. https://doi.org/10.1007/978-3-030-04167-0_29