Stronger, Faster and More Explainable: A Graph Convolutional Baseline for Skeleton-based Action Recognition

Abstract

One essential problem in skeleton-based action recognition is how to extract discriminative features over all skeleton joints. However, State-Of-The-Art (SOTA) models for this task tend to be exceedingly sophisticated and over-parameterized, and their low efficiency in training and inference has hindered progress in the field, especially on large-scale action datasets. In this work, we propose an efficient yet strong baseline based on Graph Convolutional Networks (GCN), which aggregates three main improvements: early-fused Multiple Input Branches (MIB), a Residual GCN (ResGCN) with a bottleneck structure, and a Part-wise Attention (PartAtt) block. First, the MIB is designed to enrich informative skeleton features while keeping representations compact at an early fusion stage. Then, inspired by the success of the ResNet architecture in Convolutional Neural Networks (CNN), a ResGCN module is introduced to reduce computational cost and ease model training while maintaining accuracy. Finally, a PartAtt block is proposed to discover the most essential body parts over a whole action sequence and to obtain more explainable representations for different skeleton sequences. Extensive experiments on two large-scale datasets, NTU RGB+D 60 and 120, validate that the proposed baseline slightly outperforms other SOTA models while requiring far fewer parameters during training and inference, e.g., up to 34 times fewer than DGNN, one of the best SOTA methods.
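
Illustrative sketch (not from the paper): the code below is a minimal PyTorch-style interpretation of two ideas named in the abstract, a bottleneck residual GCN unit and a part-wise attention block. All class names, layer sizes, kernel sizes, channel reduction ratio, and the joint-to-part grouping are assumptions made for illustration; this is not the authors' released implementation.

    import torch
    import torch.nn as nn


    class PartAtt(nn.Module):
        # Hypothetical part-wise attention: pool over time and joints, predict one
        # channel-wise weight vector per body part, rescale the joints of each part.
        def __init__(self, channels, parts):
            super().__init__()
            self.parts = parts                      # list of joint-index lists (assumed grouping)
            self.fcn = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),            # (N, C, T, V) -> (N, C, 1, 1)
                nn.Conv2d(channels, channels // 4, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, channels * len(parts), 1),
            )
            self.softmax = nn.Softmax(dim=-1)       # attention competes across body parts

        def forward(self, x):                       # x: (N, C, T, V)
            n, c, _, _ = x.shape
            att = self.softmax(self.fcn(x).view(n, c, len(self.parts)))
            out = x.clone()
            for p, joints in enumerate(self.parts):
                out[:, :, :, joints] = x[:, :, :, joints] * att[:, :, p, None, None]
            return out


    class ResGCNBottleneck(nn.Module):
        # Hypothetical bottleneck residual GCN unit: 1x1 reduce -> spatial graph
        # convolution -> temporal convolution -> 1x1 expand, plus a skip connection.
        def __init__(self, in_channels, out_channels, A, reduction=4, stride=1):
            super().__init__()
            mid = out_channels // reduction
            self.register_buffer('A', A)            # (K, V, V) normalized adjacency subsets
            self.reduce = nn.Sequential(
                nn.Conv2d(in_channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
            self.gcn = nn.Conv2d(mid, mid * A.size(0), 1)   # per-subset feature transform
            self.gcn_bn = nn.Sequential(nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
            self.tcn = nn.Sequential(
                nn.Conv2d(mid, mid, (9, 1), (stride, 1), (4, 0)),
                nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
            self.expand = nn.Sequential(
                nn.Conv2d(mid, out_channels, 1), nn.BatchNorm2d(out_channels))
            self.skip = (nn.Identity() if in_channels == out_channels and stride == 1 else
                         nn.Sequential(nn.Conv2d(in_channels, out_channels, 1, (stride, 1)),
                                       nn.BatchNorm2d(out_channels)))
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):                       # x: (N, C, T, V)
            res = self.skip(x)
            y = self.reduce(x)
            n, c, t, v = y.shape
            y = self.gcn(y).view(n, self.A.size(0), c, t, v)
            y = torch.einsum('nkctv,kvw->nctw', y, self.A)   # aggregate over graph neighbours
            y = self.tcn(self.gcn_bn(y))
            return self.relu(self.expand(y) + res)


    # Toy check with assumed shapes: 25 joints (NTU skeleton), 3 adjacency subsets.
    A = torch.eye(25).repeat(3, 1, 1)               # placeholder adjacency (identity only)
    block = ResGCNBottleneck(64, 128, A, stride=2)
    att = PartAtt(128, parts=[list(range(0, 13)), list(range(13, 25))])  # assumed 2-part split
    x = torch.randn(2, 64, 50, 25)                  # (batch, channels, frames, joints)
    print(att(block(x)).shape)                      # torch.Size([2, 128, 25, 25])

The bottleneck (reduce, process, expand) is what keeps the parameter count low relative to plain GCN stacks, and the attention weights per body part are what make the learned representation inspectable; both points echo the efficiency and explainability claims in the abstract.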

Citation (APA)

Song, Y. F., Zhang, Z., Shan, C., & Wang, L. (2020). Stronger, Faster and More Explainable: A Graph Convolutional Baseline for Skeleton-based Action Recognition. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 1625–1633). Association for Computing Machinery, Inc. https://doi.org/10.1145/3394171.3413802

Readers' Seniority

PhD / Post grad / Masters / Doc: 40 (78%)
Researcher: 6 (12%)
Lecturer / Post doc: 4 (8%)
Professor / Associate Prof.: 1 (2%)

Readers' Discipline

Computer Science: 40 (77%)
Engineering: 8 (15%)
Physics and Astronomy: 2 (4%)
Biochemistry, Genetics and Molecular Bi...: 2 (4%)
