A proposal-based approach for activity image-to-video retrieval

Citations: 16
Mendeley readers: 28

Abstract

The activity image-to-video retrieval task aims to retrieve videos containing an activity similar to the one in the query image. This is challenging because videos generally contain many background segments that are irrelevant to the activity. In this paper, we use the R-C3D model to represent each video as a bag of activity proposals, which filters out background segments to some extent. However, each bag still contains noisy proposals. We therefore propose an Activity Proposal-based Image-to-Video Retrieval (APIVR) approach, which incorporates multi-instance learning into a cross-modal retrieval framework to address the proposal noise issue. Specifically, we propose a Graph Multi-Instance Learning (GMIL) module with a graph convolutional layer, and integrate this module with a classification loss, an adversarial loss, and a triplet loss in our cross-modal retrieval framework. Moreover, we propose a geometry-aware triplet loss based on point-to-subspace distance to preserve the structural information of activity proposals. Extensive experiments on three widely used datasets verify the effectiveness of our approach.
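The abstract does not spell out how the point-to-subspace distance or the geometry-aware triplet loss is computed. The following is a minimal NumPy sketch, not the authors' implementation, under these assumptions: the image feature and the proposal features already live in the same d-dimensional space; a video's bag of proposals spans a linear subspace (no mean-centering); the distance is the norm of the residual after orthogonal projection onto that subspace; and a standard hinge triplet loss with a hypothetical margin of 0.2 is used. All function names are illustrative.

```python
import numpy as np


def subspace_basis(proposals: np.ndarray, tol: float = 1e-8) -> np.ndarray:
    """Orthonormal basis (d x r) for the span of a bag of proposal features.

    proposals: shape (n_proposals, d), one feature row per proposal.
    Singular directions below tol * (largest singular value) are dropped,
    so duplicated or collinear proposals do not inflate the rank.
    """
    d = proposals.shape[1]
    if proposals.shape[0] == 0:
        return np.zeros((d, 0))  # empty bag: zero-dimensional subspace
    # Columns of U span the same space as the columns of proposals.T.
    u, s, _ = np.linalg.svd(proposals.T, full_matrices=False)
    if s[0] == 0.0:
        return np.zeros((d, 0))  # all-zero features span only the origin
    rank = int(np.sum(s > tol * s[0]))
    return u[:, :rank]


def point_to_subspace_distance(x: np.ndarray, basis: np.ndarray) -> float:
    """Euclidean distance from point x (shape (d,)) to span(basis)."""
    projection = basis @ (basis.T @ x)  # orthogonal projection onto the subspace
    return float(np.linalg.norm(x - projection))


def subspace_triplet_loss(image: np.ndarray,
                          pos_proposals: np.ndarray,
                          neg_proposals: np.ndarray,
                          margin: float = 0.2) -> float:
    """Hinge triplet loss built on point-to-subspace distances.

    The query image should be closer to the proposal subspace of a video
    with the same activity (positive) than to that of a video with a
    different activity (negative), by at least `margin` (hypothetical value).
    """
    d_pos = point_to_subspace_distance(image, subspace_basis(pos_proposals))
    d_neg = point_to_subspace_distance(image, subspace_basis(neg_proposals))
    return max(0.0, margin + d_pos - d_neg)


if __name__ == "__main__":
    image = np.array([1.0, 1.0, 0.0])
    pos = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # spans the x-y plane
    neg = np.array([[0.0, 0.0, 1.0]])                    # spans the z axis
    print(subspace_triplet_loss(image, pos, neg))  # image lies in pos subspace
```

In the usage example the image lies inside the positive subspace (distance 0) and at distance sqrt(2) from the negative one, so the margin is already satisfied and the loss is 0.0. Measuring distance to a subspace rather than to a single pooled vector lets every proposal in the bag contribute a direction, which is one plausible reading of "preserving the structural information of activity proposals".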

References (via Scopus)

Learning spatiotemporal features with 3D convolutional networks (7,987 citations)

3D Convolutional neural networks for human action recognition (5,286 citations)

Canonical correlation analysis: An overview with application to learning methods (2,646 citations)

Cited by (via Scopus)

HANet: Hierarchical Alignment Networks for Video-Text Retrieval (49 citations)

Temporal Sentence Grounding in Videos: A Survey and Future Directions (31 citations)

Keyword-Based Diverse Image Retrieval With Variational Multiple Instance Graph (15 citations)


Citation (APA style)

Xu, R., Niu, L., Zhang, J., & Zhang, L. (2020). A proposal-based approach for activity image-to-video retrieval. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 12524–12531). AAAI Press. https://doi.org/10.1609/aaai.v34i07.6941

Readers over time (chart covering 2019 to 2025; not reproduced)

Readers' Seniority

PhD / Post grad / Masters / Doc: 11 (100%)

Readers' Discipline

Computer Science: 10 (91%)

Engineering: 1 (9%)
