From Within to Between: Knowledge Distillation for Cross Modality Retrieval

Abstract

We propose a novel loss function for training text-to-video and video-to-text retrieval networks based on knowledge distillation. This loss function addresses an important drawback of the max-margin loss commonly used in existing cross-modality retrieval methods: a fixed margin separates matching video-and-caption pairs from non-matching pairs, treating all non-matching pairs the same and failing to account for their differing degrees of non-matching. We address this drawback by introducing a novel loss for the non-matching pairs; this loss leverages the knowledge within one domain to train a better network for matching between two domains. The proposed loss requires no extra annotation, is complementary to the existing max-margin loss, and can be integrated into the training pipeline of any cross-modality retrieval method. Experimental results on four cross-modal retrieval datasets, namely MSRVTT, ActivityNet, DiDeMo, and MSVD, show the effectiveness of the proposed method. Code is available at: https://github.com/tqvinhcs/CrossKD.
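To make the idea concrete, below is a minimal NumPy sketch of the two loss terms the abstract describes. It is an illustration based only on the abstract, not the authors' implementation: `max_margin_loss` is the standard bidirectional ranking loss with a fixed margin, and `within_to_between_kd_loss` is a hypothetical distillation term in which within-domain similarities (e.g., caption-to-caption) act as soft targets for the cross-domain similarity distribution. Function names, the temperature `tau`, and the choice of a softmax cross-entropy are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def max_margin_loss(S, margin=0.2):
    """Bidirectional max-margin ranking loss.

    S[i, j] is the similarity between video i and caption j;
    the diagonal holds the matching pairs. Every non-matching
    pair is pushed at least `margin` below its matching pair,
    regardless of how dissimilar it actually is."""
    n = S.shape[0]
    diag = np.diag(S)
    cost_v2t = np.maximum(0.0, margin - diag[:, None] + S)  # video -> caption
    cost_t2v = np.maximum(0.0, margin - diag[None, :] + S)  # caption -> video
    off_diag = 1.0 - np.eye(n)                              # ignore matching pairs
    return ((cost_v2t + cost_t2v) * off_diag).sum() / n

def within_to_between_kd_loss(S, T, tau=1.0):
    """Hypothetical distillation term (sketch of the paper's idea).

    T[i, j] is a within-domain similarity (e.g., caption i vs.
    caption j). Its softened rows serve as teacher targets for
    the cross-domain rows of S, so non-matching pairs are graded
    by degree instead of treated uniformly."""
    teacher = softmax(T / tau, axis=1)   # within-domain soft targets
    student = softmax(S / tau, axis=1)   # cross-domain predictions
    return -(teacher * np.log(student + 1e-12)).sum(axis=1).mean()
```

In training, the two terms would simply be summed (with a weighting factor), which is consistent with the abstract's claim that the distillation loss is complementary to the max-margin loss and drops into any existing retrieval pipeline.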

Citation (APA)
Tran, V., Balasubramanian, N., & Hoai, M. (2023). From Within to Between: Knowledge Distillation for Cross Modality Retrieval. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13844 LNCS, pp. 605–622). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-26316-3_36
