From Within to Between: Knowledge Distillation for Cross Modality Retrieval

Abstract

We propose a novel loss function for training text-to-video and video-to-text retrieval networks based on knowledge distillation. This loss function addresses an important drawback of the max-margin loss commonly used in existing cross-modality retrieval methods: a fixed margin separates matching video-and-caption pairs from non-matching pairs, treating all non-matching pairs the same and failing to account for their differing degrees of non-matching. We address this drawback by introducing a novel loss for the non-matching pairs; this loss leverages the knowledge within one domain to train a better network for matching between two domains. The proposed loss requires no extra annotation, is complementary to the existing max-margin loss, and can be integrated into the training pipeline of any cross-modality retrieval method. Experimental results on four cross-modal retrieval datasets, namely MSRVTT, ActivityNet, DiDeMo, and MSVD, show the effectiveness of the proposed method. Code is available at: https://github.com/tqvinhcs/CrossKD.
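To make the idea concrete, below is a minimal NumPy sketch of the two loss terms the abstract describes. It is an illustration based only on the abstract, not the authors' implementation: `max_margin_loss` is the standard bidirectional ranking loss with a fixed margin, and `within_to_between_kd_loss` is a hypothetical distillation term in which within-domain similarities (e.g., caption-to-caption) act as soft targets for the cross-domain similarity distribution. Function names, the temperature `tau`, and the choice of a softmax cross-entropy are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def max_margin_loss(S, margin=0.2):
    """Bidirectional max-margin ranking loss.

    S[i, j] is the similarity between video i and caption j;
    the diagonal holds the matching pairs. Every non-matching
    pair is pushed at least `margin` below its matching pair,
    regardless of how dissimilar it actually is."""
    n = S.shape[0]
    diag = np.diag(S)
    cost_v2t = np.maximum(0.0, margin - diag[:, None] + S)  # video -> caption
    cost_t2v = np.maximum(0.0, margin - diag[None, :] + S)  # caption -> video
    off_diag = 1.0 - np.eye(n)                              # ignore matching pairs
    return ((cost_v2t + cost_t2v) * off_diag).sum() / n

def within_to_between_kd_loss(S, T, tau=1.0):
    """Hypothetical distillation term (sketch of the paper's idea).

    T[i, j] is a within-domain similarity (e.g., caption i vs.
    caption j). Its softened rows serve as teacher targets for
    the cross-domain rows of S, so non-matching pairs are graded
    by degree instead of treated uniformly."""
    teacher = softmax(T / tau, axis=1)   # within-domain soft targets
    student = softmax(S / tau, axis=1)   # cross-domain predictions
    return -(teacher * np.log(student + 1e-12)).sum(axis=1).mean()
```

In training, the two terms would simply be summed (with a weighting factor), which is consistent with the abstract's claim that the distillation loss is complementary to the max-margin loss and drops into any existing retrieval pipeline.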

Citation (APA)
Tran, V., Balasubramanian, N., & Hoai, M. (2023). From Within to Between: Knowledge Distillation for Cross Modality Retrieval. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13844 LNCS, pp. 605–622). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-26316-3_36
