mc-BEiT: Multi-choice Discretization for Image BERT Pre-training

7Citations
Citations of this article
28Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Image BERT pre-training with masked image modeling (MIM) becomes a popular practice to cope with self-supervised representation learning. A seminal work, BEiT, casts MIM as a classification task with a visual vocabulary, tokenizing the continuous visual signals into discrete vision tokens using a pre-learned dVAE. Despite a feasible solution, the improper discretization hinders further improvements of image pre-training. Since image discretization has no ground-truth answers, we believe that the masked patch should not be assigned with a unique token id even if a better “tokenizer” can be obtained. In this work, we introduce an improved BERT-style image pre-training method, namely mc-BEiT, which performs MIM proxy tasks towards eased and refined multi-choice training objectives. Specifically, the multi-choice supervision for the masked image patches is formed by the soft probability vectors of the discrete token ids, which are predicted by the off-the-shelf image “tokenizer” and further refined by high-level inter-patch perceptions resorting to the observation that similar patches should share their choices. Extensive experiments on classification, segmentation, and detection tasks demonstrate the superiority of our method, e.g., the pre-trained ViT-B achieves 84.1% top-1 fine-tuning accuracy on ImageNet-1K classification, 49.2% AP b and 44.0% AP m of object detection and instance segmentation on COCO, 50.8% mIOU on ADE20K semantic segmentation, outperforming the competitive counterparts. The code is available at https://github.com/lixiaotong97/mc-BEiT.

Cite

CITATION STYLE

APA

Li, X., Ge, Y., Yi, K., Hu, Z., Shan, Y., & Duan, L. Y. (2022). mc-BEiT: Multi-choice Discretization for Image BERT Pre-training. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13690 LNCS, pp. 231–246). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20056-4_14

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free