Generation of a Large-Scale Line Image Dataset with Ground Truth Texts from Page-Level Autograph Documents

Abstract

Deep learning techniques have recently made it possible to recognize Japanese historical cursive with high accuracy. However, most of the known cursive datasets have been gathered from printed documents, which are written for the general public and are easy to read. Our research aims to improve the recognition of autograph documents, which are more difficult to recognize than printed documents because they are often private and written in a variety of styles. To create a useful autograph document dataset, this paper devises a technique that generates many line images accompanied by their corresponding ground truth (GT) texts, given an autograph document whose GT transcription is available only at the page level. Our method uses HRNet for line detection and CRNN for line recognition. HRNet decomposes the page image into line images, and the page-level GT transcription is separately decomposed into GT line texts; the two are then mapped to each other by a similarity-based alignment solved with beam search. We introduce two ideas into the alignment: allowing out-of-order mapping of lines that are not adjacent to each other, and allowing many-to-many mapping. With these two orthogonal ideas, we obtained a dataset of 43,271 reliable autograph line images mapped to GT texts. Training CRNN from scratch on this dataset together with a printed dataset improves recognition accuracy for autograph documents.
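
The alignment step described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes each detected line image has already been recognized into a provisional string (for example by a recognizer such as the CRNN mentioned above), it uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, and it covers only the many-to-many idea, not the out-of-order mapping. The function name `align_lines` and the parameters `beam_width` and `max_group` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1] (assumption, not the paper's measure)."""
    return SequenceMatcher(None, a, b).ratio()


def align_lines(ocr_lines, gt_lines, beam_width=5, max_group=2):
    """Beam-search alignment of recognized line strings to GT line texts.

    Up to `max_group` consecutive lines on either side may be merged, which
    gives a simplified many-to-many mapping. Order is kept monotone here;
    out-of-order mapping is deliberately left out of this sketch.
    Returns a list of ((ocr_start, ocr_end), (gt_start, gt_end), similarity)
    with half-open index ranges.
    """
    n, m = len(ocr_lines), len(gt_lines)
    # Each beam state: (accumulated score, next OCR index, next GT index, pairs)
    beam = [(0.0, 0, 0, [])]
    while not all(i == n and j == m for _, i, j, _ in beam):
        candidates = []
        for state in beam:
            score, i, j, pairs = state
            if i == n and j == m:
                candidates.append(state)  # finished states keep competing
                continue
            # Map a group of OCR lines to a group of GT lines.
            for di in range(1, max_group + 1):
                for dj in range(1, max_group + 1):
                    if i + di <= n and j + dj <= m:
                        s = similarity("".join(ocr_lines[i:i + di]),
                                       "".join(gt_lines[j:j + dj]))
                        # Weight by lines consumed so larger groups are not penalized.
                        gain = s * (di + dj) / 2
                        candidates.append(
                            (score + gain, i + di, j + dj,
                             pairs + [((i, i + di), (j, j + dj), s)]))
            # Leave one line unmatched (e.g. a spurious or missed detection).
            if i < n:
                candidates.append((score, i + 1, j, pairs))
            if j < m:
                candidates.append((score, i, j + 1, pairs))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]  # prune to the best few hypotheses
    return beam[0][3]


if __name__ == "__main__":
    ocr = ["abcd", "efgh", "ijklmnop"]   # toy recognized lines
    gt = ["abcdefgh", "ijkl", "mnop"]    # toy GT line texts
    for ocr_range, gt_range, sim in align_lines(ocr, gt):
        print(ocr_range, gt_range, round(sim, 3))
```

In a pipeline of this kind, only pairs whose similarity exceeds some threshold would plausibly be kept as reliable line/GT pairs; the threshold and the scoring weights above are illustrative choices rather than values taken from the paper.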

Cite

Nagai, A. (2021). Generation of a Large-Scale Line Image Dataset with Ground Truth Texts from Page-Level Autograph Documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13108 LNCS, pp. 354–366). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-92185-9_29
