Recently, there has been growing interest in automating the extraction of key information from document images. Previous methods mainly focus on modelling the complex interactions between the multimodal features (text, vision and layout) of documents to comprehend their content. However, considering these interactions alone may not work well when dealing with unseen document templates. To address this issue, we propose a novel approach that incorporates the concept of document inductive bias into the graph convolution framework. Our approach recognizes that the content of a text segment in a document is often determined by the context provided by its surrounding segments, and it uses an adjacency matrix hybrid strategy to integrate this bias into the model. As a result, the model is better able to understand the relationships between text segments even when faced with unseen templates. In addition, we employ an iterative method to perform the graph convolution operation, making full use of the textual, visual, and spatial information contained within documents. Extensive experimental results on two publicly available datasets demonstrate the effectiveness of our method.
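The abstract does not specify how the adjacency matrices are built or mixed, so the following Python sketch shows only one plausible reading, under stated assumptions. It is not the authors' implementation. We assume that the "inductive bias" adjacency links each text segment to its k spatially nearest segments by box-center distance. We also assume that a data-driven adjacency comes from a softmax over feature dot products, and that the two are blended with a hypothetical weight `alpha`. The iterative step recomputes the learned adjacency from the updated features before each propagation. Learnable weight matrices are omitted for brevity. All function names are hypothetical.

```python
import math


def knn_adjacency(centers, k):
    """Hypothetical inductive-bias prior: link each segment to its k nearest
    segments (by box-center distance) plus itself, row-normalized."""
    n = len(centers)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(centers[i], centers[j]), j) for j in range(n) if j != i
        )
        neighbors = [j for _, j in dists[:k]] + [i]
        for j in neighbors:
            adj[i][j] = 1.0 / len(neighbors)
    return adj


def learned_adjacency(feats):
    """Assumed data-driven adjacency: row-wise softmax over feature dot products."""
    adj = []
    for fi in feats:
        scores = [sum(a * b for a, b in zip(fi, fj)) for fj in feats]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        adj.append([e / total for e in exps])
    return adj


def hybrid_adjacency(prior, learned, alpha):
    """Convex blend of the two row-stochastic matrices (rows still sum to 1)."""
    return [
        [alpha * p + (1 - alpha) * q for p, q in zip(prow, qrow)]
        for prow, qrow in zip(prior, learned)
    ]


def propagate(adj, feats):
    """One weight-free graph-convolution step: H' = ReLU(A H)."""
    n, d = len(feats), len(feats[0])
    return [
        [max(0.0, sum(adj[i][j] * feats[j][c] for j in range(n))) for c in range(d)]
        for i in range(n)
    ]


def iterative_graph_conv(centers, feats, k=2, alpha=0.5, iterations=3):
    """Re-derive the learned adjacency from the current features at every
    iteration, blend it with the fixed spatial prior, then propagate."""
    prior = knn_adjacency(centers, k)
    adj = prior
    for _ in range(iterations):
        adj = hybrid_adjacency(prior, learned_adjacency(feats), alpha)
        feats = propagate(adj, feats)
    return feats, adj
```

In this sketch, the fixed spatial prior makes each segment draw on its neighbors even when the feature similarities are uninformative. Because the learned term is recomputed from updated features, the graph structure and the node representations are refined together.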
Deng, J., Zhang, Y., Zhang, X., Tang, Z., & Gao, L. (2023). An Iterative Graph Learning Convolution Network for Key Information Extraction Based on the Document Inductive Bias. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 14189 LNCS, pp. 84–97). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-41682-8_6