Hierarchical RNN for few-shot information extraction learning

Shengpeng Liu; Ying Li; Binbin Fan

Conference Proceedings

Communications in Computer and Information Science (2018) 902 227-239

DOI: 10.1007/978-981-13-2206-8_20

Abstract

Web information extraction (IE) is the process of retrieving the exact text fragments of record attributes from HTML web pages. Most existing approaches require extensive feature engineering: selecting or computing the underlying content, layout, and contextual features of web pages. Another disadvantage is that a great deal of human labor is required to annotate training examples. Methods based on wrapper adaptation drastically reduce the annotation work but still need many labeled pages from the seed website. In this work, we present a hierarchical attention recurrent neural network, an end-to-end model that does not require traditional, domain-specific feature engineering. The network can also be trained with only a few pages from a site, i.e., few-shot learning. Because the model automatically learns the semantics of text fragments in pages at a deep level, we adapt the network to extract records from previously unseen websites. Experiments on a publicly available dataset demonstrate that our networks achieve competitive results against state-of-the-art approaches for both wrapper induction and wrapper adaptation.
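
The abstract does not spell out the architecture, so the following is only a rough sketch of what a hierarchical attention RNN for labeling page fragments could look like. It is not the authors' model. It assumes a simple Elman RNN, dot-product attention pooling, untrained random weights, and a hypothetical label set (`other`, `title`, `price`). All class and function names are made up for illustration. A token-level RNN with attention turns each text fragment into one vector, and a fragment-level RNN then scores each fragment in the context of the page.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_embedder(dim, rng):
    """Assign each distinct token a fixed random vector (stand-in for learned embeddings)."""
    table = {}
    def embed(token):
        if token not in table:
            table[token] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        return table[token]
    return embed

class AttentiveRNN:
    """Elman RNN plus attention pooling over its hidden states (illustrative only)."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.w_in = rand_matrix(hid_dim, in_dim, rng)
        self.w_hid = rand_matrix(hid_dim, hid_dim, rng)
        self.query = [rng.uniform(-0.1, 0.1) for _ in range(hid_dim)]

    def states(self, inputs):
        """Return one hidden state per input vector."""
        h = [0.0] * self.hid_dim
        out = []
        for x in inputs:
            a, b = matvec(self.w_in, x), matvec(self.w_hid, h)
            h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
            out.append(h)
        return out

    def pool(self, inputs):
        """Attention-weighted sum of hidden states; also returns the weights."""
        hs = self.states(inputs)
        weights = softmax([sum(q * hi for q, hi in zip(self.query, h)) for h in hs])
        pooled = [sum(w * h[i] for w, h in zip(weights, hs)) for i in range(self.hid_dim)]
        return pooled, weights

def label_fragments(page, embed, word_rnn, frag_rnn, w_out):
    """page: list of non-empty token lists. Returns one label distribution per fragment."""
    # Lower level: tokens -> one vector per fragment (attention pooling).
    frag_vecs = [word_rnn.pool([embed(tok) for tok in frag])[0] for frag in page]
    # Upper level: fragment vectors -> context-aware states, then label scores.
    contextual = frag_rnn.states(frag_vecs)
    return [softmax(matvec(w_out, h)) for h in contextual]

rng = random.Random(0)
LABELS = ["other", "title", "price"]  # hypothetical attribute labels
embed = make_embedder(8, rng)
word_rnn = AttentiveRNN(8, 6, rng)
frag_rnn = AttentiveRNN(6, 6, rng)
w_out = rand_matrix(len(LABELS), 6, rng)

page = [["Home", ">", "Books"], ["Deep", "Learning"], ["$", "59.99"]]
probs = label_fragments(page, embed, word_rnn, frag_rnn, w_out)
for frag, p in zip(page, probs):
    print(" ".join(frag), "->", LABELS[p.index(max(p))])
```

With random weights the predicted labels are meaningless. The sketch only shows how data flows through the two levels. In a few-shot setting, the weights would be fitted on the handful of annotated pages from a site.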

Citation (APA)

Liu, S., Li, Y., & Fan, B. (2018). Hierarchical RNN for few-shot information extraction learning. In Communications in Computer and Information Science (Vol. 902, pp. 227–239). Springer Verlag. https://doi.org/10.1007/978-981-13-2206-8_20
