Hierarchical RNN for few-shot information extraction learning

Abstract

Web information extraction (IE) is the process of retrieving the exact text fragments of record attributes from HTML web pages. Most existing approaches require substantial feature engineering to select or compute the underlying content, layout, and contextual features of web pages. Another drawback is that they require a great deal of human labor to annotate training examples. Wrapper adaptation methods drastically reduce this annotation work, but they still require many pages to be labeled on the seed website. In this work, we present a hierarchical attention recurrent neural network, an end-to-end model that does not require traditional, domain-specific feature engineering. The network can also be trained with only a few pages from a site, i.e., few-shot learning. Because the model automatically learns deep semantic representations of the text fragments in pages, we adapt the network to extract records from previously unseen websites. Experiments on a publicly available dataset show that our networks for both wrapper induction and wrapper adaptation achieve results competitive with state-of-the-art approaches.
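The abstract does not describe the architecture beyond "hierarchical attention recurrent neural network", so the following is only a rough sketch of one plausible reading and not the authors' model. It makes several assumptions. Each text fragment of a page is encoded by a word-level RNN whose states are pooled with attention into a fragment vector. A second, fragment-level RNN runs over the page's fragments to add context. A softmax layer then assigns each fragment an attribute label, or `none`. All names (`HierarchicalTagger`, `AttentionPool`, `ElmanRNN`, the label set) are hypothetical. The cells are plain tanh RNNs rather than whatever gated units the paper may use, and the weights are random and untrained. Training, few-shot page sampling, and cross-site adaptation are omitted.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; an untrained model is enough to show the data flow."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ElmanRNN:
    """Plain tanh RNN (assumed cell): h_t = tanh(Wx x_t + Wh h_{t-1} + b)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.Wx = rand_matrix(hid_dim, in_dim, rng)
        self.Wh = rand_matrix(hid_dim, hid_dim, rng)
        self.b = [0.0] * hid_dim
        self.hid_dim = hid_dim

    def run(self, xs):
        h = [0.0] * self.hid_dim
        states = []
        for x in xs:
            pre = zip(matvec(self.Wx, x), matvec(self.Wh, h), self.b)
            h = [math.tanh(a + b + c) for a, b, c in pre]
            states.append(h)
        return states


class AttentionPool:
    """Scores each state by a dot product with a context vector u,
    softmax-normalises the scores, and returns the weighted sum of states."""

    def __init__(self, dim, rng):
        self.u = [rng.uniform(-0.1, 0.1) for _ in range(dim)]

    def __call__(self, states):
        scores = [sum(ui * si for ui, si in zip(self.u, s)) for s in states]
        alphas = softmax(scores)
        dim = len(states[0])
        pooled = [sum(a * s[d] for a, s in zip(alphas, states)) for d in range(dim)]
        return pooled, alphas


class HierarchicalTagger:
    """Hypothetical two-level tagger: word RNN + attention gives one vector per
    fragment; a fragment RNN over the page gives one label distribution per fragment."""

    def __init__(self, labels, emb_dim=8, word_hid=8, frag_hid=8, seed=0):
        self.rng = random.Random(seed)
        self.labels = labels
        self.emb_dim = emb_dim
        self.emb = {}  # lazily initialised token embeddings (hypothetical)
        self.word_rnn = ElmanRNN(emb_dim, word_hid, self.rng)
        self.word_attn = AttentionPool(word_hid, self.rng)
        self.frag_rnn = ElmanRNN(word_hid, frag_hid, self.rng)
        self.W_out = rand_matrix(len(labels), frag_hid, self.rng)

    def embed(self, token):
        if token not in self.emb:
            self.emb[token] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.emb_dim)]
        return self.emb[token]

    def forward(self, fragments):
        """fragments: list of token lists, one per text fragment on the page."""
        frag_vecs = []
        for tokens in fragments:
            tokens = tokens or ["<empty>"]  # attention needs at least one state
            word_states = self.word_rnn.run([self.embed(t) for t in tokens])
            vec, _ = self.word_attn(word_states)
            frag_vecs.append(vec)
        frag_states = self.frag_rnn.run(frag_vecs)
        return [softmax(matvec(self.W_out, h)) for h in frag_states]

    def predict(self, fragments):
        probs = self.forward(fragments)
        return [self.labels[max(range(len(p)), key=p.__getitem__)] for p in probs]


if __name__ == "__main__":
    labels = ["none", "title", "author", "price"]  # hypothetical attribute set
    page = [["Deep", "Learning"], ["by", "Ian", "Goodfellow"], ["$", "72.00"], []]
    model = HierarchicalTagger(labels)
    print(model.predict(page))  # untrained, so the labels are arbitrary
```

Because the weights are random, the printed labels carry no meaning; the sketch only shows how tokens become fragment vectors and then per-fragment label distributions. The word-level attention is one way such a model could learn to emphasise indicative tokens (for example a currency symbol in a price fragment) without hand-built features, which is the kind of feature engineering the abstract says the end-to-end model avoids.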

Citation (APA)

Liu, S., Li, Y., & Fan, B. (2018). Hierarchical RNN for few-shot information extraction learning. In Communications in Computer and Information Science (Vol. 902, pp. 227–239). Springer Verlag. https://doi.org/10.1007/978-981-13-2206-8_20