Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising

Abstract

Deep neural networks have achieved state-of-the-art performance on named entity recognition (NER) when sufficient training data is available, but they perform poorly in low-resource scenarios due to data scarcity. To address this problem, we propose a novel data augmentation method based on a pre-trained language model (PLM) and a curriculum learning strategy. Concretely, we use the PLM to generate diverse training instances by predicting different masked words, and we design a task-specific curriculum learning strategy to alleviate the influence of noise. We evaluate the effectiveness of our approach on three datasets: CoNLL-2003, OntoNotes5.0, and MaScip, of which the first two simulate low-resource scenarios and the last is a real low-resource dataset in the materials science domain. Experimental results show that our method consistently outperforms the baseline model. Specifically, our method achieves an absolute improvement of 3.46% F1 score on 1% of CoNLL-2003, 2.58% on 1% of OntoNotes5.0, and 0.99% on the full MaScip dataset.
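The core augmentation idea, as described in the abstract, is to mask words in a training sentence and let a PLM propose replacements, while keeping the entity labels aligned. The following is a minimal, self-contained sketch of a label-aware variant of this idea: only non-entity ("O") tokens are masked and replaced, so the entity annotations remain valid. A toy substitution table stands in for the PLM's fill-mask predictions; the table contents, the `mask_rate` parameter, and the `augment` function name are illustrative assumptions, not the paper's actual implementation (which uses a real PLM such as BERT).

```python
import random

# Toy stand-in for a PLM's fill-mask predictions (hypothetical vocabulary).
# In the paper's setting, a pre-trained language model would predict
# candidate words for each masked position instead.
MASK_PREDICTIONS = {
    "visited": ["toured", "entered"],
    "yesterday": ["today", "recently"],
}

def augment(tokens, labels, mask_rate=1.0, seed=0):
    """Label-aware augmentation sketch: mask only non-entity ('O') tokens
    and replace them with PLM-style predictions, keeping labels unchanged."""
    rng = random.Random(seed)
    new_tokens = []
    for tok, lab in zip(tokens, labels):
        if lab == "O" and tok in MASK_PREDICTIONS and rng.random() < mask_rate:
            # Replace a masked non-entity word with a predicted alternative.
            new_tokens.append(rng.choice(MASK_PREDICTIONS[tok]))
        else:
            # Entity tokens (and unknown words) are left untouched.
            new_tokens.append(tok)
    return new_tokens, labels

tokens = ["John", "visited", "Paris", "yesterday"]
labels = ["B-PER", "O", "B-LOC", "O"]
aug_tokens, aug_labels = augment(tokens, labels)
```

Because only "O"-labeled positions are rewritten, each augmented sentence reuses the original label sequence verbatim, which is what makes the augmentation label-aware; the curriculum denoising step would then order or filter these generated instances by an estimated noise level during training.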

Citation (APA)

Zhu, W., Liu, J., Xu, J., Chen, Y., & Zhang, Y. (2021). Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12869 LNAI, pp. 355–370). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-84186-7_24
