Deep neural networks achieve state-of-the-art performance on named entity recognition (NER) when sufficient training data is available, but they perform poorly in low-resource scenarios due to data scarcity. To address this problem, we propose a novel data augmentation method based on a pre-trained language model (PLM) and a curriculum learning strategy. Concretely, we use the PLM to generate diverse training instances by predicting different masked words, and we design a task-specific curriculum learning strategy to alleviate the influence of noise. We evaluate the effectiveness of our approach on three datasets: CoNLL-2003, OntoNotes 5.0, and MaScip, of which the first two simulate low-resource scenarios, while the last is a real low-resource dataset from the materials science domain. Experimental results show that our method consistently outperforms the baseline model. Specifically, it achieves absolute F1 improvements of 3.46% on 1% of CoNLL-2003, 2.58% on 1% of OntoNotes 5.0, and 0.99% on the full MaScip dataset.
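The core augmentation idea — mask a token and substitute the PLM's predictions while keeping the BIO label sequence intact — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `candidates` list stands in for a PLM's fill-mask predictions, and masking is restricted to non-entity (`O`-labeled) positions so entity spans and their labels are preserved.

```python
import random

def augment_sentence(tokens, labels, candidates, seed=0):
    """Generate augmented NER instances by replacing one non-entity
    token with each candidate word, keeping labels unchanged.

    tokens:     list of words, e.g. ["John", "lives", "in", "Paris"]
    labels:     parallel BIO tags, e.g. ["B-PER", "O", "O", "B-LOC"]
    candidates: stand-in for a PLM's masked-word predictions
    """
    rng = random.Random(seed)
    # Only mask positions tagged "O", so entity mentions survive intact.
    o_positions = [i for i, lab in enumerate(labels) if lab == "O"]
    if not o_positions:
        return []  # nothing safe to mask in this sentence
    pos = rng.choice(o_positions)
    augmented = []
    for word in candidates:
        new_tokens = tokens[:pos] + [word] + tokens[pos + 1:]
        # The label sequence is reused verbatim: substitution at an
        # "O" position cannot change any entity annotation.
        augmented.append((new_tokens, list(labels)))
    return augmented
```

In the paper's setting, the candidate words would come from a PLM's top-k predictions for the masked position; some of those substitutions are noisy, which is what the curriculum denoising stage is designed to down-weight.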
Zhu, W., Liu, J., Xu, J., Chen, Y., & Zhang, Y. (2021). Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12869 LNAI, pp. 355–370). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-84186-7_24