Deep neural networks achieve state-of-the-art performance on named entity recognition (NER) when sufficient training data is available, but they perform poorly in low-resource scenarios due to data scarcity. To address this problem, we propose a novel data augmentation method based on a pre-trained language model (PLM) and a curriculum learning strategy. Concretely, we use the PLM to generate diverse training instances by predicting different masked words, and we design a task-specific curriculum learning strategy to alleviate the influence of noise. We evaluate the effectiveness of our approach on three datasets: CoNLL-2003, OntoNotes 5.0, and MaScip, of which the first two simulate low-resource scenarios, while the last is a real low-resource dataset from the materials science domain. Experimental results show that our method consistently outperforms the baseline model. Specifically, it achieves absolute F1 improvements of 3.46% on 1% of CoNLL-2003, 2.58% on 1% of OntoNotes 5.0, and 0.99% on the full MaScip dataset.
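The core augmentation idea — mask a token and substitute the PLM's predictions while keeping the BIO label sequence intact — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `candidates` list stands in for a PLM's fill-mask predictions, and masking is restricted to non-entity (`O`-labeled) positions so entity spans and their labels are preserved.

```python
import random

def augment_sentence(tokens, labels, candidates, seed=0):
    """Generate augmented NER instances by replacing one non-entity
    token with each candidate word, keeping labels unchanged.

    tokens:     list of words, e.g. ["John", "lives", "in", "Paris"]
    labels:     parallel BIO tags, e.g. ["B-PER", "O", "O", "B-LOC"]
    candidates: stand-in for a PLM's masked-word predictions
    """
    rng = random.Random(seed)
    # Only mask positions tagged "O", so entity mentions survive intact.
    o_positions = [i for i, lab in enumerate(labels) if lab == "O"]
    if not o_positions:
        return []  # nothing safe to mask in this sentence
    pos = rng.choice(o_positions)
    augmented = []
    for word in candidates:
        new_tokens = tokens[:pos] + [word] + tokens[pos + 1:]
        # The label sequence is reused verbatim: substitution at an
        # "O" position cannot change any entity annotation.
        augmented.append((new_tokens, list(labels)))
    return augmented
```

In the paper's setting, the candidate words would come from a PLM's top-k predictions for the masked position; some of those substitutions are noisy, which is what the curriculum denoising stage is designed to down-weight.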
Zhu, W., Liu, J., Xu, J., Chen, Y., & Zhang, Y. (2021). Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12869 LNAI, pp. 355–370). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-84186-7_24