ANTILLES: An Open French Linguistically Enriched Part-of-Speech Corpus

Yanis Labrak; Richard Dufour

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13502 LNAI, pp. 28–38 (2022)

DOI: 10.1007/978-3-031-16270-1_3

Abstract

Part-of-speech (POS) tagging is a classical natural language processing (NLP) task. Although many tools and corpora have been proposed, especially for the most widely spoken languages, they suffer from limitations concerning their license, the size of their tagset, or their reliance on approaches that are no longer state of the art. In this article, we propose ANTILLES, an extended version of an existing French corpus (UD French-GSD) with an original set of labels derived from morphological characteristics (gender, number, tense, etc.). This extended version includes a set of 65 labels, compared with 16 in the initial version. We also implemented several POS tagging tools for French from this corpus, incorporating the latest state-of-the-art advances in this area. The corpus and the POS tagging tools are fully open and freely available.
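
As an illustration of how a coarse tag can be refined with morphological characteristics, the following is a minimal sketch that assumes the input is a CoNLL-U token line, the format used by Universal Dependencies corpora such as UD French-GSD. The function names, the choice of features, and the `UPOS-Feature-Feature` label scheme are hypothetical and chosen only for illustration; they are not the ANTILLES tagset or the authors' procedure.

```python
# Hypothetical sketch: refine a coarse UPOS tag with morphological features.
# The label format produced here is an assumption, not the ANTILLES tagset.

# Assumed subset of features used to refine the coarse tag, in a fixed order.
ENRICHING_FEATURES = ("Gender", "Number", "Tense")


def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS field such as 'Gender=Fem|Number=Plur'."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def enriched_label(upos: str, feats: str) -> str:
    """Append the selected feature values to the coarse tag."""
    values = parse_feats(feats)
    suffix = [values[name] for name in ENRICHING_FEATURES if name in values]
    return "-".join([upos, *suffix])


def label_conllu_line(line: str) -> tuple[str, str]:
    """Return (form, enriched label) for one tab-separated CoNLL-U token line.

    CoNLL-U columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
    """
    cols = line.rstrip("\n").split("\t")
    form, upos, feats = cols[1], cols[3], cols[5]
    return form, enriched_label(upos, feats)


example = "2\tmaisons\tmaison\tNOUN\t_\tGender=Fem|Number=Plur\t1\tobj\t_\t_"
print(label_conllu_line(example))  # ('maisons', 'NOUN-Fem-Plur')
```

Tokens without any of the selected features (punctuation, for example) keep their coarse tag unchanged, so in this sketch the size of the resulting label set depends only on which features are selected.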

Cite

Labrak, Y., & Dufour, R. (2022). ANTILLES: An Open French Linguistically Enriched Part-of-Speech Corpus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13502 LNAI, pp. 28–38). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16270-1_3
