The Large Annotated Corpus for the Arabic Language (LACAL)

Abdellah Yousfi; Ahmed Boumehdi; Saida Laaroussi; Rania Makoudi; Si Lhoussain Aouragh; Hicham Gueddah; Brahim Habibi; Mohamed Nejja; Iazi Said

Book Chapter

Springer Science and Business Media Deutschland GmbH, (2022), 205-219

DOI: 10.1007/978-3-031-14748-7_12

Abstract

Annotated corpora play an important role in the NLP field. They are used in almost all NLP applications: automatic dictionary construction, text analysis, information retrieval, machine translation, etc. Annotated corpora are the basis for training NLP systems. Without them, it is difficult to build an efficient system that takes into account all linguistic variations and phenomena. In this paper, we present the annotated corpus we developed. This corpus contains more than 12 million different words, each labeled with several types of labels: syntactic, morphological, and semantic. This large corpus adds value to the Arabic NLP field and will certainly improve the quality of the training phase of Arabic NLP systems. Moreover, it can serve as a suitable corpus for testing and evaluating the quality of these systems.

Cite

Yousfi, A., Boumehdi, A., Laaroussi, S., Makoudi, R., Aouragh, S. L., Gueddah, H., … Said, I. (2022). The Large Annotated Corpus for the Arabic Language (LACAL). In Studies in Computational Intelligence (Vol. 1061, pp. 205–219). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-14748-7_12
