The Large Annotated Corpus for the Arabic Language (LACAL)

Abstract

Annotated corpora play an important role in NLP. They are used in almost all NLP applications: automatic dictionary construction, text analysis, information retrieval, machine translation, etc. Annotated corpora are the basis for training NLP systems. Without them, it is difficult to build an efficient system that accounts for all linguistic variations and phenomena. In this paper, we present the annotated corpus we developed. This corpus contains more than 12 million different words labeled with several types of annotation: syntactic, morphological, and semantic. This large corpus adds value to the Arabic NLP field and should improve the quality of the training phase of Arabic NLP systems. Moreover, it is a suitable corpus for testing and evaluating the quality of these systems.

Cite

Yousfi, A., Boumehdi, A., Laaroussi, S., Makoudi, R., Aouragh, S. L., Gueddah, H., … Said, I. (2022). The Large Annotated Corpus for the Arabic Language (LACAL). In Studies in Computational Intelligence (Vol. 1061, pp. 205–219). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-14748-7_12
