Annotated corpora play an important role in the NLP field. They are used in almost all NLP applications: automatic dictionary construction, text analysis, information retrieval, machine translation, etc. Annotated corpora are the basis for training NLP systems. Without them, it is difficult to build an efficient system that takes into account all linguistic variations and phenomena. In this paper, we present the annotated corpus we developed. This corpus contains more than 12 million different words labeled with several types of annotation: syntactic, morphological, and semantic. This large corpus adds value to the Arabic NLP field and will certainly improve the quality of the training phase of Arabic NLP systems. Moreover, it can serve as a suitable corpus for testing and evaluating the quality of these systems.
CITATION STYLE
Yousfi, A., Boumehdi, A., Laaroussi, S., Makoudi, R., Aouragh, S. L., Gueddah, H., … Said, I. (2022). The Large Annotated Corpus for the Arabic Language (LACAL). In Studies in Computational Intelligence (Vol. 1061, pp. 205–219). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-14748-7_12