Language identification in code-mixed social media text is addressed here using Multilingual Meta Embeddings (MME), an effective method for learning multilingual representations. In code-mixing, language mixing can occur at a sentence boundary, within a sentence, or within a word. This paper proposes an MME-driven language identification mechanism for code-mixed text, focusing on a comparison of different classifiers on Hindi-English code-mixed data obtained from the LinCE benchmark corpus. LinCE is a centralized benchmark for linguistic code-switching evaluation that integrates ten corpora covering four code-switched language pairs and four tasks. Each instance in the dataset is a code-mixed sentence, and each token in the sentence is associated with a language label. We experimented with several classifiers, namely a convolutional neural network (CNN), Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and Bidirectional GRU (BiGRU), and observed that BiLSTM outperformed the others. The multilingual meta-embedding technique was empirically evaluated for language identification.
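The abstract does not specify how the meta embeddings are composed; a common baseline is to concatenate a token's vectors from several monolingual embedding tables into one multilingual representation, which a downstream classifier (such as the BiLSTM mentioned above) can then consume. The following is a minimal illustrative sketch under that assumption; the toy tables, token values, and vector dimensions are all hypothetical and not taken from the paper.

```python
import numpy as np

# Toy monolingual embedding tables (hypothetical, for illustration only).
hi_emb = {"main": np.array([1.0, 0.0]), "hoon": np.array([0.9, 0.1])}
en_emb = {"happy": np.array([0.0, 1.0]), "main": np.array([0.2, 0.8])}

def meta_embed(token, tables, dim=2):
    """Concatenate the token's vector from each source embedding table.

    Tokens missing from a table fall back to a zero vector, so every
    token receives a fixed-size multilingual representation.
    """
    parts = [table.get(token, np.zeros(dim)) for table in tables]
    return np.concatenate(parts)

# "main" appears in both toy tables, so its meta embedding stacks
# the Hindi and English vectors into one 4-dimensional representation.
vec = meta_embed("main", [hi_emb, en_emb])
print(vec.shape)  # (4,)
```

A sequence of such per-token vectors would then be fed to a token-level classifier to predict one language label per token, as in the LinCE language identification task.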
CITATION STYLE
Teja, T. R., Shilpa, S., & Joseph, N. (2023). Meta Embeddings for LinCE Dataset. In Lecture Notes in Networks and Systems (Vol. 587, pp. 363–374). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-19-7874-6_26