Unsupervised Arabic dialect segmentation for machine translation

Abstract

Resource-limited, morphologically rich languages pose many challenges for natural language processing tasks. Their highly inflected surface forms inflate the vocabulary size and increase sparsity in an already data-scarce setting. In this article, we present an unsupervised learning approach to vocabulary reduction through morphological segmentation. We demonstrate its value in the context of machine translation for dialectal Arabic (DA), the primarily spoken, orthographically unstandardized, morphologically rich, and yet resource-poor variants of Standard Arabic. Our approach exploits both monolingual and parallel data. It performs comparably to state-of-the-art supervised methods for DA segmentation.
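To make the vocabulary-inflation point concrete, the toy sketch below counts word types before and after splitting off clitics. It is not the authors' unsupervised method. It is a hand-written greedy affix stripper over a tiny hypothetical corpus in Buckwalter-style transliteration. The affix lists (`PREFIXES`, `SUFFIXES`), the `MIN_STEM` threshold, and the `segment` helper are illustrative assumptions only.

```python
# Toy illustration of vocabulary reduction via segmentation.
# ASSUMPTION: hypothetical, hand-picked clitic lists; the paper instead
# learns segmentation without supervision from monolingual and parallel data.
PREFIXES = ("w",)   # conjunction "and" (Buckwalter transliteration)
SUFFIXES = ("hA",)  # pronoun clitic "her/it"
MIN_STEM = 3        # never strip if the remaining stem would be shorter


def segment(word: str) -> list[str]:
    """Greedily strip at most one known prefix and one known suffix."""
    prefix, suffix = [], []
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            prefix = [p + "+"]
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            suffix = ["+" + s]
            word = word[: -len(s)]
            break
    return prefix + [word] + suffix


# Hypothetical corpus: two stems (ktb "wrote", drs "studied") with clitics.
corpus = "ktb wktb ktbhA wktbhA drs wdrs drshA".split()
surface = set(corpus)
segmented = {seg for w in corpus for seg in segment(w)}
print(len(surface), len(segmented))  # 7 4
```

Seven surface types collapse to four segment types (`ktb`, `drs`, `w+`, `+hA`). This is the kind of sparsity reduction the abstract describes, although a real system must also decide *when* a matching string is actually a clitic.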

Cite (APA)

Salloum, W., & Habash, N. (2022). Unsupervised Arabic dialect segmentation for machine translation. Natural Language Engineering, 28(2), 223–248. https://doi.org/10.1017/S1351324920000455

Readers' Seniority

Tooltip

PhD / Post grad / Masters / Doc 3

60%

Lecturer / Post doc 2

40%

Readers' Discipline

Tooltip

Arts and Humanities 3

50%

Computer Science 2

33%

Linguistics 1

17%

Save time finding and organizing research with Mendeley

Sign up for free