Unsupervised Arabic dialect segmentation for machine translation

Wael Salloum; Nizar Habash

Journal Article

Natural Language Engineering (2022) 28(2) 223-248

DOI: 10.1017/S1351324920000455

Citations: 1

Readers: 15

Abstract

Resource-limited and morphologically rich languages pose many challenges to natural language processing tasks. Their highly inflected surface forms inflate the vocabulary size and increase sparsity in an already scarce data situation. In this article, we present an unsupervised learning approach to vocabulary reduction through morphological segmentation. We demonstrate its value in the context of machine translation for dialectal Arabic (DA), the primarily spoken, orthographically unstandardized, morphologically rich, and yet resource-poor variants of Standard Arabic. Our approach exploits the existence of monolingual and parallel data. We show comparable performance to state-of-the-art supervised methods for DA segmentation.

Citation (APA)

Salloum, W., & Habash, N. (2022). Unsupervised Arabic dialect segmentation for machine translation. Natural Language Engineering, 28(2), 223–248. https://doi.org/10.1017/S1351324920000455

Readers' Seniority

PhD / Post grad / Masters / Doc: 3 (60%)

Lecturer / Post doc: 2 (40%)

Readers' Discipline

Arts and Humanities: 3 (50%)

Computer Science: 2 (33%)

Linguistics: 1 (17%)
