Using comparable corpora to improve the effectiveness of cross-language information retrieval

Abstract

With the explosive growth of the World Wide Web, large-scale comparable corpora have become more abundant and accessible than parallel corpora. From the point of view of Cross-Language Information Retrieval, limited translation resources, together with the ambiguity that arises when query terms fail to translate, are largely responsible for large drops in effectiveness below monolingual performance. Therefore, strategies for bilingual terminology extraction from comparable texts deserve more attention, in order to enrich existing bilingual lexicons and thesauri and to enhance Cross-Language Information Retrieval. In the present paper, we focus on enhancing Cross-Language Information Retrieval using a two-stage corpus-based translation model that includes bi-directional extraction of bilingual terminology from comparable corpora and selection of the best translation alternatives on the basis of their morphological knowledge. The impact of comparable corpora on the performance of the Cross-Language Information Retrieval process is evaluated in this study, and the results indicate that the effect is clearly positive, especially when using the linear combination with bilingual dictionaries and for the Japanese-English language pair. © 2010 Springer-Verlag Berlin Heidelberg.

Sadat, F. (2010). Using comparable corpora to improve the effectiveness of cross-language information retrieval. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6233 LNAI, pp. 320–331). https://doi.org/10.1007/978-3-642-14770-8_36
