Collocation extraction in Turkish texts using statistical methods

Senem Kumova Metin; Bahar Karaoğlan

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 6233 LNAI 238-249

DOI: 10.1007/978-3-642-14770-8_27

Abstract

A collocation is a combination of words that appear together more often than would be expected by chance. Since collocations are blocks of meaning, they play an important role in natural language processing applications (word sense disambiguation, part-of-speech tagging, machine translation, etc.). In this study, a Turkish corpus is analyzed with the following statistical techniques: frequency of occurrence, mutual information, and hypothesis tests. We use both the stemmed and the surface form of the corpus to explore the effect of stemming on collocation extraction. The techniques are evaluated by recall and precision. The chi-square hypothesis test and mutual information produced better results than the other methods on the Turkish corpus. In addition, we found that a stemmed corpus makes it easier to discriminate between successful and unsuccessful collocation extraction methods. © 2010 Springer-Verlag Berlin Heidelberg.
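The abstract names the association measures but does not give their formulas or any thresholds, so the following is only a minimal sketch of the standard textbook versions for adjacent word pairs: raw frequency, pointwise mutual information, and the chi-square statistic on a 2x2 contingency table, plus precision and recall against a reference list. The function names `bigram_scores` and `precision_recall` are hypothetical, and the input is assumed to be an already tokenized (and optionally stemmed) list of words; none of this is taken from the paper itself.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score each adjacent word pair by frequency, PMI, and chi-square.

    Assumption: `tokens` is a flat list of word strings, either surface
    forms or stems. Windowing and sentence boundaries are ignored.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())  # total number of adjacent pairs

    # Marginal counts: how often a word is the first / second pair member.
    first, second = Counter(), Counter()
    for (w1, w2), count in bigrams.items():
        first[w1] += count
        second[w2] += count

    scores = {}
    for (w1, w2), o11 in bigrams.items():
        # Observed 2x2 contingency table for the pair (w1, w2).
        o12 = first[w1] - o11        # w1 followed by another word
        o21 = second[w2] - o11       # another word followed by w2
        o22 = n - o11 - o12 - o21    # neither w1 first nor w2 second

        # Pointwise mutual information: log2 of observed / expected.
        pmi = math.log2(o11 * n / (first[w1] * second[w2]))

        # Chi-square statistic for a 2x2 table (no continuity correction).
        denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
        chi2 = n * (o11 * o22 - o12 * o21) ** 2 / denom if denom else 0.0

        scores[(w1, w2)] = {"freq": o11, "pmi": pmi, "chi2": chi2}
    return scores


def precision_recall(extracted, gold):
    """Precision and recall of extracted pairs against a reference set."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

Running the same scoring on the surface-form and the stemmed token lists, then comparing the precision and recall of the top-ranked pairs for each measure, would mirror the kind of comparison the abstract describes, though the ranking cutoffs and the reference collocation list would have to be chosen by the reader.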

Citation (APA)

Kumova Metin, S., & Karaoğlan, B. (2010). Collocation extraction in Turkish texts using statistical methods. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6233 LNAI, pp. 238–249). https://doi.org/10.1007/978-3-642-14770-8_27
