Sentence Paraphrase Detection Using Classification Models

0Citations
Citations of this article
1Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In this paper, we address on the task of sentence paraphrase detection which is focused on deciding whether the two sentences have the relationship of paraphrase. A supervised learning strategy for paraphrase detection is described whereby the two sentences are classified to decide the paraphrase relationship and using only the lexical features operated at n-gram as the classification features. Gradient Boosting, K-Nearest Neighbor, Decision Tree and Support vector machine are chosen as the classifiers. The performance of the classification method is compared and the features are analyzed to determine which of them are most important for paraphrase detection. Evaluation is performed on the corpus of 2016 Detecting Paraphrase in Indian Languages task proposed by Forum of Information Retrieval Evaluation (DPIL-FIRE2016). The experimental results show that the Gradient Boosting can achieve the highest Overall Score. By using the learned classifier, we got the highest F1 measure for both Task1 and Task2 on Malayalam and Tamil, and the highest F1 measure for Task2 on Punjabi in DPIL-FIRE2016.

Cite

CITATION STYLE

APA

Tian, L., Ning, H., Kong, L., Chen, K., Qi, H., & Han, Z. (2018). Sentence Paraphrase Detection Using Classification Models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10478 LNCS, pp. 166–181). Springer Verlag. https://doi.org/10.1007/978-3-319-73606-8_13

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free