Categorizing text documents plays a vital role in information retrieval systems. Clustering text documents in a way that supports effective classification and the extraction of semantic knowledge is a tedious task. Most existing methods cluster documents using factors such as term frequency, document frequency, and feature selection, yet their clustering accuracy is still not up to the mark. In this paper we propose an integrated approach built on a metric named Term Rank Identifier (TRI). TRI measures the frequent terms and indexes them by frequency. For the ranked terms, TRI then finds the semantics and the corresponding class labels. We also propose a Semantically Enriched Terms Clustering (SETC) algorithm; integrated with TRI, it improves clustering accuracy, which leads to incremental text categorization. Our experimental analysis on different data sets shows that the proposed SETC performs better.
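The abstract gives no formula for TRI, so the following is only a minimal sketch of the one step it does describe: measuring frequent terms and indexing them by frequency. The function name `rank_terms`, the `min_count` threshold, the regex tokenizer, and the alphabetical tie-break are all assumptions made for illustration. They are not taken from the paper, and the semantic lookup and class-label steps are not covered.

```python
import re
from collections import Counter


def rank_terms(documents, min_count=2):
    """Rank frequent terms by descending corpus frequency.

    Hypothetical helper: the abstract only says that TRI indexes frequent
    terms by frequency. The threshold and tie-break here are assumptions.
    """
    counts = Counter()
    for doc in documents:
        # Naive tokenizer (assumption): lowercase alphabetic runs only.
        counts.update(re.findall(r"[a-z]+", doc.lower()))

    # Keep only terms that occur at least min_count times in the corpus.
    frequent = [(term, n) for term, n in counts.items() if n >= min_count]

    # Highest frequency first; ties broken alphabetically for determinism.
    frequent.sort(key=lambda item: (-item[1], item[0]))

    # Map each term to its 1-based rank.
    return {term: rank for rank, (term, _) in enumerate(frequent, start=1)}


docs = [
    "Clustering text documents supports text classification",
    "Text clustering groups similar documents",
]
ranks = rank_terms(docs)
print(ranks)
```

In this toy corpus only "text", "clustering", and "documents" pass the threshold, so those are the terms a later semantic step would look up.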
Purna Chand, K., & Narsimha, G. (2015). An integrated approach to improve the text categorization using semantic measures. In Smart Innovation, Systems and Technologies (Vol. 32, pp. 39–47). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-81-322-2208-8_5