Data mining K-means document clustering using tfidf and word frequency count

Abstract

With the rapid growth of the World Wide Web, the number of documents in use is increasing quickly, producing gigabytes of text to process. Indexing and retrieving the required documents calls for efficient algorithms that achieve good accuracy. The algorithms available in data mining continue to yield new innovations, which has increased researchers' interest in developing models for text data mining. The proposed model is a two-step text document clustering approach based on the K-means algorithm. The first step is preprocessing and the second step is clustering. In preprocessing, the method tokenizes the documents. The distinct words are identified, and their frequencies of occurrence and TF-IDF weights are calculated to form separate document feature vectors. In the clustering phase, the feature vectors are clustered with the K-means algorithm using various similarity measures.
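
The two-step pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the tokenizer regex, the IDF variant log(N/df), the use of cosine similarity as the single similarity measure, the seeding of centroids from the first k documents, and all function names and sample documents are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vectors(docs):
    """Return (vocab, count_vectors, tfidf_vectors) for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Assumed IDF variant: log(N / df); other variants add smoothing.
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    counts, tfidf = [], []
    for toks in tokenized:
        c = Counter(toks)
        counts.append([c[w] for w in vocab])
        tfidf.append([c[w] * idf[w] for w in vocab])
    return vocab, counts, tfidf


def cosine(a, b):
    # Cosine similarity; returns 0.0 when either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, similarity=cosine, max_iter=100):
    """K-means that assigns each vector to its most similar centroid."""
    # Assumed initialization: the first k vectors act as starting centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(k), key=lambda j: similarity(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two documents about pets, two about markets.
docs = [
    "cats and dogs are pets",
    "stock markets fell today",
    "dogs chase cats",
    "markets rally as stock prices rise",
]
vocab, count_vectors, tfidf_vectors = build_vectors(docs)
labels_count = kmeans(count_vectors, k=2)
labels_tfidf = kmeans(tfidf_vectors, k=2)
```

On this toy corpus both feature vectors place the pet documents in one cluster and the market documents in the other. Passing a different function as `similarity` is one way to compare similarity measures, as the abstract suggests the paper does.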

Cite

Arivarasan, A., & Karthikeyan, M. (2019). Data mining K-means document clustering using tfidf and word frequency count. International Journal of Recent Technology and Engineering, 8(2), 2542–2549. https://doi.org/10.35940/ijrte.B1718.078219
