Data mining K-means document clustering using tfidf and word frequency count


Abstract

With the rapid growth of the World Wide Web, the number of documents in use is increasing quickly, producing gigabytes of text that must be processed. Indexing and retrieving the required documents calls for efficient algorithms that perform well and achieve good accuracy. The algorithms available in data mining continue to bring new innovations, which has increased researchers' interest in developing models for text data mining. The proposed model is a two-step text document clustering approach based on the K-means algorithm. The first step is pre-processing and the second is clustering. In pre-processing, the documents are tokenized and the distinct words are identified. For each distinct word, the frequency of occurrence and the TF-IDF weight are calculated, and each is used to form a separate document feature vector. In the clustering phase, the feature vectors are clustered with the K-means algorithm using various similarity measures.
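The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the tokenization rules, the TF-IDF variant, the centroid initialization, or the similarity measures, so the regex tokenizer, the `log(N / df)` weighting, the first-k seeding, and the two distance functions below are all assumptions chosen for brevity. The function names (`tokenize`, `build_vectors`, `kmeans`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vectors(docs):
    """Return (vocabulary, word-frequency vectors, TF-IDF vectors)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Assumed IDF variant: log(N / df); the source does not specify one.
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    tf_vectors, tfidf_vectors = [], []
    for toks in tokenized:
        counts = Counter(toks)
        tf_vectors.append([float(counts[w]) for w in vocab])
        tfidf_vectors.append([counts[w] * idf[w] for w in vocab])
    return vocab, tf_vectors, tfidf_vectors


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def kmeans(vectors, k, distance=euclidean, max_iter=100):
    """Plain K-means; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy corpus, invented purely for illustration.
docs = [
    "cats purr and cats sleep",
    "dogs bark and dogs run",
    "cats sleep all day",
    "dogs run and bark loudly",
]
vocab, tf_vectors, tfidf_vectors = build_vectors(docs)
print(kmeans(tf_vectors, 2))
print(kmeans(tfidf_vectors, 2, distance=cosine_distance))
```

Passing a different `distance` function is one simple way to compare similarity measures on the same feature vectors. Averaging members to update centroids is the standard choice for Euclidean distance; using it with cosine distance is a common approximation rather than something the abstract prescribes.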


Citation (APA)

Arivarasan, A., & Karthikeyan, M. (2019). Data mining K-means document clustering using tfidf and word frequency count. International Journal of Recent Technology and Engineering, 8(2), 2542–2549. https://doi.org/10.35940/ijrte.B1718.078219
