Data mining K-means document clustering using tfidf and word frequency count

Aranga Arivarasan; M. Karthikeyan

Journal Article, Open Access

International Journal of Recent Technology and Engineering (2019) 8(2) 2542-2549

DOI: 10.35940/ijrte.B1718.078219

Abstract

With the rapid development of the World Wide Web, the number of documents in use is growing quickly, producing gigabytes of text that must be processed. Efficient algorithms for indexing and retrieving the required text documents give better performance and good accuracy. Data mining algorithms also continue to produce new innovations, which has increased researchers' interest in developing models for text data mining. The proposed model is a two-step text document clustering approach based on the K-means algorithm: the first step is pre-processing and the second is clustering. In pre-processing, the method tokenizes the documents and identifies the distinct words. It then calculates each distinct word's frequency of occurrence and its TF-IDF weight, and forms a separate document feature vector from each. In the clustering phase, the feature vectors are clustered with the K-means algorithm using various similarity measures.
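The two steps described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the tokenizer, the TF-IDF formula (raw count times log(N/df)), the two distance functions, the fixed initial centroids, and all function names are assumptions chosen for illustration, since the abstract does not specify them. Using mean centroids with cosine distance is also a common heuristic rather than something the source confirms.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def build_vectors(docs):
    """Return (vocabulary, word-count vectors, TF-IDF vectors)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted({w for c in counts for w in c})
    n = len(docs)
    # Document frequency: number of documents that contain each word.
    df = {w: sum(1 for c in counts if w in c) for w in vocab}
    # Assumed weighting: raw count * log(N / df).
    idf = {w: math.log(n / df[w]) for w in vocab}
    count_vecs = [[float(c[w]) for w in vocab] for c in counts]
    tfidf_vecs = [[c[w] * idf[w] for w in vocab] for c in counts]
    return vocab, count_vecs, tfidf_vecs


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (na * nb)


def kmeans(vectors, init_indices, distance, max_iter=100):
    """Lloyd-style K-means; centroids start at the given document indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the chosen measure.
        new_labels = [
            min(range(len(centroids)), key=lambda j: distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy corpus (hypothetical data, not from the paper).
docs = [
    "cats purr and cats sleep",
    "dogs bark and dogs run",
    "sleepy cats purr",
    "loud dogs bark",
]
vocab, count_vecs, tfidf_vecs = build_vectors(docs)
count_labels = kmeans(count_vecs, (0, 1), euclidean)
tfidf_labels = kmeans(tfidf_vecs, (0, 1), cosine_distance)
print(count_labels, tfidf_labels)
```

Passing the distance function as a parameter makes it easy to compare similarity measures on the same feature vectors, which is the kind of comparison the abstract describes. A real experiment would use random or k-means++ initialization over several runs rather than fixed starting documents.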

Citation (APA)

Arivarasan, A., & Karthikeyan, M. (2019). Data mining K-means document clustering using tfidf and word frequency count. International Journal of Recent Technology and Engineering, 8(2), 2542–2549. https://doi.org/10.35940/ijrte.B1718.078219
