A Cluster-based Undersampling Technique for Multiclass Skewed Datasets

Rose Mary Mathew; Ranganathan Gunasundari

Journal Article, Open Access

Engineering, Technology and Applied Science Research (2023) 13(3) 10785-10790

DOI: 10.48084/etasr.5844

Abstract

Imbalanced data classification is a challenging problem in data mining and machine learning. Models trained on imbalanced data perform poorly on the minority class. Resampling methods address this problem by balancing the skewed dataset. Cluster-based Undersampling (CUS) and Near-Miss (NM) are widely used techniques in imbalanced learning, but both have serious flaws. CUS ignores the distance factor among instances of the majority class, and NM discards the inter-class information within the majority class elements. To overcome these flaws, this study proposes an undersampling technique called Adaptive K-means Clustering Undersampling (AKCUS), which combines the distance factor with clustering over the majority class. The proposed method was evaluated in an experimental study: three multiminority datasets with different imbalance ratios were selected, and models were built using k-Nearest Neighbor (kNN), Decision Tree (DT), and Random Forest (RF) classifiers. The experimental results show that AKCUS can achieve better performance than the benchmark methods on multiminority datasets with high imbalance ratios.
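The abstract does not spell out the AKCUS procedure, so the following is not the authors' algorithm. It is only a minimal sketch of the general idea the abstract names: cluster the majority class with k-means, then use a distance factor to decide which majority instances to keep. Everything specific here is an assumption made for illustration: the function names `kmeans` and `cluster_distance_undersample`, the fixed `k`, the choice to keep the points nearest to their cluster centroid, and the choice to shrink the majority class to the size of the largest minority class.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct positions
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def cluster_distance_undersample(X, y, k=3, seed=0):
    """Hypothetical helper: undersample the largest class only.

    Assumes at least two classes. Minority classes are left untouched; the
    majority class is reduced to the size of the largest minority class.
    """
    counts = Counter(y)
    majority = max(counts, key=counts.get)
    target = max(n for label, n in counts.items() if label != majority)

    maj_idx = [i for i, label in enumerate(y) if label == majority]
    maj_points = [tuple(X[i]) for i in maj_idx]
    k = min(k, len(maj_points))
    centroids, labels = kmeans(maj_points, k, seed=seed)

    # Per cluster, rank members by distance to their own centroid.
    queues = []
    for c in range(k):
        ranked = sorted((math.dist(p, centroids[c]), i)
                        for p, lab, i in zip(maj_points, labels, maj_idx)
                        if lab == c)
        queues.append(ranked)

    # Round-robin over clusters so every cluster stays represented.
    keep, rank = [], 0
    while len(keep) < target:
        for queue in queues:
            if rank < len(queue) and len(keep) < target:
                keep.append(queue[rank][1])
        rank += 1

    keep_set = set(keep)
    selected = [i for i, label in enumerate(y)
                if label != majority or i in keep_set]
    return [X[i] for i in selected], [y[i] for i in selected]
```

Taking points round-robin across clusters is one simple way to keep every majority cluster represented; a size-proportional quota per cluster, or keeping the points farthest from the centroid, would be equally plausible readings of "distance factor" and would need the full paper to confirm.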

Cite (APA)

Mathew, R. M., & Gunasundari, R. (2023). A Cluster-based Undersampling Technique for Multiclass Skewed Datasets. Engineering, Technology and Applied Science Research, 13(3), 10785–10790. https://doi.org/10.48084/etasr.5844
