The impact of data re-sampling on learning performance of class imbalanced bankruptcy prediction models

Dilip Singh Sisodia; Upasana Verma

Journal Article (Open Access)

International Journal on Electrical Engineering and Informatics (2018) 10(3) 433-446

DOI: 10.15676/ijeei.2018.10.3.2

9 Citations

26 Readers

Abstract

The aim of this paper is to evaluate the effect of data sampling techniques on the performance of learners using a real, highly imbalanced Spanish bankruptcy dataset. The class imbalance problem refers to a highly uneven distribution of instances across classes, in which one class has far more instances than the others. When the data distribution is highly skewed, classical learners are heavily biased toward recognizing the majority class, which degrades the performance of quantitative classification and prediction models. In this paper, six sampling methods, namely the synthetic minority oversampling technique (SMOTE), Borderline-SMOTE, Safe-level-SMOTE, random undersampling, random oversampling, and condensed nearest neighbor, are paired with three individual learners (SVM, C4.5, and logistic regression, LR) and three ensemble learners (AdaBoostM1, DTBagging, and Random Forests, RF). Different quantitative prediction models are designed by combining the data sampling techniques with these learners. The performance of the models is evaluated using the G-Mean and area under the ROC curve (AUC) measures on the real, highly imbalanced dataset. The results suggest that oversampling (with LR and DTBagging) and undersampling (with C4.5 and RF) outperform the other combinations on this dataset.
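To make the abstract's terms concrete, here is a minimal, standard-library sketch of three of the ideas it names: the G-Mean measure (assumed here to be the common definition, the square root of sensitivity times specificity), random undersampling, and a simplified SMOTE-style interpolation. The function names `g_mean`, `random_undersample`, and `smote_like_oversample` are hypothetical helpers written for illustration; this is not the authors' implementation, and the SMOTE variant below picks base points at random rather than iterating over every minority sample as the original SMOTE does.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority-class recall) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def random_undersample(X, y, majority=0, seed=0):
    """Randomly drop majority-class rows until both classes have equal counts."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label != majority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    kept = rng.sample(majority_idx, len(minority_idx)) + minority_idx
    kept.sort()  # preserve the original row order
    return [X[i] for i in kept], [y[i] for i in kept]


def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Simplified SMOTE-style synthesis: place each new point at a random spot on
    the segment between a random minority point and one of its k nearest
    minority neighbours (Euclidean distance). Needs at least two minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        neighbours = sorted(
            (X_min[j] for j in range(len(X_min)) if j != i),
            key=lambda x: math.dist(base, x),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # 95 healthy firms (0) and 5 bankrupt firms (1): a toy imbalanced label set.
    y_true = [0] * 95 + [1] * 5
    always_majority = [0] * 100
    accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
    print(accuracy)                           # 0.95
    print(g_mean(y_true, always_majority))    # 0.0
```

The usage example shows why the abstract relies on G-Mean rather than accuracy: a model that always predicts the majority class scores 0.95 accuracy but a G-Mean of 0, because it never recognizes a single minority instance. For real experiments, maintained implementations of several of the named samplers exist (for example, `SMOTE`, `BorderlineSMOTE`, `RandomOverSampler`, `RandomUnderSampler`, and `CondensedNearestNeighbour` in the imbalanced-learn package) and are preferable to hand-rolled code.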

Cite (APA)

Sisodia, D. S., & Verma, U. (2018). The impact of data re-sampling on learning performance of class imbalanced bankruptcy prediction models. International Journal on Electrical Engineering and Informatics, 10(3), 433–446. https://doi.org/10.15676/ijeei.2018.10.3.2

Readers' Seniority

PhD / Post grad / Masters / Doc: 12 (75%)
Lecturer / Post doc: 3 (19%)
Professor / Associate Prof.: 1

Readers' Discipline

Computer Science: 10 (63%)
Business, Management and Accounting: 4 (25%)
Social Sciences: 1
Engineering: 1

Cited by (source: Scopus)

Hybrid preprocessing method for support vector machine for classification of imbalanced cerebral infarction datasets

Systematic Review of Financial Distress Identification using Artificial Intelligence Methods

Data Sampling Strategies for Click Fraud Detection Using Imbalanced User Click Data of Online Advertising: An Empirical Review
