The impact of data re-sampling on learning performance of class imbalanced bankruptcy prediction models

Abstract

The aim of this paper is to evaluate the effect of data sampling techniques on the performance of learners using a real, highly imbalanced Spanish bankruptcy dataset. The class imbalance problem refers to a highly uneven distribution of class instances, in which one class has far more instances than the others. When the data distribution is highly skewed, classical learners are heavily biased toward recognizing the majority class, which degrades the performance of quantitative classification and prediction models. In this paper, six sampling methods, namely the synthetic minority oversampling technique (SMOTE), Borderline-SMOTE, Safe-level-SMOTE, random undersampling, random oversampling, and condensed nearest neighbor, are used with three individual learners (SVM, C4.5, and logistic regression (LR)) and three ensemble learners (AdaBoostM1, DTBagging, and Random Forests (RF)). The quantitative prediction models are designed by combining the data sampling techniques with these learners. Their performance is evaluated using the G-Mean and the area under the ROC curve (AUC) on the real, highly imbalanced dataset. The results suggest that oversampling methods (with LR and DTBagging) and undersampling methods (with C4.5 and RF) perform better than the other combinations on this dataset.
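The abstract does not describe how the resampling methods were implemented, so the following is only a minimal NumPy sketch of three of them: random undersampling, random oversampling, and basic SMOTE. It assumes binary labels with the minority (bankrupt) class passed as `minority_label`, and that the minority class really is smaller. The function names `random_undersample`, `random_oversample`, and `smote` are hypothetical. This SMOTE variant picks seed points at random until the classes are balanced and interpolates toward one of the seed's `k` nearest minority neighbours; Borderline-SMOTE and Safe-level-SMOTE change which seeds are used or how the interpolation gap is weighted, and are not shown.

```python
import numpy as np


def random_undersample(X, y, minority_label, rng):
    """Drop majority-class rows at random until both classes have equal counts."""
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    keep_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    idx = np.concatenate([min_idx, keep_maj])
    return X[idx], y[idx]


def random_oversample(X, y, minority_label, rng):
    """Duplicate minority-class rows (sampling with replacement) until classes are balanced."""
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    extra = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
    idx = np.concatenate([maj_idx, min_idx, extra])
    return X[idx], y[idx]


def smote(X, y, minority_label, k, rng):
    """Basic SMOTE: add synthetic minority points on segments joining a minority
    point to one of its k nearest minority neighbours (requires k < minority count)."""
    X_min = X[y == minority_label]
    n_new = int(np.sum(y != minority_label)) - len(X_min)  # points needed to balance
    # Brute-force pairwise Euclidean distances among minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)  # random seed points
    nn = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[nn] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_out, y_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))      # toy data: 100 firms, 3 features
    y = np.array([1] * 10 + [0] * 90)  # 1 = bankrupt (minority), 0 = healthy
    for name, fn in [("RUS", random_undersample), ("ROS", random_oversample)]:
        _, y_res = fn(X, y, 1, rng)
        print(name, np.bincount(y_res))  # RUS [10 10], then ROS [90 90]
    _, y_res = smote(X, y, 1, 5, rng)
    print("SMOTE", np.bincount(y_res))   # SMOTE [90 90]
```

Undersampling discards majority information, while random oversampling only repeats existing minority rows; SMOTE instead creates new points inside the region spanned by minority neighbours, which is why these methods can favour different learners.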

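The models are compared with G-Mean and AUC rather than plain accuracy. The abstract does not give formulas, so this sketch assumes the common definitions: G-Mean as the square root of sensitivity times specificity, and AUC as the probability that a randomly chosen positive (minority) case receives a higher score than a randomly chosen negative one, with ties counted as one half. The helper names `g_mean` and `auc` are hypothetical, and both assume that each class occurs at least once.

```python
from math import sqrt


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sqrt(sensitivity * specificity)


def auc(y_true, scores, positive=1):
    """ROC AUC via pairwise comparison: share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1] * 5 + [0] * 95  # toy data: 5 bankrupt firms out of 100
    y_pred = [0] * 100           # a classifier that always predicts "healthy"
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(acc, g_mean(y_true, y_pred))              # 0.95 0.0
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The toy run illustrates the bias described above: a learner that ignores the minority class still reaches 95% accuracy, but its G-Mean is zero because its sensitivity is zero.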
Citation (APA)

Sisodia, D. S., & Verma, U. (2018). The impact of data re-sampling on learning performance of class imbalanced bankruptcy prediction models. International Journal on Electrical Engineering and Informatics, 10(3), 433–446. https://doi.org/10.15676/ijeei.2018.10.3.2
