Dynamic clustering-based estimation of missing values in mixed type data

Vadim V. Ayuyev; Joseph Jupin; Philip W. Harris; Zoran Obradovic

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5691 LNCS 366-377

DOI: 10.1007/978-3-642-03730-6_29

9 Citations

14 Readers

Abstract

The appropriate choice of a method for imputation of missing data becomes especially important when the fraction of missing values is large and the data are of mixed type. The proposed dynamic clustering imputation (DCI) algorithm relies on similarity information from shared neighbors, where mixed type variables are considered together. When evaluated on a public social science dataset of 46,043 mixed type instances with up to 33% missing values, DCI improved imputation accuracy by more than 20% over Multiple Imputation, Predictive Mean Matching, Linear and Multilevel Regression, and Mean Mode Replacement methods. Data imputed by the six methods were used for prediction tests by NB-Tree, Random Subset Selection and Neural Network-based classification models. In our experiments, classification accuracy obtained using DCI-preprocessed data was much better than when relying on alternative imputation methods for data preprocessing. © 2009 Springer Berlin Heidelberg.
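The abstract does not spell out the DCI procedure, so the following is only a rough sketch of the general idea it alludes to: measuring similarity over numeric and categorical attributes together and filling a gap from the most similar records. It is not the authors' algorithm. It swaps in a plain Gower-style distance and a fixed number of nearest donors in place of dynamic clustering over shared neighbors, and the names `mixed_distance`, `impute_nearest`, `numeric_cols`, and `k` are hypothetical.

```python
from collections import Counter
from statistics import mean


def mixed_distance(a, b, numeric_ranges):
    """Gower-style distance over the attributes observed in both records.

    Numeric attributes contribute a range-normalized absolute difference,
    categorical attributes contribute 0 (match) or 1 (mismatch).
    Returns infinity when the two records share no observed attribute.
    """
    total, used = 0.0, 0
    for j, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue  # attribute missing in at least one record
        if j in numeric_ranges:
            span = numeric_ranges[j]
            total += abs(x - y) / span if span else 0.0
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    return total / used if used else float("inf")


def impute_nearest(records, numeric_cols, k=3):
    """Fill each missing value (None) from the k most similar donor records.

    Assumption: a simple nearest-donor rule, not the DCI method itself.
    Numeric gaps take the donors' mean, categorical gaps take their mode.
    """
    # Observed range of every numeric column, used to normalize differences.
    ranges = {}
    for j in numeric_cols:
        vals = [r[j] for r in records if r[j] is not None]
        ranges[j] = (max(vals) - min(vals)) if vals else 0.0

    completed = [list(r) for r in records]
    for i, rec in enumerate(records):
        for j, value in enumerate(rec):
            if value is not None:
                continue
            # Candidate donors must have attribute j observed and share
            # at least one observed attribute with the incomplete record.
            candidates = []
            for n, other in enumerate(records):
                if n == i or other[j] is None:
                    continue
                d = mixed_distance(rec, other, ranges)
                if d != float("inf"):
                    candidates.append((d, n))
            donor_vals = [records[n][j] for _, n in sorted(candidates)[:k]]
            if not donor_vals:
                continue  # no usable donor, leave the gap in place
            if j in ranges:
                completed[i][j] = mean(donor_vals)
            else:
                completed[i][j] = Counter(donor_vals).most_common(1)[0][0]
    return completed
```

Calling `impute_nearest(rows, numeric_cols={0}, k=2)` on a list of tuples whose first field is numeric and whose second is categorical returns a list of completed rows. Distances are always computed against the original records, so an imputed value never serves as evidence for another imputation.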

Citation (APA)

Ayuyev, V. V., Jupin, J., Harris, P. W., & Obradovic, Z. (2009). Dynamic clustering-based estimation of missing values in mixed type data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5691 LNCS, pp. 366–377). https://doi.org/10.1007/978-3-642-03730-6_29

Readers' Seniority

PhD / Post grad / Masters / Doc: 10 (77%)
Researcher: 2 (15%)
Professor / Associate Prof.: 1

Readers' Discipline

Computer Science: 6 (50%)
Mathematics: 3 (25%)
Medicine and Dentistry: 2 (17%)
Sports and Recreations: 1
