The role of discretization of continuous variables in socioeconomic classification models on the example of logistic regression models and artificial neural networks

Abstract

Logistic regression models and artificial neural networks require data of appropriate quality. One method of improving the quality of raw data is the discretization of continuous variables. It can be used to deal with outliers and influential observations, and it can help when the assumptions of some models are not met. This paper shows that, although discretizing continuous variables reduces the information available for modeling, it can improve the classification accuracy of machine learning models. This is particularly important when searching for the best predictive model with a limited set of explanatory variables, as well as when analyzing large data sets. In addition, the choice of discretization method determines the number and type of variables included in the model and, consequently, the variables that are interpreted. Supervised discretization methods make it possible to select cut-off points that match the purpose of the research. This study uses data from the Generations and Gender Survey (GGS) for Poland and considers the labor market status of respondents. For these data, the study points out the advantages of supervised discretization of continuous variables based on the entropy criterion and the Gini criterion. Importantly, discretization based on these methods yielded predictive models with better classification accuracy than models based on the discretization procedure frequently applied in socioeconomic studies.
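The abstract does not spell out the exact discretization algorithm. The following is a minimal sketch, assuming a recursive binary-split scheme similar to the one used in decision trees: each cut point is chosen to minimize the weighted class impurity (entropy or Gini index) of the two resulting intervals with respect to the target variable. It is not the author's procedure. The functions `best_cut`, `supervised_cuts` and `discretize`, the stopping rules `max_depth` and `min_gain` (which stand in for criteria such as the MDL principle), and the age/employment toy data are all hypothetical and serve only as illustration.

```python
from bisect import bisect_left
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of the class distribution in `labels`."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_cut(x, y, impurity):
    """Return (cut, weighted_impurity) of the best binary split x <= cut.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    `cut` is None when no split lowers the impurity.
    """
    pairs = sorted(zip(x, y))
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    n = len(ys)
    best = (None, impurity(ys))
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot cut between equal values
        left, right = ys[:i], ys[i:]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if score < best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, score)
    return best


def supervised_cuts(x, y, impurity, max_depth=2, min_gain=1e-9):
    """Recursively split x; hypothetical depth/gain stopping rules."""
    if max_depth == 0 or len(set(y)) < 2:
        return []
    cut, score = best_cut(x, y, impurity)
    if cut is None or impurity(list(y)) - score < min_gain:
        return []
    left = [(a, b) for a, b in zip(x, y) if a <= cut]
    right = [(a, b) for a, b in zip(x, y) if a > cut]
    return (
        supervised_cuts([a for a, _ in left], [b for _, b in left], impurity, max_depth - 1, min_gain)
        + [cut]
        + supervised_cuts([a for a, _ in right], [b for _, b in right], impurity, max_depth - 1, min_gain)
    )


def discretize(value, cuts):
    """Map a value to its interval index: (-inf, c0], (c0, c1], ..., (ck, inf)."""
    return bisect_left(cuts, value)


# Hypothetical toy data: age and employment status (1 = employed).
ages = [18, 20, 22, 25, 30, 35, 40, 45, 50, 60, 65, 70]
employed = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]

cuts_entropy = supervised_cuts(ages, employed, entropy)
cuts_gini = supervised_cuts(ages, employed, gini)
print(cuts_entropy, cuts_gini)  # → [23.5, 55.0] [23.5, 55.0]
```

On this toy sample both criteria place the cut-off points where the employment rate changes, so the age variable is replaced by three categories that can then enter a logistic regression or a neural network as dummy variables. An unsupervised rule such as equal-width bins would ignore the target variable entirely, which is the distinction the abstract draws.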

Citation (APA)

Grzenda, W. (2020). The role of discretization of continuous variables in socioeconomic classification models on the example of logistic regression models and artificial neural networks. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 35–51). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-52348-0_3