Supervised pre-processing of numerical variables for multi-relational data mining


Abstract

In Multi-Relational Data Mining (MRDM), data are represented in a relational form where the individuals of the target table are potentially related to several records in secondary tables through one-to-many relationships. Variable pre-processing (including discretization and feature selection) within this multiple-table setting differs from the attribute-value case. Besides the target variable information, one should take into account the relational structure of the database. In this paper, we focus on numerical variables located in a non-target table. We propose a criterion that evaluates a given discretization of such variables. The idea is to summarize, for each individual, the information contained in the secondary variable by a feature tuple (one feature per interval of the considered discretization). Each feature represents the number of values of the secondary variable that fall in the corresponding interval. These count features are jointly partitioned by means of data grid models in order to obtain the best separation of the class values. We describe a simple optimization algorithm to find the best equal frequency discretization with respect to the proposed criterion. Experiments on real and artificial data sets reveal that the discretization approach helps one to discover relevant secondary variables. © 2014 Springer International Publishing Switzerland.
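
The count-feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `equal_frequency_cuts` and `count_features`, the toy records, and the tie-handling rule are all assumptions made for illustration, and the data grid criterion that scores a discretization is not reproduced here.

```python
from bisect import bisect_right
from collections import defaultdict


def equal_frequency_cuts(values, n_intervals):
    """Return up to n_intervals - 1 cut points that split the sorted
    values into groups of roughly equal size. Duplicate cut points
    caused by ties are merged (an assumption of this sketch)."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_intervals):
        cut = ordered[(k * n) // n_intervals]
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def count_features(records, cuts):
    """Map each individual id to a tuple of counts, one per interval.
    Interval i holds the values v with cuts[i-1] <= v < cuts[i]."""
    counts = defaultdict(lambda: [0] * (len(cuts) + 1))
    for individual_id, value in records:
        counts[individual_id][bisect_right(cuts, value)] += 1
    return {ident: tuple(c) for ident, c in counts.items()}


# Hypothetical secondary table: (individual id, numerical value) pairs.
records = [("a", 1.0), ("a", 2.0), ("a", 9.0),
           ("b", 8.0), ("b", 7.0), ("c", 3.0)]

cuts = equal_frequency_cuts([v for _, v in records], n_intervals=2)
features = count_features(records, cuts)
```

In this sketch, individuals with no related secondary records do not appear in the result; a fuller version would presumably assign them an all-zero tuple. The resulting tuples are the inputs that the paper's criterion would then evaluate against the class values.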

Cite

Lahbib, D., Boullé, M., & Laurent, D. (2014). Supervised pre-processing of numerical variables for multi-relational data mining. Studies in Computational Intelligence, 527, 95–109. https://doi.org/10.1007/978-3-319-02999-3_6
