The complexity of some pattern problems in the logical analysis of large genomic data sets

3Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Many biomedical experiments produce large data sets in the form of binary matrices, with features labeling the columns and individuals (samples) associated to the rows. An important case is when the rows are also labeled into two groups, namely the positive (or healthy) and the negative (or diseased) samples. The Logical Analysis of Data (LAD) is a procedure aimed at identifying relevant features and building boolean formulas (rules) which can be used to classify new samples as positive or negative. These rules are said to explain the data set. Each rule can be represented by a string over {0,1,-}, called a pattern. A data set can be explained by alternative sets of patterns, and many computational problems arise related to the choice of a particular set of patterns for a given instance. In this paper we study the computational complexity of these pattern problems and show that they are, in general, very hard. We give an integer programming formulation for the problem of determining if two sets of patterns are equivalent. We also prove computational complexity results which imply that there should be no simple ILP model for finding a minimal set of patterns explaining a given data set.

Cite

CITATION STYLE

APA

Lancia, G., & Serafini, P. (2016). The complexity of some pattern problems in the logical analysis of large genomic data sets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9656, pp. 3–12). Springer Verlag. https://doi.org/10.1007/978-3-319-31744-1_1

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free