The complexity of some pattern problems in the logical analysis of large genomic data sets

Giuseppe Lancia; Paolo Serafini

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9656 3-12

DOI: 10.1007/978-3-319-31744-1_1

Abstract

Many biomedical experiments produce large data sets in the form of binary matrices, with features labeling the columns and individuals (samples) associated with the rows. An important case is when the rows are also labeled into two groups, namely the positive (or healthy) and the negative (or diseased) samples. The Logical Analysis of Data (LAD) is a procedure aimed at identifying relevant features and building Boolean formulas (rules) which can be used to classify new samples as positive or negative. These rules are said to explain the data set. Each rule can be represented by a string over {0,1,-}, called a pattern. A data set can be explained by alternative sets of patterns, and many computational problems arise from the choice of a particular set of patterns for a given instance. In this paper we study the computational complexity of these pattern problems and show that they are, in general, very hard. We give an integer programming formulation for the problem of determining whether two sets of patterns are equivalent. We also prove computational complexity results which imply that there should be no simple integer linear programming (ILP) model for finding a minimal set of patterns explaining a given data set.
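The abstract does not spell out the formal definitions, so the following is only a minimal sketch under the usual LAD reading, which is an assumption here: a pattern covers a binary sample when every position that is not `-` agrees with the sample, a set of patterns explains a data set when it covers every positive sample and no negative one, and two sets are equivalent when they cover exactly the same binary vectors. The function names `covers`, `explains`, and `equivalent` are hypothetical and are not taken from the paper. The equivalence check enumerates all 2^n vectors and is meant purely as an illustration; it is not the integer programming formulation the authors propose.

```python
from itertools import product


def covers(pattern: str, sample: str) -> bool:
    """Assumed semantics: '-' is a wildcard, '0'/'1' must match exactly."""
    return len(pattern) == len(sample) and all(
        p == "-" or p == s for p, s in zip(pattern, sample)
    )


def explains(patterns, positives, negatives) -> bool:
    """Assumed definition: every positive sample is covered by some
    pattern, and no negative sample is covered by any pattern."""
    all_positives_covered = all(
        any(covers(p, s) for p in patterns) for s in positives
    )
    no_negative_covered = not any(
        covers(p, s) for p in patterns for s in negatives
    )
    return all_positives_covered and no_negative_covered


def equivalent(set_a, set_b, n: int) -> bool:
    """Brute-force check over all 2**n binary vectors of length n.
    Exponential in n, so only usable for tiny illustrative instances."""
    for bits in product("01", repeat=n):
        sample = "".join(bits)
        in_a = any(covers(p, sample) for p in set_a)
        in_b = any(covers(p, sample) for p in set_b)
        if in_a != in_b:
            return False
    return True


# Toy data set with three features (made up for illustration).
positives = ["110", "101"]
negatives = ["010", "001"]
print(explains(["1--"], positives, negatives))
print(equivalent(["1--"], ["10-", "11-"], 3))
```

In this toy instance the single pattern `1--` and the pair `10-`, `11-` cover the same vectors, which illustrates how one data set can be explained by alternative sets of patterns of different sizes.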

Citation (APA)

Lancia, G., & Serafini, P. (2016). The complexity of some pattern problems in the logical analysis of large genomic data sets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9656, pp. 3–12). Springer Verlag. https://doi.org/10.1007/978-3-319-31744-1_1
