Gene selection for sample classification based on gene expression data: Study of sensitivity to choice of parameters of the GA/KNN method

Abstract

Motivation: We recently introduced a multivariate approach that selects a subset of predictive genes jointly for sample classification based on expression data. We tested the algorithm on colon and leukemia data sets. As an extension to our earlier work, we systematically examine the sensitivity, reproducibility and stability of gene selection/sample classification to the choice of parameters of the algorithm.

Methods: Our approach combines a Genetic Algorithm (GA) and the k-Nearest Neighbor (KNN) method to identify genes that can jointly discriminate between different classes of samples (e.g. normal versus tumor). The GA/KNN method is a stochastic supervised pattern recognition method. The genes identified are subsequently used to classify independent test set samples.

Results: The GA/KNN method is capable of selecting a subset of predictive genes from a large noisy data set for sample classification. It is a multivariate approach that can capture the correlated structure in the data. We find that for a given data set gene selection is highly repeatable in independent runs using the GA/KNN method. In general, however, gene selection may be less robust than classification.
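To make the combination described under Methods more concrete, here is a minimal sketch of the general idea: a genetic algorithm searches over small gene subsets, and each subset is scored by how well a KNN classifier separates the classes using only those genes. This is an illustration under stated assumptions, not the authors' implementation. The fitness function (leave-one-out accuracy with a majority vote), the mutation-only search with elitism, the parameter defaults, the synthetic data, and all function names (`loo_knn_accuracy`, `mutate`, `ga_knn`, `make_toy_data`) are hypothetical choices made for this example.

```python
import random
from collections import Counter


def loo_knn_accuracy(X, y, genes, k=3):
    """Leave-one-out KNN accuracy using only the columns listed in `genes`."""
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample,
        # restricted to the selected genes.
        dists = sorted(
            (sum((X[i][g] - X[j][g]) ** 2 for g in genes), y[j])
            for j in range(len(X))
            if j != i
        )
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


def mutate(chrom, n_genes, rng):
    """Swap one gene in the subset for a gene that is not already in it."""
    child = list(chrom)
    pos = rng.randrange(len(child))
    child[pos] = rng.choice([g for g in range(n_genes) if g not in child])
    return tuple(sorted(child))


def ga_knn(X, y, subset_size=3, pop_size=20, generations=30, k=3, seed=0):
    """Simplified GA search (assumed scheme): keep the better half, mutate to refill."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [
        tuple(sorted(rng.sample(range(n_genes), subset_size)))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        ranked = sorted(
            pop, key=lambda c: loo_knn_accuracy(X, y, c, k), reverse=True
        )
        elite = ranked[: pop_size // 2]
        offspring = [
            mutate(rng.choice(elite), n_genes, rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + offspring
    best = max(pop, key=lambda c: loo_knn_accuracy(X, y, c, k))
    return best, loo_knn_accuracy(X, y, best, k)


def make_toy_data(n_per_class=10, n_genes=15, informative=(0, 1), seed=1):
    """Synthetic expression matrix: only the `informative` genes differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_genes)]
            for g in informative:
                row[g] += 4.0 * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best_genes, accuracy = ga_knn(X, y)
    print("selected genes:", best_genes, "LOO accuracy:", accuracy)
```

Because the search is stochastic, different seeds can return different subsets. Repeating the run with several seeds and counting how often each gene is selected is one simple way to probe the repeatability that the Results paragraph refers to.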

Li, L., Weinberg, C. R., Darden, T. A., & Pedersen, L. G. (2001). Gene selection for sample classification based on gene expression data: Study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics, 17(12), 1131–1142. https://doi.org/10.1093/bioinformatics/17.12.1131
