Assessing Different Feature Selection Methods Applied to a Bulk RNA Sequencing Dataset with Regard to Biomedical Relevance


Abstract

High-throughput RNA sequencing (RNA-Seq) allows thousands of transcripts to be profiled across multiple samples. Standard, well-established methods exist for analysing the resulting RNA-Seq datasets, but they are limited by (i) the high dimensionality of the data, in which most expression profiles are uninformative, and (ii) an imbalanced sample-to-feature ratio. These limitations complicate downstream analyses and the application of methods such as Machine Learning (ML) classification, so selecting the features that carry the essential information is important. The standard method of informative feature selection is differential gene expression (DGE) analysis, which is often conducted in a univariate fashion and ignores interactions between expression profiles. ML-based feature selection methods, by contrast, are capable of addressing these shortcomings. Here, we applied five different ML-based feature selection methods and conventional DGE analysis to a high-dimensional bulk RNA-Seq dataset of peripheral blood mononuclear cells (PBMCs) from healthy children and from children affected by Atopic Dermatitis (AD), and evaluated the resulting feature lists. The similarities between the feature lists were assessed with three similarity coefficients. The selected genetic features were subjected to a Gene Ontology (GO) functional enrichment analysis, and the significantly enriched GO terms were evaluated with a semantic similarity analysis combined with binary cut clustering. In addition, the feature lists were compared with consensus gene lists associated with AD, and we assessed whether the selected features had previously been identified in related studies. We found that genetic features selected with ML-based methods were, in general, of higher biomedical relevance. We argue that ML-based feature selection, followed by a careful evaluation of the selected feature sets, extends the possibilities of precision medicine to discover biomarkers.
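
The comparison of feature lists by similarity coefficients can be illustrated with a minimal sketch. The abstract does not name the three coefficients that were used, so the Jaccard, Sørensen-Dice, and overlap coefficients below are an assumption chosen for illustration. The function names and the placeholder gene identifiers are hypothetical and are not taken from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Sorensen-Dice: twice the intersection over the sum of both set sizes."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap(a, b):
    """Overlap coefficient: intersection over the size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def pairwise_similarity(feature_lists, coefficient):
    """Apply one coefficient to every pair of named feature lists."""
    return {
        (name_a, name_b): coefficient(feature_lists[name_a], feature_lists[name_b])
        for name_a, name_b in combinations(sorted(feature_lists), 2)
    }


# Placeholder feature lists; method names and gene identifiers are hypothetical.
feature_lists = {
    "dge": ["GENE_A", "GENE_B", "GENE_C"],
    "ml_method_1": ["GENE_B", "GENE_C", "GENE_D"],
    "ml_method_2": ["GENE_C", "GENE_D", "GENE_E", "GENE_F"],
}

for coefficient in (jaccard, dice, overlap):
    print(coefficient.__name__, pairwise_similarity(feature_lists, coefficient))
```

Using more than one coefficient is informative when the lists differ in length: the overlap coefficient reaches 1 whenever the shorter list is contained in the longer one, whereas the Jaccard coefficient penalises the size difference.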

Cite

Zhakparov, D., Moriarty, K., Lunjani, N., Schmid, M., Hlela, C., Levin, M., … Roqueiro, D. (2023). Assessing Different Feature Selection Methods Applied to a Bulk RNA Sequencing Dataset with Regard to Biomedical Relevance. In Communications in Computer and Information Science (Vol. 1753 CCIS, pp. 259–274). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-23633-4_18
