Interpretation of personal genome sequencing data in terms of disease ranks based on mutual information

This article is free to access.

Abstract

Background: The rapid advances in genome sequencing technologies have resulted in an unprecedented number of genome variations being discovered in humans. However, there has been very limited coverage of interpretation of the personal genome sequencing data in terms of diseases.

Methods: In this paper we present the first computational analysis scheme for interpreting personal genome data by simultaneously considering the functional impact of damaging variants and curated disease-gene association data. This method is based on mutual information as a measure of the relative closeness between the personal genome and diseases. We hypothesize that a higher mutual information score implies that the personal genome is more susceptible to a particular disease than other diseases.

Results: The method was applied to the sequencing data of 50 acute myeloid leukemia (AML) patients in The Cancer Genome Atlas. The utility of associations between a disease and the personal genome was explored using data of healthy (control) people obtained from the 1000 Genomes Project. The ranks of the disease terms in the AML patient group were compared with those in the healthy control group using "Leukemia, Myeloid, Acute" (C04.557.337.539.550) as the corresponding MeSH disease term. The mutual information rank of the disease term was substantially higher in the AML patient group than in the healthy control group, which demonstrates that the proposed methodology can be successfully applied to infer associations between the personal genome and diseases.

Conclusions: Overall, the area under the receiver operating characteristic curve was significantly larger for the AML patient data than for the healthy controls. This methodology could contribute to consequential discoveries and explanations for mining personal genome sequencing data in terms of diseases, and have versatility with respect to genomic-based knowledge such as drug-gene and environmental-factor-gene interactions.
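The mutual-information scoring described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything here is an assumption: each gene in a fixed universe is treated as a pair of binary indicators (carries a damaging variant in this genome; is associated with the disease), mutual information is computed from the resulting 2x2 table, and diseases are ranked by that score. The gene identifiers, disease names, and function names are hypothetical placeholders, not data or code from the paper.

```python
import math


def mutual_information(damaged, disease_genes, universe):
    """Mutual information (in bits) between two binary gene indicators.

    Assumed formulation: X = gene carries a damaging variant,
    Y = gene is associated with the disease, both over `universe`.
    """
    n = len(universe)
    # 2x2 contingency counts keyed by (x, y)
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for gene in universe:
        counts[(int(gene in damaged), int(gene in disease_genes))] += 1

    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # an empty cell contributes nothing to the sum
        p_xy = c / n
        p_x = (counts[(x, 0)] + counts[(x, 1)]) / n
        p_y = (counts[(0, y)] + counts[(1, y)]) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def rank_diseases(damaged, disease_to_genes, universe):
    """Return (disease, score) pairs sorted by decreasing mutual information."""
    scores = {
        disease: mutual_information(damaged, genes, universe)
        for disease, genes in disease_to_genes.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 8 genes, 4 of them damaged in this genome.
universe = {f"G{i}" for i in range(1, 9)}
damaged = {"G1", "G2", "G3", "G4"}
disease_to_genes = {
    "disease_A": {"G1", "G2", "G3", "G4"},  # identical to the damaged set
    "disease_B": {"G1", "G2", "G5", "G6"},  # overlap expected by chance
    "disease_C": {"G1", "G2", "G3", "G5"},  # partial overlap
}

ranking = rank_diseases(damaged, disease_to_genes, universe)
for disease, score in ranking:
    print(f"{disease}: {score:.3f} bits")
```

One caveat of this simplified form is that mutual information is symmetric: a disease gene set that is strongly depleted of damaged genes also scores above zero, so a practical scheme would need to handle the direction of the association separately.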

Cite

Na, Y. J., Sohn, K. A., & Kim, J. H. (2015). Interpretation of personal genome sequencing data in terms of disease ranks based on mutual information. BMC Medical Genomics, 8(Suppl 2), S4. https://doi.org/10.1186/1755-8794-8-S2-S4
