What is a consistent glycan composition dataset?

  • Saba F
  • Mariethoz J
  • Lisacek F

Abstract

Introduction: One of the main challenges in bioinformatics has been, and still is, the comparison of entities through the development of algorithms for similarity scoring and data clustering according to biologically relevant aspects. Glycoinformatics also faces this challenge, in particular regarding the automated comparison of protein and/or tissue glycomes, which remains relatively uncharted territory.

Methods: Low- and high-throughput experimental glycomic and glycoproteomic results were collected, revealing a bias toward N-linked glycomes. N-glycomes were then considered and represented as networks of related glycan compositions rather than as lists of glycans. They were processed and compared using two Java applications, one generating graphs and the other producing a similarity matrix based on graph content. Several scoring schemes (e.g., Jaccard index or cosine) were tested and evaluated using the Matthews Correlation Coefficient, in order to capture a meaningful protein and tissue N-glycome similarity.

Results: Assuming that a glycome corresponds to a well-connected graph of glycan compositions, graph comparison has revealed gaps that can be interpreted as inconsistencies. The outcome of systematic graph comparison is both formal and practical. In principle, it is shown that the idiosyncrasy of current glycome data limits the definition of appropriate estimates for systematically comparing N-glycomes. Yet, several potentially interesting criteria could be identified in a series of use cases detailed in the study.

Discussion: Differentially expressed glycomes are usually compared manually, but the resulting work tends to remain in publications due to the lack of dedicated tools. Even manually, cross-comparison is challenging, mostly because different sets of features are used from one study to the next. The work presented here makes it possible to lay down guidelines for developing a software tool that compares glycomes based on appropriate definitions of similarity and suitable methods for its evaluation and implementation.
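The scoring schemes named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' Java application: it treats each glycome as a plain set of composition labels and ignores the graph edges the study relies on. The composition strings, function names, and confusion-matrix counts below are hypothetical, chosen only for illustration, and the abstract does not state how the Matthews Correlation Coefficient was applied in practice.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard index: shared compositions divided by all compositions."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def cosine(a: set, b: set) -> float:
    """Cosine similarity on binary presence/absence vectors."""
    if not a or not b:
        return 0.0
    # For 0/1 vectors the dot product is the overlap size and each
    # norm is the square root of the set size.
    return len(a & b) / math.sqrt(len(a) * len(b))


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical N-glycomes, written as composition labels (assumed format).
glycome_a = {"Hex5HexNAc2", "Hex6HexNAc2", "Hex5HexNAc4"}
glycome_b = {"Hex5HexNAc2", "Hex5HexNAc4", "Hex5HexNAc4NeuAc1"}

jaccard_score = jaccard(glycome_a, glycome_b)  # 2 shared / 4 total
cosine_score = cosine(glycome_a, glycome_b)    # 2 / sqrt(3 * 3)

# Hypothetical evaluation: suppose a score threshold labels glycome pairs
# as "similar" or "not similar" and is checked against curated labels.
mcc_score = mcc(tp=3, tn=3, fp=1, fn=1)

print(jaccard_score, cosine_score, mcc_score)
```

A set-based score of this kind only counts shared compositions. The graph representation described in the abstract additionally captures how compositions relate to one another, which is what allows gaps to be read as inconsistencies.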

Cite

Saba, F., Mariethoz, J., & Lisacek, F. (2023). What is a consistent glycan composition dataset? Frontiers in Analytical Science, 3. https://doi.org/10.3389/frans.2023.1073540
