Non-negative matrix factorization for learning alignment-specific models of protein evolution


Abstract

Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models. © 2011 Murrell et al.
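
The two-stage idea in the abstract (learn shared basis matrices from many alignments, then fit a few weights for one alignment) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: each alignment's rate matrix is flattened to a vector of 190 non-negative exchangeabilities, the factorization uses the standard multiplicative-update rule for squared error, the weights for a new alignment are fitted by non-negative least squares rather than by whatever criterion the authors use, and all data are synthetic. The names `nnmf`, `fit_weights`, and `sq_error` are hypothetical.

```python
import numpy as np

def sq_error(V, W, H):
    """Sum of squared reconstruction error ||V - W H||^2."""
    return float(np.sum((V - W @ H) ** 2))

def nnmf(V, rank, n_iter=500, eps=1e-12, seed=0):
    """Factor non-negative V (rates x alignments) as W @ H with W, H >= 0.

    Columns of W play the role of basis matrices (flattened);
    columns of H hold the per-alignment weights.
    Uses multiplicative updates for the squared-error objective.
    """
    rng = np.random.default_rng(seed)
    n_rates, n_alignments = V.shape
    W = rng.random((n_rates, rank)) + eps
    H = rng.random((rank, n_alignments)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def fit_weights(W, v, n_iter=500, eps=1e-12):
    """Fit non-negative weights h so that W @ h approximates v, with W fixed."""
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ v) / (W.T @ W @ h + eps)
    return h

# Synthetic stand-in for a general training set: 200 alignments,
# each summarized by 190 non-negative rates generated from 3 hidden factors.
rng = np.random.default_rng(1)
true_basis = rng.random((190, 3))
V = true_basis @ rng.random((3, 200))

W, H = nnmf(V, rank=3)

# A new alignment: only rank-many weights are estimated, not 190 rates.
v_new = true_basis @ np.array([0.5, 1.5, 0.2])
h_new = fit_weights(W, v_new)
model = W @ h_new  # alignment-specific model as a weighted sum of the basis

print("training error:", sq_error(V, W, H))
print("weights for new alignment:", h_new)
```

The point of the sketch is the parameter count: once `W` is fixed, the alignment-specific model is described by `rank` weights instead of a full set of 190 rates, which is why a small dataset can suffice.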

Figures

  • Figure 1. Non-negative matrix factorization.
  • Table 1. Interpretation of the matrix factorization in Figure 1.
  • Figure 2. Learning models of protein evolution with NNMF. A schematic overview of the procedure. doi:10.1371/journal.pone.0028898.g002
  • Figure 3. Selecting the larger Pandit alignments. Each blue dot represents an alignment in the Pandit database. The green region covers the alignments used in the training set, and the thin red region covers those in the test set. doi:10.1371/journal.pone.0028898.g003
  • Figure 4. Convergence of NNMF. The sum of squared error decreases as more basis matrices are included. doi:10.1371/journal.pone.0028898.g004
  • Figure 5. NNMF basis matrices. The set of NNMF basis matrices obtained for ranks ranging from 1 to 5. Amino acids are ordered according to their Stanfel classification [25]. Rates are indicated in grayscale, with pure white being a rate of zero and pure black being the maximum rate in the matrix. doi:10.1371/journal.pone.0028898.g005
  • Figure 6. NNMF basis matrices correlate with amino acid properties. The correlations between amino acid properties and the basis matrices. The horizontal black line (at −0.16867) indicates the threshold for significant negative correlation (p < 0.01, one-tailed, n = 190). doi:10.1371/journal.pone.0028898.g006
  • Figure 7. Distribution of the optimal number of basis matrices.
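
The significance threshold quoted in the Figure 6 caption can be checked with a short calculation. The sketch below assumes the test is a one-tailed test on Pearson's r with n = 190 observations (plausibly the 20 × 19 / 2 distinct amino acid pairs, though the caption does not say so), and it approximates the Student t quantile with a two-term Cornish-Fisher expansion instead of an exact routine. The function name `critical_r` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def critical_r(n, alpha):
    """Approximate one-tailed critical value of Pearson's r for sample size n.

    The t quantile with df = n - 2 is approximated from the normal quantile z
    by a two-term expansion in 1/df, then converted via r = t / sqrt(t^2 + df).
    """
    df = n - 2
    z = NormalDist().inv_cdf(1.0 - alpha)
    t = (z
         + (z**3 + z) / (4 * df)
         + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))
    return t / sqrt(t * t + df)

r_crit = critical_r(n=190, alpha=0.01)
print(f"threshold for significant negative correlation: {-r_crit:.4f}")
```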

Cite

Murrell, B., Weighill, T., Buys, J., Ketteringham, R., Moola, S., Benade, G., … Scheffler, K. (2011). Non-negative matrix factorization for learning alignment-specific models of protein evolution. PLoS ONE, 6(12). https://doi.org/10.1371/journal.pone.0028898
