pLM-BLAST: distant homology detection based on direct comparison of sequence representations from protein language models

Abstract

Motivation: The detection of homology through sequence comparison is a typical first step in the study of protein function and evolution. In this work, we explore the applicability of protein language models to this task.

Results: We introduce pLM-BLAST, a BLAST-inspired tool that detects distant homology by comparing single-sequence representations (embeddings) derived from the protein language model ProtT5. Our benchmarks show that pLM-BLAST achieves accuracy on par with HHsearch for both highly similar sequences (>50% identity) and markedly divergent sequences (<30% identity), while being significantly faster. Additionally, pLM-BLAST stands out among embedding-based tools in its ability to compute local alignments. We show that these local alignments often connect highly divergent proteins, highlighting pLM-BLAST's potential to uncover previously unknown homologous relationships and improve protein annotation.
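The core idea of the abstract, scoring residue pairs by the similarity of their language-model embeddings instead of by a fixed substitution matrix and then computing a local alignment, can be illustrated with the minimal Python sketch below. It is not pLM-BLAST's implementation. It assumes cosine similarity as the residue-pair score and a textbook Smith-Waterman alignment with a linear gap penalty. The `offset` and `gap` values, the helper names, and the random vectors standing in for ProtT5 embeddings are all hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two per-residue embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(emb_a: List[Vector], emb_b: List[Vector]) -> Matrix:
    """Residue-by-residue cosine similarities; they take the place that a
    fixed substitution matrix such as BLOSUM62 has in classic alignment."""
    return [[cosine(x, y) for y in emb_b] for x in emb_a]


def local_align(sim: Matrix, offset: float = 0.3,
                gap: float = 0.5) -> Tuple[float, List[Tuple[int, int]]]:
    """Textbook Smith-Waterman with a linear gap penalty.

    offset (hypothetical) is subtracted from every similarity so that
    unrelated residue pairs score negatively and alignments stay local.
    Returns the best score and the aligned (i, j) residue index pairs.
    """
    n, m = len(sim), len(sim[0])
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(0.0,
                          h[i - 1][j - 1] + sim[i - 1][j - 1] - offset,
                          h[i - 1][j] - gap,
                          h[i][j - 1] - gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best-scoring cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if math.isclose(h[i][j], h[i - 1][j - 1] + sim[i - 1][j - 1] - offset):
            pairs.append((i - 1, j - 1))  # residues i-1 and j-1 are aligned
            i, j = i - 1, j - 1
        elif math.isclose(h[i][j], h[i - 1][j] - gap):
            i -= 1  # gap in sequence B
        else:
            j -= 1  # gap in sequence A
    return best, pairs[::-1]


def demo() -> Tuple[float, List[Tuple[int, int]]]:
    """Random vectors stand in for ProtT5 per-residue embeddings (1024-d)."""
    rng = random.Random(0)
    dim = 32  # kept small for the toy example

    def random_residues(count: int) -> List[Vector]:
        return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]

    def perturb(v: Vector) -> Vector:
        return [x + rng.gauss(0.0, 0.1) for x in v]

    emb_a = random_residues(30)
    # Sequence B: 5 unrelated residues, a noisy copy of A[10:20], 5 more.
    emb_b = random_residues(5) + [perturb(v) for v in emb_a[10:20]] + random_residues(5)
    return local_align(similarity_matrix(emb_a, emb_b))
```

Calling `demo()` should return an alignment that pairs residues 10 to 19 of A with residues 5 to 14 of B, possibly extended by a residue or two of chance similarity at either end. Because the alignment is local, the unrelated flanking residues are otherwise left out rather than forced into a global alignment.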

Citation (APA)

Kaminski, K., Ludwiczak, J., Pawlicki, K., Alva, V., & Dunin-Horkawicz, S. (2023). pLM-BLAST: distant homology detection based on direct comparison of sequence representations from protein language models. Bioinformatics, 39(10). https://doi.org/10.1093/bioinformatics/btad579
