AdaNovo: Adaptive De Novo Peptide Sequencing with Conditional Mutual Information
Abstract
AdaNovo, a deep learning framework using conditional mutual information, enhances peptide sequencing accuracy in tandem mass spectrometry, especially for amino acids with post-translational modifications and noisy spectra.
Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the analysis of protein composition in biological samples. Despite the development of various deep learning methods for identifying amino acid sequences (peptides) responsible for observed spectra, challenges persist in de novo peptide sequencing. Firstly, prior methods struggle to identify amino acids with post-translational modifications (PTMs) due to their lower frequency in training data compared to canonical amino acids, further resulting in decreased peptide-level identification precision. Secondly, diverse types of noise and missing peaks in mass spectra reduce the reliability of training data (peptide-spectrum matches, PSMs). To address these challenges, we propose AdaNovo, a novel framework that calculates conditional mutual information (CMI) between the spectrum and each amino acid/peptide, using CMI for adaptive model training. Extensive experiments demonstrate AdaNovo's state-of-the-art performance on a 9-species benchmark, where the peptides in the training set are almost completely disjoint from the peptides of the test sets. Moreover, AdaNovo excels in identifying amino acids with PTMs and exhibits robustness against data noise. The supplementary materials contain the official code.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Pep2Prob Benchmark: Predicting Fragment Ion Probability for MS$^2$-based Proteomics (2025)
- Diffusion Decoding for Peptide De Novo Sequencing (2025)
- Deep Learning Model for Amyloidogenicity Prediction using a Pre-trained Protein LLM (2025)
- Continued domain-specific pre-training of protein language models for pMHC-I binding prediction (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper