JESTR: Joint Embedding Space Technique for Ranking candidate molecules for the annotation of untargeted metabolomics data.
Kalia, Apurva
Zhou Chen, Yan
Krishnan, Dilip
Hassoun, Soha
2025
A major challenge in metabolomics is annotation: assigning molecular structures to mass spectral fragmentation patterns. Despite recent advances in molecule-to-spectra and in spectra-to-molecular fingerprint (FP) prediction, annotation rates remain low. We introduce in this article a novel tool (JESTR) for annotation. Unlike prior approaches that "explicitly" construct molecular FPs or spectra, JESTR leverages the insight that molecules and their corresponding spectra are views of the same data and effectively embeds their representations in a joint space. Candidate structures are ranked based on cosine similarity between the embeddings of query spectrum and each candidate. We evaluate JESTR against mol-to-spec, spec-to-FP, and spec-mol matching annotation tools on four datasets. On average, for rank@[1-20], JESTR outperforms other tools by 55.5%-302.6%. We further demonstrate the strong value of regularization with candidate molecules during training, boosting rank@1 performance by 5.72% across all datasets and enhancing the model's ability to discern between target and candidate molecules. When comparing JESTR's performance against that of publicly available pretrained models of SIRIUS and CFM-ID on appropriate subsets of MassSpecGym dataset, JESTR outperforms these tools by 31% and 238%, respectively. Through JESTR, we offer a novel promising avenue toward accurate annotation, therefore unlocking valuable insights into the metabolome.
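The ranking step described in the abstract (scoring each candidate by the cosine similarity between its embedding and the query spectrum's embedding) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy three-dimensional vectors, and the candidate labels are all hypothetical, and the sketch assumes the embeddings have already been produced by trained spectrum and molecule encoders, which are not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_candidates(spectrum_embedding, candidate_embeddings):
    """Return (name, score) pairs sorted by descending cosine similarity."""
    scored = [
        (name, cosine_similarity(spectrum_embedding, emb))
        for name, emb in candidate_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def rank_of_target(ranking, target_name):
    """1-based position of the target in the ranking, or None if absent."""
    for position, (name, _) in enumerate(ranking, start=1):
        if name == target_name:
            return position
    return None


# Toy embeddings (hypothetical values); a real joint space would be
# much higher-dimensional and learned from data.
query = [0.9, 0.1, 0.3]
candidates = {
    "candidate_A": [0.8, 0.2, 0.4],
    "candidate_B": [-0.5, 0.9, 0.0],
    "candidate_C": [0.1, 0.1, 0.9],
}

ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup, a rank@k style check amounts to testing whether `rank_of_target(ranking, target_name)` is at most k for the known target molecule.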
Funding for open access publishing provided by the Tisch Library Open Access Publishing Fund.
Apurva Kalia, Yan Zhou Chen, Dilip Krishnan, Soha Hassoun, JESTR: Joint Embedding Space Technique for Ranking candidate molecules for the annotation of untargeted metabolomics data, Bioinformatics, Volume 41, Issue 7, July 2025, btaf354, https://doi.org/10.1093/bioinformatics/btaf354
- ID: sn00bc67g
- DOI: https://doi.org/10.1093/bioinformatics/btaf354
