Mannochio-Russo, H., Charron-Lamoureux, V., van Faassen, M., Lamichhane, S., Gonçalves Nunes, W. D., Deleray, V., Ayala, A. V., Tanaka, Y., Patan, A., Vittali, K., Rajkumar, P., El Abiead, Y., Zhao, H. N., Gomes, P. W. P., Mohanty, I., Lee, C., Sund, A., Sharma, M., Liu, Y., … Dorrestein, P. C. (2025). The microbiome diversifies long- to short-chain fatty acid-derived N-acyl lipids. Cell, 188(15), 4154–4169.e19. https://doi.org/10.1016/j.cell.2025.05.015
@article{MannochioRusso2025_1,
author = {Mannochio-Russo, Helena and Charron-Lamoureux, Vincent and van Faassen, Martijn and Lamichhane, Santosh and Gon{\c{c}}alves Nunes, Wilhan D. and Deleray, Victoria and Ayala, Adriana V. and Tanaka, Yuichiro and Patan, Abubaker and Vittali, Kyle and Rajkumar, Prajit and El Abiead, Yasin and Zhao, Haoqi Nina and Gomes, Paulo Wender Portal and Mohanty, Ipsita and Lee, Carlynda and Sund, Aidan and Sharma, Meera and Liu, Yuanhao and Pattynama, David and Walker, Gregory T. and Norton, Grant J. and Khatib, Lora and Andalibi, Mohammadsobhan S. and Wang, Crystal X. and Ellis, Ronald J. and Moore, David J. and Iudicello, Jennifer E. and Franklin Jr., Donald and Letendre, Scott and Chin, Loryn and Walker, Corinn and Renwick, Simone and Zemlin, Jasmine and Meehan, Michael J. and Song, Xinyang and Kasper, Dennis and Burcham, Zachary and Kim, Jane J. and Kadakia, Sejal and Raffatellu, Manuela and Bode, Lars and Chu, Hiutung and Zengler, Karsten and Wang, Mingxun and Siegel, Dionicio and Knight, Rob and Dorrestein, Pieter C.},
title = {The microbiome diversifies long- to short-chain fatty acid-derived N-acyl lipids},
journal = {Cell},
year = {2025},
month = jul,
day = {24},
publisher = {Elsevier},
volume = {188},
number = {15},
pages = {4154--4169.e19},
issn = {0092-8674},
doi = {10.1016/j.cell.2025.05.015},
url = {https://doi.org/10.1016/j.cell.2025.05.015}
}
N-Acyl lipids are important mediators of several biological processes including immune function and stress response. To enhance the detection of N-acyl lipids with untargeted mass spectrometry-based metabolomics, we created a reference spectral library retrieving N-acyl lipid patterns from 2,700 public datasets, identifying 851 N-acyl lipids that were detected 356,542 times. 777 are not documented in lipid structural databases, with 18% of these derived from short-chain fatty acids and found in the digestive tract and other organs. Their levels varied with diet and microbial colonization and in people living with diabetes. We used the library to link microbial N-acyl lipids, including histamine and polyamine conjugates, to HIV status and cognitive impairment. This resource will enhance the annotation of these compounds in future studies to further the understanding of their roles in health and disease and to highlight the value of large-scale untargeted metabolomics data for metabolite discovery.
Charron-Lamoureux, V., Mannochio-Russo, H., Lamichhane, S., Xing, S., Patan, A., Portal Gomes, P. W., Rajkumar, P., Deleray, V., Caraballo-Rodríguez, A. M., Chua, K. V., Lee, L. S., Liu, Z., Ching, J., Wang, M., & Dorrestein, P. C. (2025). A guide to reverse metabolomics—a framework for big data discovery strategy. Nature Protocols. https://doi.org/10.1038/s41596-024-01136-2
@article{CharronLamoureux2025,
author = {Charron-Lamoureux, Vincent and Mannochio-Russo, Helena and Lamichhane, Santosh and Xing, Shipei and Patan, Abubaker and Portal Gomes, Paulo Wender and Rajkumar, Prajit and Deleray, Victoria and Caraballo-Rodr{\'i}guez, Andr{\'e}s Mauricio and Chua, Kee Voon and Lee, Lye Siang and Liu, Zhao and Ching, Jianhong and Wang, Mingxun and Dorrestein, Pieter C.},
title = {A guide to reverse metabolomics---a framework for big data discovery strategy},
journal = {Nature Protocols},
year = {2025},
month = feb,
day = {28},
issn = {1750-2799},
doi = {10.1038/s41596-024-01136-2},
url = {https://doi.org/10.1038/s41596-024-01136-2}
}
Untargeted metabolomics is evolving into a field of big data science. There is a growing interest within the metabolomics community in mining tandem mass spectrometry (MS/MS)-based data from public repositories. In traditional untargeted metabolomics, samples to address a predefined question are collected and liquid chromatography with MS/MS data are generated. We then identify metabolites associated with a phenotype (for example, disease versus healthy) and elucidate or validate their structural details (for example, molecular formula, structural classification, substructure or complete structural annotation or identification). In reverse metabolomics, we start with MS/MS spectra for known or unknown molecules. These spectra are used as search terms to search public data repositories to discover phenotype-relevant information such as organ/biofluid distribution, disease condition, intervention status (for example, pre- and postintervention), organisms (for example, mammals versus others), geography and any other biologically relevant associations. Here we guide the reader through a four-part process: (1) obtaining the MS/MS spectra of interest (Universal Spectrum Identifier) and (2) Mass Spectrometry Search Tool searches to find the files associated with the MS/MS that are in available databases, (3) using the Reanalysis Data User Interface framework to link the files with their metadata and (4) validating the observations. Parts 1–3 could take from hours to days depending on the method used for collecting MS/MS spectra. For example, we use MS/MS spectra from three small molecules: phenylalanine-cholic acid (a microbially conjugated bile acid), phenylalanine-C4:0 and histidine-C4:0 (two N-acyl amides). We leverage the Global Natural Products Social Molecular Networking-based framework to explore the microbial producers of these molecules and their associations with health conditions and organ distributions in humans and rodents.
Krutkin, D. D., Thomas, S., Zuffa, S., Rajkumar, P., Knight, R., Dorrestein, P. C., & Kelley, S. T. (2025). To Impute or Not To Impute in Untargeted Metabolomics─That is the Compositional Question. Journal of the American Society for Mass Spectrometry. https://doi.org/10.1021/jasms.4c00434
@article{Krutkin2025,
author = {Krutkin, Dennis D. and Thomas, Sydney and Zuffa, Simone and Rajkumar, Prajit and Knight, Rob and Dorrestein, Pieter C. and Kelley, Scott T.},
title = {To Impute or Not To Impute in Untargeted Metabolomics---That is the Compositional Question},
journal = {Journal of the American Society for Mass Spectrometry},
year = {2025},
month = feb,
day = {25},
publisher = {American Chemical Society},
issn = {1044-0305},
doi = {10.1021/jasms.4c00434},
url = {https://doi.org/10.1021/jasms.4c00434}
}
Untargeted metabolomics often produces large datasets with missing values. These missing values are derived from biological or technical factors and can undermine statistical analyses and lead to biased biological interpretations. Imputation methods, such as k-Nearest Neighbors (kNN) and Random Forest (RF) regression, are commonly used, but their effects vary depending on the type of missing data, e.g., Missing Completely At Random (MCAR) and Missing Not At Random (MNAR). Here, we determined the impacts of degree and type of missing data on the accuracy of kNN and RF imputation using two datasets: a targeted metabolomic dataset with spiked-in standards and an untargeted metabolomic dataset. We also assessed the effect of compositional data approaches (CoDA), such as the centered log-ratio (CLR) transform, on data interpretation since these methods are increasingly being used in metabolomics. Overall, we found that kNN and RF performed more accurately when the proportion of missing data across samples for a metabolic feature was low. However, these imputations could not handle MNAR data and generated wildly inflated or imputed values where none should exist. Furthermore, we show that the proportion of missing values had a strong impact on the accuracy of imputation, which affected the interpretation of the results. Our results suggest imputation should be used with extreme caution even with modest levels of missing data and especially when the type of missingness is unknown.
preprints
Patan, A., Xing, S., Charron-Lamoureux, V., Hu, Z., Deleray, V., Agongo, J., El Abiead, Y., Mannochio-Russo, H., Mohanty, I., Gouda, H., Zemlin, J., Rajkumar, P., Lee, C., Leanos, D., Weimann, N., Tsuda, W., Giddings, S., Bui, T., Kvitne, K. E., … Dorrestein, P. C. (2025). Charting the undiscovered metabolome with synthetic multiplexing. bioRxiv. https://doi.org/10.1101/2025.11.18.689170
@article{Patan2025,
title = {Charting the undiscovered metabolome with synthetic multiplexing},
author = {Patan, Abubaker and Xing, Shipei and Charron-Lamoureux, Vincent and Hu, Zhewen and Deleray, Victoria and Agongo, Julius and El Abiead, Yasin and Mannochio-Russo, Helena and Mohanty, Ipsita and Gouda, Harsha and Zemlin, Jasmine and Rajkumar, Prajit and Lee, Carlynda and Leanos, Daniel and Weimann, Noah and Tsuda, Wataru and Giddings, Sadie and Bui, Tammy and Kvitne, Kine Eide and Zhao, Haoqi Nina and Zuffa, Simone and Nguyen, Vivian and Andrade, Aileen and Gon{\c{c}}alves Nunes, Wilhan Donizete and Caraballo-Rodr{\'\i}guez, Andr{\'e}s M and Caetano David, Lurian and Carver, Jeremy and Bandeira, Nuno and Wang, Mingxun and Burnett, Lindsey A and Siegel, Dionicio and Dorrestein, Pieter C},
journal = {bioRxiv},
publisher = {Cold Spring Harbor Laboratory},
month = nov,
doi = {10.1101/2025.11.18.689170},
year = {2025},
copyright = {https://www.biorxiv.org/about/FAQ\#license}
}
In untargeted metabolomics, reference MS/MS libraries
are essential for structural annotation, yet currently explain
only 6.9% of the more than 1.7 billion MS/MS spectra in
public repositories. We hypothesized that many unannotated
features arise from simple, biologically plausible
transformations of endogenous and exposure-derived compounds.
To test this, we created a reference resource by synthesizing
over 100,000 compounds using multiplexed reactions that mimic
such biochemical transformations. 91% of the compounds
synthesized are absent from existing structural databases.
Through improvements in the construction of the computational
infrastructure that enables pan repository-scale MS/MS
comparisons, searching this biologically inspired MS/MS
library increased the overall reference-based match rate by
17.4%, yielding over 60 million new matches and raising the
global pan-repository MS/MS annotation rate to 8.1%. By
facilitating structural hypotheses for previously
uncharacterized MS/MS data, this framework expands the
accessible detectable biochemical landscape across human,
animal, plant, and microbial systems, revealing previously
undescribed metabolites such as ibuprofen-carnitine and
5-ASA-phenylpropionic acid conjugates arising from drug–host
and host–microbiome co-metabolism.
Mannochio-Russo, H., Gonçalves Nunes, W. D., Zhao, H. N., Kvitne, K. E., Xing, S., Gouda, H., Agongo, J., Mohanty, I., Charron-Lamoureux, V., Rajkumar, P., Pakkir Shah, A. K., Walter, A., Krishnaraj, R., El Abiead, Y., Ferreira, P. C., Zuffa, S., Patan, A., Caraballo-Rodríguez, A. M., Bittremieux, W., … Dorrestein, P. (2025). Bridging complexity and accessibility in metabolomics with MetaboApps. ChemRxiv. https://chemrxiv.org/engage/chemrxiv/article-details/68e5680fdfd0d042d15c4900
@article{MannochioRusso2025_2,
title = {Bridging complexity and accessibility in metabolomics with MetaboApps},
author = {Mannochio-Russo, Helena and Gon{\c c}alves Nunes, Wilhan D and Zhao, Haoqi Nina and Kvitne, Kine Eide and Xing, Shipei and Gouda, Harsha and Agongo, Julius and Mohanty, Ipsita and Charron-Lamoureux, Vincent and Rajkumar, Prajit and Pakkir Shah, Abzer K and Walter, Axel and Krishnaraj, Rithi and El Abiead, Yasin and Ferreira, Patrick C and Zuffa, Simone and Patan, Abubaker and Caraballo-Rodr{\'\i}guez, Andr{\'e}s Mauricio and Bittremieux, Wout and Petras, Daniel and Wang, Mingxun and Dorrestein, Pieter},
journal = {ChemRxiv},
year = {2025},
url = {https://chemrxiv.org/engage/chemrxiv/article-details/68e5680fdfd0d042d15c4900},
doi = {10.26434/chemrxiv-2025-3nq29},
publisher = {American Chemical Society}
}
Untargeted metabolomics is a powerful approach for exploring the
chemical diversity and dynamics of biological systems. However,
the types of questions that can be addressed depend not only on
experimental design but also on the data processing and analysis
workflows employed, many of which require advanced computational
expertise. GNPS1, now transitioning to its second major
implementation (GNPS2), has evolved into an expandable platform
that supports the integration of modular web applications
designed to simplify and enhance downstream analysis. These apps,
named MetaboApps, facilitate the post-processing of outputs of
several GNPS workflows and help make repository-scale
metabolomics knowledge and other areas of metabolomics more
accessible to a broader community.
Zhao, H. N., Kvitne, K. E., Brungs, C., Mohan, S., Charron-Lamoureux, V., Bittremieux, W., Tang, R., Schmid, R., Lamichhane, S., El Abiead, Y., Andalibi, M. S., Mannochio-Russo, H., Ambre, M., Avalon, N. E., Bryant, M. K., Caraballo-Rodríguez, A. M., Maya, M. C., Chin, L., Ellis, R. J., … Dorrestein, P. C. (2024). Empirically establishing drug exposure records directly from untargeted metabolomics data. bioRxiv. https://www.biorxiv.org/content/early/2024/10/26/2024.10.07.617109
@article{Zhao2024,
author = {Zhao, Haoqi Nina and Kvitne, Kine Eide and Brungs, Corinna and Mohan, Siddharth and Charron-Lamoureux, Vincent and Bittremieux, Wout and Tang, Runbang and Schmid, Robin and Lamichhane, Santosh and El Abiead, Yasin and Andalibi, Mohammadsobhan S. and Mannochio-Russo, Helena and Ambre, Madison and Avalon, Nicole E. and Bryant, MacKenzie and Caraballo-Rodr{\'\i}guez, Andr{\'e}s Mauricio and Maya, Martin Casas and Chin, Loryn and Ellis, Ronald J. and Franklin, Donald and Girod, Sagan and Gomes, Paulo Wender P and Hansen, Lauren and Heaton, Robert and Iudicello, Jennifer E. and Jarmusch, Alan K. and Khatib, Lora and Letendre, Scott and Magyari, Sarolt and McDonald, Daniel and Mohanty, Ipsita and Cumsille, Andr{\'e}s and Moore, David J. and Rajkumar, Prajit and Ross, Dylan H. and Sapre, Harshada and Shahneh, Mohammad Reza Zare and Thomas, Sydney P. and Tribelhorn, Caitlin and Tubb, Helena M. and Walker, Corinn and Wang, Crystal X. and Xing, Shipei and Zemlin, Jasmine and Zuffa, Simone and Wishart, David S. and Kaddurah-Daouk, Rima and Wang, Mingxun and Raffatellu, Manuela and Zengler, Karsten and Pluskal, Tom{\'a}{\v s} and Xu, Libin and Knight, Rob and Tsunoda, Shirley M. and Dorrestein, Pieter C.},
title = {Empirically establishing drug exposure records directly from untargeted metabolomics data},
elocation-id = {2024.10.07.617109},
year = {2024},
doi = {10.1101/2024.10.07.617109},
publisher = {Cold Spring Harbor Laboratory},
url = {https://www.biorxiv.org/content/early/2024/10/26/2024.10.07.617109},
eprint = {https://www.biorxiv.org/content/early/2024/10/26/2024.10.07.617109.full.pdf},
journal = {bioRxiv}
}
Despite extensive efforts, extracting information on medication exposure from clinical records remains challenging. To complement this approach, we developed the tandem mass spectrometry (MS/MS) based GNPS Drug Library. This resource integrates MS/MS data for drugs and their metabolites/analogs with controlled vocabularies on exposure sources, pharmacologic classes, therapeutic indications, and mechanisms of action. It enables direct analysis of drug exposure and metabolism from untargeted metabolomics data independent of clinical records. Our library facilitates stratification of individuals in clinical studies based on the empirically detected medications, exemplified by drug-dependent microbiota-derived N-acyl lipid changes in a cohort with human immunodeficiency virus. The GNPS Drug Library holds potential for broader applications in drug discovery and precision medicine.
Competing Interest Statement: R.S. is a co-founder of mzio GmbH. D.M. is a consultant for BiomeSense, Inc., has equity and receives income. The terms of these arrangements have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. R.K.-D. is an inventor on a series of patents on use of metabolomics for the diagnosis and treatment of CNS diseases and holds equity in Metabolon Inc., Chymia LLC and PsyProtix. M.W. is a co-founder of Ometa Labs LLC. T.P. is a co-founder of mzio GmbH. R.K. is a scientific advisory board member, and consultant for BiomeSense, Inc., has equity and receives income. He is a scientific advisory board member and has equity in GenCirq. He is a consultant for DayTwo, and receives income. He has equity in and acts as a consultant for Cybele. He is a co-founder of Biota, Inc., and has equity. He is a cofounder of Micronoma, and has equity and is a scientific advisory board member. The terms of these arrangements have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. S.M.T. receives research funding from Veloxis Pharmaceuticals. P.C.D. is a scientific advisor and holds equity in Cybele, and bileOmix, and is a Scientific Co-founder, and advisor and holds equity in Ometa, Arome, and Enveda with prior approval by UC-San Diego.