Abstract The rapid development of high-throughput sequencing techniques provides an unprecedented opportunity to generate biological insights into microbiome-related diseases. However, the relationships among microbes, metabolites and the human microenvironment are extremely complex, making data analysis challenging. Here, we present NMFGOT, a versatile toolkit for the integrative analysis of microbiome and metabolome data from the same samples. NMFGOT is an unsupervised learning framework based on nonnegative matrix factorization with graph-regularized optimal transport. It uses the optimal transport plan to measure the probability distance between microbiome samples, which better handles the nonlinear high-order interactions among microbial taxa and metabolites. It also includes a spatial regularization term to preserve the spatial consistency of samples in the embedding space across different data modalities. We applied NMFGOT to several multi-omics microbiome datasets from multiple cohorts. The experimental results showed that NMFGOT consistently performed well compared with several recently published multi-omics integration methods. Moreover, NMFGOT facilitates downstream biological analysis, including pathway enrichment analysis and disease-specific metabolite-microbe association analysis. Using NMFGOT, we identified significant and stable metabolite-microbe associations in GC and ESRD, improving our understanding of the mechanisms of complex human diseases.
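To make the idea of an optimal-transport distance between microbiome samples concrete, the following is a minimal sketch and not the NMFGOT implementation. It assumes each sample is a relative-abundance vector over taxa and uses entropic-regularized optimal transport (Sinkhorn iterations) as a stand-in. The function name `sinkhorn_distance`, the toy cost matrix, and the regularization value `reg` are illustrative assumptions that do not come from the paper.

```python
import numpy as np

def sinkhorn_distance(a, b, cost, reg=0.5, n_iter=500):
    """Entropic-regularized OT cost between two histograms (illustrative only).

    a, b  : nonnegative vectors that each sum to 1 (e.g. relative abundances)
    cost  : pairwise ground cost between taxa, shape (len(a), len(b))
    reg   : entropic regularization strength (assumed value, not from the paper)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    K = np.exp(-cost / reg)              # Gibbs kernel derived from the cost
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # rescale to match column marginals
        u = a / (K @ v)                  # rescale to match row marginals
    plan = u[:, None] * K * v[None, :]   # transport plan between the samples
    return float(np.sum(plan * cost)), plan

# Toy example: three hypothetical taxa with cost |i - j| / 2 between them.
idx = np.arange(3)
cost = np.abs(idx[:, None] - idx[None, :]) / 2.0
sample_x = np.array([0.6, 0.3, 0.1])
sample_y = np.array([0.1, 0.3, 0.6])

d_xy, plan_xy = sinkhorn_distance(sample_x, sample_y, cost)
d_xx, _ = sinkhorn_distance(sample_x, sample_x, cost)
print(d_xy > d_xx)  # dissimilar samples incur a larger transport cost
```

In a setting like the one described above, such pairwise costs could populate a sample-by-sample distance matrix. How NMFGOT combines that matrix with the factorization and the graph regularizer is not specified here, so the sketch leaves it out.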