
Learning a deep language model for microbiomes: the power of large scale unlabeled microbiome data

Abstract
We use open-source human gut microbiome data to learn a microbial “language” model by adapting techniques from Natural Language Processing (NLP). The model is trained in a self-supervised fashion (i.e., without additional external labels) to capture the interactions among different microbial taxa and the common compositional patterns in microbial communities. The learned model produces contextualized taxa representations, which allow a single microbial taxon to be represented differently depending on the specific microbial environment in which it appears. The model further provides a sample representation by collectively interpreting the different microbial taxa in a sample and their interactions as a whole. We show that, compared to baseline representations, our sample representation consistently improves performance on multiple prediction tasks, including predicting Inflammatory Bowel Disease (IBD) and diet patterns. Coupled with a simple ensemble strategy, it produces a highly robust IBD prediction model that generalizes well to microbiome data independently collected from different populations with substantial distribution shift. We visualize the contextualized taxa representations and find that they exhibit meaningful phylum-level structure, even though the model was never exposed to such a signal. Finally, we apply an interpretation method to highlight microbial taxa that are particularly influential in driving our model’s predictions for IBD.

Author summary
Human microbiomes and their interactions with various body systems have been linked to a wide range of diseases and lifestyle variables. To help understand these links, citizen science projects such as the American Gut Project (AGP) have provided large open-source datasets for microbiome investigation. In this work we leverage such open-source data and learn a “language” model for human gut microbiomes using techniques derived from natural language processing. We train the “language” model to capture the interactions among different microbial taxa and the common compositional patterns that shape gut microbiome communities. By considering the entirety of taxa within a sample and their interactions, our model produces a representation that enables contextualized interpretation of individual microbial taxa within their microbial environment. We demonstrate that our sample representation enhances prediction performance compared to baseline methods across multiple microbiome tasks, including prediction of Inflammatory Bowel Disease (IBD) and diet patterns. Furthermore, our learned representation yields a robust IBD prediction model that generalizes well to independent data collected from different populations. To gain insight into our model’s workings, we present interpretation results that showcase its ability to learn biologically meaningful representations.
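
As a rough illustration of what "self-supervised" can mean for microbiome data, the sketch below turns a sample into a sequence of taxa and hides some of them, so that the training targets come from the sample itself rather than from external labels. This is not the authors' implementation: the abstract does not state the tokenization or the training objective, so the abundance ordering, the masked-taxon objective, the function names, and the toy counts are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token


def sample_to_tokens(abundances):
    """Turn a sample into a 'sentence': taxa present, most abundant first.

    Ordering by abundance is an assumption; ties are broken by name.
    """
    present = [(taxon, count) for taxon, count in abundances.items() if count > 0]
    present.sort(key=lambda item: (-item[1], item[0]))
    return [taxon for taxon, _ in present]


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of taxa; the hidden taxa become the targets.

    No external label is needed: the sample supervises itself.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_mask = min(len(tokens), max(1, round(mask_rate * len(tokens))))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    inputs = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = inputs[pos]  # what a model would be asked to recover
        inputs[pos] = MASK
    return inputs, targets


# Toy counts, made up for illustration only.
sample = {
    "Bacteroides": 420,
    "Faecalibacterium": 310,
    "Prevotella": 0,
    "Roseburia": 55,
    "Akkermansia": 12,
}
tokens = sample_to_tokens(sample)
inputs, targets = mask_tokens(tokens)
print(tokens)
print(inputs, targets)
```

A model trained to recover the hidden taxa from the remaining ones has to pick up which taxa tend to co-occur, which is the kind of compositional pattern the summary above describes.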
