Molecular analysis of the individual microbiome

July 24, 2018

When we hear of next-generation sequencing (NGS) methods in the context of clinical applications, most of us probably first think of them as applied to an individual’s nuclear genome: his or her innate chromosomal DNA, carrying information relating to inherited diseases, drug metabolism, and other aspects of “personalized medicine.” However, NGS can also be applied to the analysis of microbiomes (that is, the identities and relative numbers of different microorganisms) specific to both the individual and the sampling site. As we’ll review in this month’s installment of “The Primer,” there are a number of valuable clinical applications.

NGS in action

Here’s a quick reminder of how NGS systems work as applied in this use. A sample is diluted into microscopic reactions in such a way that each tiny reaction has a single template molecule. There are a number of other steps too, collectively known as “library generation,” which include fragmenting the nucleic acids to be analyzed into short pieces suitable for analysis by the chosen NGS chemistry and platform, and ligating on various “barcodes” and adapters required for the process, but for today’s purpose that’s all pretty much behind the scenes. So too is the tiling process (where all of the individual short reads from each microscopic reaction are examined, areas of obvious overlap identified, and longer contiguous reads assembled out of these).
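
For readers who like to see the idea in miniature, the tiling step can be sketched in a few lines of Python. This is only a toy illustration under simplifying assumptions (error-free reads, exact matches, two reads at a time); the function name merge_reads and the example sequences are invented for illustration and do not come from any real assembler.

```python
def merge_reads(left, right, min_overlap=3):
    """Join two reads if a suffix of `left` exactly matches a prefix of `right`.

    Returns the merged sequence, or None when no overlap of at least
    `min_overlap` bases is found. Real assemblers also handle sequencing
    errors, reverse complements, and repeats, which this toy ignores.
    """
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first, then shrink.
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


# Hypothetical short reads sharing the four bases "TGCA"
contig = merge_reads("ACGTTGCA", "TGCAGGTC")
no_overlap = merge_reads("AAAA", "CCCC")
```

Here contig is the longer contiguous read built from the two overlapping fragments, while no_overlap is None because the second pair shares nothing.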

If we’re interested in the microbiome, there’s also going to be a first-pass bioinformatics step which identifies and removes all of the human-derived sequences in our results, and then a second-pass bioinformatics step which identifies the microbial species represented in what remains. Because the NGS approach works by dilution to single templates, if we’ve done this on DNA, we’re also able to determine at least the relative abundance of the represented species. Assuming this sampling is a purely stochastic process, if DNA fragments attributable to Organism A occur 10 times as often as fragments originating from Organism B, we can assume that there’s 10 times as much Organism A as Organism B genetic material in the sample.
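
The two passes and the read-count reasoning can be sketched as follows. This is a simplified illustration, assuming every read has already been given a label by some upstream classifier; the function relative_read_abundance, the label "human," and the read counts are all hypothetical.

```python
from collections import Counter


def relative_read_abundance(read_labels, host_label="human"):
    """Return each organism's share of the non-host reads.

    `read_labels` holds one classification label per read; producing those
    labels is assumed to have happened upstream and is not shown here.
    """
    # First pass: discard reads classified as host-derived.
    microbial = [label for label in read_labels if label != host_label]
    # Second pass: tally the remaining reads by assigned organism.
    counts = Counter(microbial)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {organism: n / total for organism, n in counts.items()}


# Hypothetical sample: 50 human reads, 100 from Organism A, 10 from Organism B
reads = ["human"] * 50 + ["Organism A"] * 100 + ["Organism B"] * 10
shares = relative_read_abundance(reads)
ratio_a_to_b = shares["Organism A"] / shares["Organism B"]
```

With these made-up counts, Organism A fragments occur 10 times as often as Organism B fragments, matching the example in the text.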

To convert this to something more meaningful, we’d also want to know what the relative genome sizes are; in this example, if Organism A has a genome twice as large as Organism B, then the actual relative abundance would be 5A:1B as expressed in GEq (Genome Equivalents), which is the molecular analog to CFUs (Colony Forming Units), where one such unit equates essentially to one discrete organism. The qualifier “essentially” is needed because, unlike a CFU, a GEq doesn’t necessarily equate to a viable organism; for instance, if the specimen is from a patient recently treated with antibiotics, the CFU values might be significantly lower than GEq values.
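
The genome-size correction is simple arithmetic, shown here as a short Python sketch. The read counts and genome sizes are invented purely to reproduce the 10:1 and 2:1 ratios from the example, and genome_equivalents is a hypothetical helper, not part of any real package.

```python
def genome_equivalents(read_counts, genome_sizes_bp):
    """Divide each organism's read count by its genome size.

    The result is proportional to genome equivalents (GEq), so only the
    ratios between organisms are meaningful, not the absolute values.
    """
    return {
        organism: read_counts[organism] / genome_sizes_bp[organism]
        for organism in read_counts
    }


# Hypothetical values: A yields 10x the reads of B and has 2x the genome size
read_counts = {"Organism A": 1000, "Organism B": 100}
genome_sizes_bp = {"Organism A": 6_000_000, "Organism B": 3_000_000}

geq = genome_equivalents(read_counts, genome_sizes_bp)
a_to_b = geq["Organism A"] / geq["Organism B"]
```

Dividing the 10:1 read ratio by the 2:1 genome-size ratio gives the 5A:1B figure quoted above.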

Because we’re using an NGS approach here, we can at least in theory extract other information such as the presence or absence of particular antibiotic resistance markers or genes for specific toxins or virulence factors in each organism. While there are simpler, cheaper, faster molecular methods such as qPCR for direct interrogation of a sample for the presence of these kinds of markers, they’re not tremendously informative in the context of mixed microbial populations, as it may not be possible to know which organism(s) host which resistances. (It’s also worth bearing in mind that, strictly speaking, antibiotic resistance is defined on a phenotypic level at pre-established drug concentration “breakpoints”; the presence of a particular antibiotic resistance gene marker is a prerequisite for such resistance but not strictly a proof that it is expressed to a level meeting the formal requirement for being considered resistant. In many cases, however, presence of the associated markers is used to direct therapy choices, as the correlation between marker presence and functional resistance is very good.)
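
To see why sequence context helps here, consider this minimal sketch. It assumes that each assembled contiguous read has already been assigned to an organism and scanned for markers; both of those steps, the function markers_by_organism, and the marker and organism names are hypothetical stand-ins.

```python
from collections import defaultdict


def markers_by_organism(contigs):
    """Group detected marker names by the organism assigned to each contig."""
    hosts = defaultdict(set)
    for contig in contigs:
        for marker in contig["markers"]:
            hosts[contig["organism"]].add(marker)
    # Sorted lists make the output stable and easy to read.
    return {organism: sorted(found) for organism, found in hosts.items()}


# Hypothetical assembled contigs with organism assignments and marker hits
contigs = [
    {"organism": "Organism A", "markers": ["resistance_marker_1"]},
    {"organism": "Organism B", "markers": []},
    {"organism": "Organism A", "markers": ["toxin_gene_1"]},
    {"organism": "Organism C", "markers": ["resistance_marker_1"]},
]
marker_hosts = markers_by_organism(contigs)
```

A bulk assay on the same sample would only report that resistance_marker_1 is present somewhere; the grouping above attributes it to Organisms A and C but not to Organism B.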

In the clinical setting

What are some of the clinical settings where this molecular approach to microbiome analysis is used? One obvious example is in the setting of cystic fibrosis, where classical culture methods on airway samples have been historically used to monitor disease progression and to inform therapeutic choices. This is a notoriously microbially diverse setting, including not just the “usual suspect” pathogens Pseudomonas aeruginosa, Staphylococcus aureus, and Burkholderia cepacia but also many anaerobic species. Obtaining accurate quantitative loads across this diversity with classical methods is challenging to say the least, as well as time-consuming.

By contrast, NGS methods will detect and enumerate readily cultured organisms as well as more fastidious ones, allowing for what is probably a less biased and more accurate representation of actual relative populations. The applicability of molecular methods in this specific application was immediately obvious, and indeed some of the earliest publications on NGS applications to clinical microbiology are found in the CF field. The utility of the method extends well outside of just the CF background, however, including, for example, the impact of respiratory microbiome on response to viral infections.1

Another obvious microbiome of significance is that of the GI tract. Aside from the not-so-subtle expressions of this that are familiar to travelers (particularly those with a predilection for purchasing exotic local delicacies from questionable vendors), there is a constant and very complex web of interactions between gastrointestinal microbiota and overall immunity (for example, see reference 2) and even neurological function (the so-called “gut-brain axis”; publications in this vein have examined impacts of gut microbiota on everything from mood to Parkinson’s disease). In fact, because of this complexity, it’s probably fair to say that at present we can’t generally make much immediate clinical use of gut microbiome data. That is not because there isn’t a lot of valuable data there, but because we currently lack enough information and correlative observations to see the useful information amid the bulk data. This is certain to change as further research is done, and it would not surprise this author if, five years or a decade from now, we see gut microbiome workups as a common diagnostic component of a range of conditions, some of which won’t be overtly digestive in nature.

Beyond these two very obvious microbiome sites, there are of course a range of other sampling sites and types which are likely to yield information. These include skin, upper airway (as opposed to deeper respiratory), various surface-exposed mucosal sites, and the reproductive tract. While the oft-quoted “statistic” that there are many more bacterial cells than human cells in the average person has been shown to be inaccurate, best estimates are that it’s at least in the range of 1:1, with perhaps a slight bias towards the bacteria (30 trillion human cells to 39 trillion bacterial cells in a hypothetical 170 cm tall, 70 kg, 20- to 30-year-old male).3 So perhaps it’s high time we realize that we’re more ambulatory communities than unitary organisms, and begin giving more attention to the impact of our unicellular tenants.

Aside from the kinds of applications touched on above, are there other ways in which NGS microbiome sampling could be of medical use? There are, with one being the search for potential etiological agents of orphan diseases. Analyses of microbiomes in disease cases versus controls can be used to identify what appear to be significant changes in composition (species present and/or relative numbers), although they provide no information as to whether this is causal or purely correlative in nature. The resulting short list of candidate organisms can then be combined with testable mechanistic hypotheses and model systems to work toward fulfilling Koch’s postulates, actually proving or disproving a causal linkage, with potential for impacts on treatment strategies.
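
A bare-bones version of the case-versus-control comparison might look like the sketch below. It assumes each sample has already been reduced to relative abundances, and it uses a simple fold change of group means with an arbitrary cutoff; the function candidate_organisms, the threshold, the pseudocount, and the data are all illustrative assumptions, and a real study would use proper statistical testing on far more samples.

```python
import math
from statistics import mean


def candidate_organisms(cases, controls, min_abs_log2_fc=1.0, pseudocount=1e-6):
    """Flag organisms whose mean relative abundance shifts between groups.

    Returns {organism: log2 fold change (cases over controls)} for shifts
    at or beyond the cutoff. This shows correlation only, not causation.
    """
    organisms = set()
    for sample in cases + controls:
        organisms.update(sample)

    shifts = {}
    for organism in organisms:
        case_mean = mean(s.get(organism, 0.0) for s in cases)
        control_mean = mean(s.get(organism, 0.0) for s in controls)
        # The pseudocount avoids division by zero for absent organisms.
        log2_fc = math.log2((case_mean + pseudocount) / (control_mean + pseudocount))
        if abs(log2_fc) >= min_abs_log2_fc:
            shifts[organism] = log2_fc
    return shifts


# Hypothetical relative abundances per sample
cases = [{"A": 0.5, "B": 0.4, "C": 0.1}, {"A": 0.6, "B": 0.3, "C": 0.1}]
controls = [{"A": 0.1, "B": 0.5, "C": 0.4}, {"A": 0.1, "B": 0.4, "C": 0.5}]
shortlist = candidate_organisms(cases, controls)
```

In this invented data set, A is enriched and C is depleted in the disease cases, while B changes too little to be flagged; the output is exactly the kind of short list that then needs mechanistic follow-up.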

Another application, but one which would likely only come into use if some sort of “microbiome snapshot” becomes a more routine part of a diagnostic workup, would be in the tracking and tracing of outbreaks. Molecular methods have been applied in this context for years; for example, in an outbreak of Salmonella cases, molecular fingerprinting has been used to ensure that the cases being tracked are indeed all likely from a common source, and not just unrelated statistical anomalies. If large pools of relevant microbiome data were being collected and available for public health scrutiny, we’d probably be in a position to have faster and more comprehensive detection of food-associated outbreaks and faster tracking of root sources.

It’s becoming more feasible

The impediments to all of these and other possible applications of NGS microbiome analysis of human samples remain ones of basic cost, throughput, time, and return on investment. Advances in methods and devices continue to bring NGS costs down, however, while simultaneous gains in computational capacity per unit cost assist on the bioinformatics side. As these costs come down and our appreciation for the value of the data increases, the ROI metric will begin to tip in favor of broader application of the method.

Has all of the above whetted your appetite for a deeper look at clinical applications of NGS for microbiomes? As further reading, you might turn to reference 4, a fairly recent review with a broad focus.


  1. Pichon M, Lina B, Josset L. Impact of the respiratory microbiome on host responses to respiratory viral infection. Vaccines. 2017;5(4):40-54.
  2. Clavel T, Gomes-Neto JC, Lagkouvardos I, Ramer-Tait AE. Deciphering interactions between the gut microbiota and the immune system via microbial cultivation and minimal microbiomes. Immunol Rev. 2017;279(1):8-22.
  3. Abbott A. Scientists bust myth that our bodies have more bacteria than human cells. Nature. 2016;10.1038.
  4. Motro Y, Moran-Gilad J. Next-generation sequencing applications in clinical bacteriology. Biomolecular Detection and Quantification. 2017;14:1-6.