rRNA sequencing for bacterial identification

Jan. 21, 2016

One branch of laboratory medicine that has remained little changed for nearly a century is the identification of bacterial species from clinical specimens. The traditional methods are based on dilution, plating on semi-solid media (broadly supportive or selective), and incubation in the appropriate environment (such as aerobic or anaerobic), leading to the growth of distinct individual colonies representative of the bacterial species present in the original sample. Identification then proceeds on these clonal microbial colonies by a combination of morphological classification following microscopic examination (by properties such as shape, motility, and cell wall structure as indirectly determined through Gram staining) and biochemical characterization (by properties such as ability to metabolize various carbon sources).

During the early days of microbiology, when many bacterial species were first identified, these methods were not only cutting-edge; they were the only methods available. Consequently, many microbes are defined to this day by these simple techniques, which therefore remain in wide use as the “gold standard” methods for classifying many bacteria. While not very fast, these methods are relatively cheap, don’t require complex laboratory infrastructure, and even come prepackaged in simple manual or automated biochemical test panel formats readily usable even by non-specialists (as the author can attest from personal experience).

Despite their past primacy and assured future relevance, however, these methods are neither perfect nor ideal in all cases. Because of morphologic and biochemical variability, bacterial species that are nearly indistinguishable by these traditional methods can, in some instances, have very different clinical implications when detected in patient specimens. It is in this context that molecular diagnostic (MDx) methods began entering clinical use more than a decade ago, and they continue to do so today.
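The problem of near-indistinguishable species can be made concrete with a toy biochemical profile matcher. Real automated panels differ in detail; this minimal sketch simply counts how many of an isolate’s test reactions agree with each entry in a reference table. All species names, test names, and reaction patterns here are hypothetical. When two species share the same expected reactions, both score perfectly and the panel alone cannot separate them.

```python
# Toy biochemical profile matcher. Species names, test names, and reaction
# patterns are hypothetical placeholders, not real reference data.
PROFILES = {
    "Species A": {"lactose": True, "indole": True, "urease": False, "motility": True},
    "Species B": {"lactose": True, "indole": True, "urease": False, "motility": True},
    "Species C": {"lactose": False, "indole": False, "urease": True, "motility": True},
}


def score(observed, expected):
    """Count the tests whose observed reaction agrees with the expected one."""
    return sum(observed[test] == expected[test] for test in expected)


def best_matches(observed, profiles=PROFILES):
    """Return every species tied for the highest agreement score, plus that score."""
    scores = {name: score(observed, profile) for name, profile in profiles.items()}
    top = max(scores.values())
    return sorted(name for name, s in scores.items() if s == top), top


isolate = {"lactose": True, "indole": True, "urease": False, "motility": True}
print(best_matches(isolate))  # (['Species A', 'Species B'], 4)
```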

Cell biology: a quick mini-review

To understand this application, a refresher on some basic biology is required. Recall that the ribosome is the cell’s critical “machinery” for assembling polypeptides (proteins) from single amino acids, as directed by messenger RNA (mRNA), which is in turn transcribed from the organism’s genome (DNA). This large complex consists of an organized arrangement of a number of proteins and structural RNA molecules (that is, non-coding RNAs that are biologically active through their physical shape). These RNA molecules are called ribosomal RNAs (rRNAs) and are highly conserved across species. The individual rRNA types are historically classified by their sedimentation behavior, a hydrodynamic property expressed in Svedberg units (“S”), with prokaryotic ribosomes having 23S, 16S, and 5S rRNA components and eukaryotic ribosomes having analogous 28S, 18S, 5.8S, and 5S rRNAs.

Because a cell requires many individual ribosomes to function, it needs large numbers of rRNA molecules, and to supply them most organisms carry multiple copies of the DNA sequences encoding them (known as rDNA) within their genomes; for example, the E. coli genome contains seven copies of its rDNA genes. Sensitivity is already a hallmark of MDx methods, and a multicopy genetic target such as rDNA lowers the limit of detection further. Polymerase chain reaction (PCR)-based amplification of rDNA is also particularly easy, perhaps even too easy in some cases, as we’ll see below.
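The sensitivity benefit of a multicopy target can be put into rough numbers. This sketch assumes idealized PCR with perfect doubling every cycle and a hypothetical fixed detection threshold of 10^10 copies; under those assumptions, seven target copies per genome give a sevenfold larger starting template pool, which reaches threshold about log2(7) ≈ 2.8 cycles sooner than a single-copy target from the same number of cells.

```python
import math


def starting_templates(cells, copies_per_genome):
    """Amplifiable target copies contributed by a given number of cells."""
    return cells * copies_per_genome


def cycles_to_threshold(start_copies, threshold=1e10, efficiency=1.0):
    """Cycles for start_copies to reach a detection threshold, assuming every
    cycle multiplies product by (1 + efficiency); 1.0 means perfect doubling.
    The 1e10-copy threshold is an illustrative placeholder."""
    return math.ceil(math.log(threshold / start_copies) / math.log(1 + efficiency))


for copies in (1, 7):  # single-copy target vs. E. coli's seven rDNA copies
    templates = starting_templates(10, copies)
    print(copies, templates, cycles_to_threshold(templates))
# 1 10 30
# 7 70 28
```

Real reactions run below perfect efficiency, so the absolute cycle numbers are only illustrative; the point is the head start a multicopy target provides.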

On the bacterial (prokaryotic) side, sequence analysis across many species revealed that the 16S rRNA is both very highly conserved overall (that is, its sequence, 1,542 nucleotides long in E. coli, is identical regardless of source at most nucleotide positions) and contains particular regions which, while well conserved within a single bacterial species, carry a number of small but consistent differences between species. This combination is a molecular diagnostician’s dream come true, for it makes it possible to design almost universally conserved PCR primers flanking a relatively short (few-hundred-base-pair) region that contains a few consistent variations among different bacterial species. An amplicon of this size is readily sequenced by Sanger methods.
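The primer-design idea can be sketched on a toy alignment: mark the columns where every sequence agrees, then look for runs of conserved columns long enough to host a primer, with the species-informative variable positions lying between them. The sequences below are invented, already aligned, and only 23 bases long, and the minimum window length of 6 is arbitrary; real primer design also has to weigh melting temperature, degeneracy, and mismatch tolerance.

```python
# Toy, pre-aligned sequences (invented, not real 16S data), each 23 bases long.
ALIGNMENT = [
    "GGCTAACC" + "ATGCATG" + "TTGCAGTC",
    "GGCTAACC" + "ATGGATC" + "TTGCAGTC",
    "GGCTAACC" + "TTGCACG" + "TTGCAGTC",
]


def conserved_columns(alignment):
    """True for each alignment column in which every sequence has the same base."""
    return [len(set(column)) == 1 for column in zip(*alignment)]


def conserved_windows(mask, min_len):
    """Half-open (start, end) runs of consecutive conserved columns at least
    min_len long: candidate primer-binding sites."""
    windows, start = [], None
    for i, conserved in enumerate(mask + [False]):  # sentinel closes a final run
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                windows.append((start, i))
            start = None
    return windows


mask = conserved_columns(ALIGNMENT)
print(conserved_windows(mask, min_len=6))  # [(0, 8), (15, 23)]
print([i for i, conserved in enumerate(mask) if not conserved])  # [8, 11, 13, 14]
```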

Enter molecular diagnostics

As Sanger sequencing became automated through the development of capillary sequencing instruments, a powerful MDx method for the identification of isolated bacterial samples (colonies or broth monocultures) came into being. The method consists of extracting DNA (even by very crude, rapid methods, since the target is present at high copy number); PCR amplification of part of the 16S rDNA sequence with “universally conserved” primers; sequencing of the amplicon; and comparison of this sequence against libraries of 16S rDNA sequences from known bacterial organisms for an identical (or closest) match. While proprietary, curated libraries of 16S rDNA sequences were (and are) probably the most accurate references to compare against, publicly accessible libraries such as GenBank, queried with open-access tools (usually a variant of the old standby “Basic Local Alignment Search Tool” [BLAST]), have been demonstrated to be clinically viable for this approach. Lest we fear that eukaryotes are left out, both this method and the more recent advances described below in the prokaryotic 16S context can also be applied to eukaryotic 18S rDNA sequences for the identification of fungal species.
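The library-comparison step can be sketched with a textbook global alignment. This is not BLAST (which uses a heuristic local-alignment search) nor any vendor’s algorithm; it is a minimal Needleman-Wunsch sketch with arbitrary scoring values (match +1, mismatch −1, gap −2) that scores a query against a tiny hypothetical in-memory library and reports the entry with the highest percent identity.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment of a and b with simple linear scoring;
    returns the fraction of alignment columns holding identical bases."""
    n, m = len(a), len(b)

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming score matrix.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + pair(i, j), S[i - 1][j] + gap, S[i][j - 1] + gap)

    # Trace back one optimal path, counting identical columns and total columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + pair(i, j):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return identical / columns


# Hypothetical reference library (invented sequences standing in for curated 16S entries).
LIBRARY = {
    "Reference 1 (hypothetical)": "GGCTAACCATGCATGTTGCAGTC",
    "Reference 2 (hypothetical)": "GGCTAACCATGGATCTTGCAGTC",
    "Reference 3 (hypothetical)": "GGCTAACCTTGCACGTTGCAGTC",
}


def best_match(query, library=LIBRARY):
    """Return (name, identity) for the library entry most identical to the query."""
    scored = {name: global_identity(query, seq) for name, seq in library.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]


query = "GGCTAACCATGGATCTTGCAGT"  # Reference 2 missing its final base
name, identity = best_match(query)
print(name, round(identity, 3))  # Reference 2 (hypothetical) 0.957
```

A global alignment is used here because the query and references span the same amplicon; searching a large public database would call for an indexed heuristic tool instead.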

As described, this approach has a significant failing that it shares with the traditional morphological/biochemical methods: the requirement for isolated single colonies of a bacterial species as starting material. Mixed samples cannot be analyzed before separation through dilution and one or more rounds of plating, because amplicons from different species are sequenced together and produce superimposed, unreadable Sanger traces. Bacterial species that do not grow under the conditions used (and even with a wide range of media and growth conditions, many bacteria remain challenging or impossible to grow in the customary semi-solid agar plate format) therefore cannot be identified either by traditional methods or by this MDx technique.
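Why a mixed template defeats single-product Sanger sequencing can be shown with an idealized model: wherever two equally represented templates disagree, the trace shows two overlapping peaks, which a base caller can at best report as an IUPAC ambiguity code. This sketch assumes two aligned, equal-length toy sequences in a 1:1 mixture; real mixtures also often differ by insertions or deletions, which scramble the trace downstream of the indel rather than at isolated positions.

```python
# IUPAC ambiguity codes for the six possible two-base mixtures.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def mixed_trace_calls(seq1, seq2):
    """Idealized base calls for an equal mixture of two aligned, equal-length
    templates: agreeing positions read cleanly, disagreeing ones become an
    ambiguity code. Real traces are messier (unequal peaks, indels)."""
    return "".join(
        a if a == b else IUPAC_PAIRS[frozenset(a + b)] for a, b in zip(seq1, seq2)
    )


print(mixed_trace_calls("GGCTAACCATGCATGTTGCAGTC", "GGCTAACCATGGATCTTGCAGTC"))
# GGCTAACCATGSATSTTGCAGTC
```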

A second, partially overlapping problem sometimes encountered with this method arises from a combination of facts. PCR by its nature relies on DNA polymerases, which are generally purified from bacterial expression systems. DNA polymerases bind DNA with sequence-nonspecific affinity, so traces of host DNA can co-purify with them. rDNA sequences are often present in multiple copies in bacterial genomes, including those of the bacteria used for polymerase production. And PCR is highly sensitive. Add those up, and there’s a constant risk that the method may amplify contaminating trace rDNA from the polymerase production strain. While this was a distinct nuisance in early iterations of the method, understanding the source of these spurious sequences led to the availability of specialized “low DNA content” polymerases, and to better bioinformatics for detecting and rejecting results that most likely arise from contamination. As is always the case with lab testing, understanding and rational consideration of the test results in clinical context is required!
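One simple form such contamination-aware bioinformatics might take is a rule-based triage of each single-sequence result. This is a hypothetical sketch, not a published or commercial algorithm: the background organism list and the 0.99 identity cutoff are placeholders that a lab would have to establish and validate itself, and flagged results still need review in clinical context.

```python
# Hypothetical set of organisms whose rDNA a lab has learned to expect in its
# reagents (for example, a polymerase production strain). Placeholder names only.
REAGENT_BACKGROUND = {"Production strain X"}


def triage(best_match, identity, min_identity=0.99, background=REAGENT_BACKGROUND):
    """Sort a single-sequence identification into 'report' or 'review'.
    The 0.99 identity cutoff is an illustrative placeholder, not a validated value."""
    if best_match in background:
        return "review: matches known reagent background (possible contamination)"
    if identity < min_identity:
        return "review: no sufficiently close library match"
    return "report"


print(triage("Production strain X", 1.0))
print(triage("Species A", 0.995))  # report
```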

Resolving the problems

Both of these problems, the need for isolated pure cultures and the need to distinguish bacteria responsible for the patient’s pathology from clinically irrelevant co-detected bacteria (including reagent contaminants), are resolved by more modern iterations of this bacterial identification strategy. Specifically, next-generation sequencing (NGS) methods, which can sequence very large numbers of amplicons simultaneously in parallel, provide solutions to both while allowing even more useful data to be extracted.

In its NGS guise, this method can start from a direct patient sample containing very small numbers of a significant pathogen, even in the presence of other, non-significant bacteria. The approach proceeds much as above, with bulk amplification of the sample using “universally conserved” 16S primers to generate a mixed pool of amplicons from their respective bacterial sources. These individual amplicons are then separated and sequenced independently, in parallel, by the NGS process. Regardless of the exact NGS technology used, a “behind the scenes” tiling, sequence assembly, and bioinformatics strategy eventually outputs both a full representative sequence of each bacterial 16S type present in the sample and the count or relative frequency of each sequence compared to the total number of sequences determined.
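The counting half of that output can be illustrated with its simplest possible form, exact-match dereplication: collapse identical reads into sequence types and report each type’s count and share. Real pipelines must first deal with sequencing errors, read pairing or assembly, and PCR chimeras, typically by denoising or clustering; this sketch assumes error-free, full-length toy reads.

```python
from collections import Counter


def sequence_type_frequencies(reads):
    """Collapse identical reads into sequence types and return
    (sequence, count, relative_frequency) tuples, most abundant first."""
    counts = Counter(reads)
    total = sum(counts.values())
    return [(seq, n, n / total) for seq, n in counts.most_common()]


# Hypothetical reads from one mixed sample (short toy strings, not real amplicons).
reads = ["ATGCATG"] * 6 + ["ATGGATC"] * 3 + ["TTGCACG"]
for seq, n, freq in sequence_type_frequencies(reads):
    print(seq, n, freq)
# ATGCATG 6 0.6
# ATGGATC 3 0.3
# TTGCACG 1 0.1
```

Each resulting sequence type would then go through the same library comparison used for a single Sanger sequence.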

The bacterial species contributing each 16S type are identified by the same library-comparison approaches used for the single-sequence Sanger method. The amplicon frequencies are not linearly representative of the true frequency of each underlying bacterial species in the sample (owing to factors such as differing numbers of rDNA copies per genome among species, or sequence-dependent differences in PCR amplification kinetics), but they can be corrected bioinformatically to yield meaningful estimates of the relative frequency of each species identified. This in turn helps to differentiate low-level contaminant signals (such as rDNA from the polymerase) from meaningful signals, and allows known pathogenic species to be detected even when mixed with apathogenic or commensal organisms. Complex multipathogen communities, and their evolution over time in contexts such as cystic fibrosis sputum, can be observed directly by this technique with a level of detail not previously available from traditional culture and enumeration approaches.
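The copy-number part of that correction can be sketched as dividing each taxon’s read count by its rDNA copies per genome and renormalizing. The taxa, read counts, copy numbers, and the 5% low-level threshold below are all hypothetical, and PCR amplification bias is not modeled. In this example Taxon A holds about 87% of raw reads, but after dividing by seven copies it accounts for about 49% of genome equivalents, while Taxon C drops below the placeholder threshold and is flagged for review.

```python
def corrected_abundance(read_counts, copies_per_genome):
    """Convert amplicon read counts to relative genome abundances by dividing
    each taxon's count by its assumed rDNA copies per genome, then renormalizing.
    PCR amplification bias is ignored here."""
    genome_equivalents = {t: read_counts[t] / copies_per_genome[t] for t in read_counts}
    total = sum(genome_equivalents.values())
    return {t: ge / total for t, ge in genome_equivalents.items()}


def low_level(abundances, threshold=0.05):
    """Taxa below a hypothetical relative-abundance threshold, flagged for review."""
    return sorted(t for t, a in abundances.items() if a < threshold)


# Hypothetical read counts and copy numbers for three taxa.
reads = {"Taxon A": 700, "Taxon B": 100, "Taxon C": 5}
copies = {"Taxon A": 7, "Taxon B": 1, "Taxon C": 1}
abundances = corrected_abundance(reads, copies)
for taxon, a in abundances.items():
    print(taxon, round(a, 3))
# Taxon A 0.488
# Taxon B 0.488
# Taxon C 0.024
print(low_level(abundances))  # ['Taxon C']
```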

Each of these variations on the method, Sanger single-product sequencing and NGS, has utility in the modern molecular pathology laboratory. The Sanger-based method is economical in terms of both instrumentation and bioinformatics capacity, and can help to unequivocally differentiate an isolated bacterial sample from its (biochemically and morphologically) close relatives where such differentiation is needed. The NGS methods, while more powerful and producing more data, require more expensive instrumentation and greater bioinformatics capacity, and carry significantly higher reagent costs and longer times to generate an NGS library.

In the clinical lab

Either of these approaches may thus be of use to today’s clinical laboratory scientist. Neither, however, fully replaces traditional agar plate methods for a full sample workup, primarily because they currently lack an effective way to evaluate the presence or absence (and magnitude) of specific antibiotic-resistance profiles in the bacterial species detected. Antibiotic resistance is defined phenotypically, so detecting even a known antibiotic resistance determinant gene in an isolated pure organism is not absolutely definitive of clinical antibiotic resistance (note, however, that it can be very strongly suggestive, and would in most cases be a powerful aid in establishing initial empiric therapy choices).

In a multiorganism NGS-based context, however, the detection of antibiotic resistance markers is much less meaningful, as such markers generally cannot be unequivocally assigned to a particular bacterial species in the mix and thus have little capacity to inform therapy choices. Approaches to address this shortcoming through various bioinformatic techniques are under active development; however, until they are validated and reach mainstream use, and until the cost and time to result of NGS methods both decrease, we won’t see these methods as a substitute for the agar plates we know so well. For now, bacterial 16S identification methods remain useful adjuncts to classical microbiology rather than full replacements.

John Brunstein, PhD, is a member of the MLO Editorial Advisory Board. He serves as President and Chief Science Officer for British Columbia-based PathoID, Inc., which provides consulting for development and validation of molecular assays.