Back to Basics: Next-generation sequencing methods and applications

Aug. 22, 2017

This month’s installment of The Primer continues our refresher on basic techniques and methods by looking at the high-throughput, massively parallel nucleic acid sequencing approaches collectively known as “next-generation sequencing,” or NGS. Since these methods were first touched on in this column, they have steadily come down in price, increased in throughput, and found wider application in clinical service.

Recall from the July installment [MLO. 49(7):58-60] of The Primer that Sanger sequencing is a directed, single-target approach to determining a DNA sequence through template-directed synthesis. This robust and widely available method is very effective (and, with decades of refinement of equipment and protocols, simple) for determining the sequence of short, defined regions of interest. From a clinical perspective, this is useful when a recognized presentation points to one or perhaps a small handful of informative genetic regions. In other situations, however, we may not have the luxury of knowing which genetic regions are relevant, and informed treatment choices require sampling a large number of genetic loci at once.

It would be physically possible to do this by having very large numbers of Sanger sequencers all working in parallel—in fact, that’s exactly how the first draft of the human genome was obtained—but it’s wildly impractical from a cost, labor, and time perspective. Next-generation sequencing, so named because it is the class of methods that superseded Sanger sequencing for large projects, tackles this problem at its root. While there are several quite different underlying technologies available for NGS, each with its own particular strengths and weaknesses, they all work by automating the running of thousands to millions of very tiny individual reactions in parallel. Each micro-reaction determines the sequence of a single (usually small) region, and the challenge then becomes a computational and bioinformatic one.

Two NGS approaches

Some NGS methods (such as those used by Illumina and Thermo Fisher instruments) work in a way that is akin to Sanger sequencing; that is, they use the target DNA as a template and polymerases to synthesize complementary strands de novo. While quite different physical technologies are employed, each has a method for detecting the identity (A, G, C, or T) of each sequentially added base in each individual reaction. Other NGS methods (such as those used on Oxford Nanopore instruments) take a completely different approach and essentially “spool” a strand of DNA through a tiny deformable hole, or pore; each of the four possible bases deforms or interacts with the pore in a detectable and electrically distinguishable way as it passes through.
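To make the synthesis-based idea concrete, here is a minimal toy sketch (in Python, and not any vendor’s actual base-calling algorithm) that calls one base per cycle by picking whichever of the four signal channels reads strongest; real instruments apply far more sophisticated signal correction and quality modeling.

```python
# Toy illustration of per-cycle base calling for a synthesis-based method.
# The intensity values and the "pick the brightest channel" rule are
# hypothetical; real base callers model crosstalk, phasing, and quality.

CHANNELS = ("A", "C", "G", "T")

def call_bases(cycle_intensities):
    """Return a called sequence from per-cycle signal intensities.

    cycle_intensities: list of (A, C, G, T) signal tuples, one per cycle.
    """
    read = []
    for intensities in cycle_intensities:
        # The brightest channel in this cycle is taken as the incorporated base.
        best = max(range(4), key=lambda i: intensities[i])
        read.append(CHANNELS[best])
    return "".join(read)

# Example: three cycles of made-up intensities -> "GAT"
print(call_bases([(0.1, 0.2, 0.9, 0.1), (0.8, 0.1, 0.1, 0.2), (0.1, 0.1, 0.2, 0.9)]))
```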

Regardless of which of these underlying approaches generates the data, it’s then up to software to identify separate reactions that have sequenced overlapping regions and piece these together into longer contiguous sequences (contigs), in a process known as tiling or assembly. The fact that multiple reactions have all read over the same region—part of the concept known as “read depth”—allows single-point method errors to be corrected by reporting the most common base at each position as the “consensus sequence.” (This is not the only meaningful way to handle multiple reads that report different point variations at a locus; we’ll discuss an alternative and sometimes highly useful approach below.)
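As a minimal sketch of the consensus idea, assuming a set of reads already aligned over the same region, the toy code below simply reports the most common base at each position; production pipelines are, of course, far more involved.

```python
from collections import Counter

def consensus(aligned_reads):
    """Return the consensus sequence from reads aligned over the same region.

    aligned_reads: equal-length strings, one per read covering the region.
    A single read's error at a position is outvoted by the other reads,
    which is how read depth corrects single-point method errors.
    """
    length = len(aligned_reads[0])
    bases = []
    for pos in range(length):
        counts = Counter(read[pos] for read in aligned_reads)
        bases.append(counts.most_common(1)[0][0])
    return "".join(bases)

# Three reads over the same locus; the middle read carries one sequencing error.
reads = ["ACGTACGT",
         "ACGTACCT",   # error near the 3' end
         "ACGTACGT"]
print(consensus(reads))  # -> "ACGTACGT"
```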

Querying predefined genetic regions

Let’s now consider some of the applications for NGS in the clinical setting. Our first example is one in which only a small number (perhaps 20 to 100) of predefined genetic regions of interest need to be queried in a patient sample. At first glance this might appear better suited to Sanger sequencing than to NGS; but if these same regions are of interest across enough samples, then we can apply the massively parallel nature of the approach with a twist. When the individual samples are prepared for NGS (a process known as “library preparation”), we can attach what are effectively sample-specific identifiers, or indexes, to each template fragment. That is, all DNA fragments—and their resulting sequence output—from patient AA entering the machine are distinguishable from those fragments from patient ZZ. Now the costs of a single NGS instrument run are amortized over multiple samples, and the cost per base of information can become very reasonable.
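A minimal sketch of the demultiplexing step that follows such a pooled run, assuming each read begins with an 8-base sample index added during library preparation (the index sequences and patient labels here are hypothetical):

```python
# Hypothetical 8-base sample indexes ("barcodes") attached during library prep;
# the sequences and patient labels are invented for illustration.
BARCODES = {
    "ACGTACGT": "patient_AA",
    "TGCATGCA": "patient_ZZ",
}
BARCODE_LEN = 8

def demultiplex(reads):
    """Sort pooled reads back into per-patient bins by their leading index."""
    bins = {sample: [] for sample in BARCODES.values()}
    bins["undetermined"] = []  # reads whose index matches no known sample
    for read in reads:
        index, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        bins[BARCODES.get(index, "undetermined")].append(insert)
    return bins

# Three pooled reads from one instrument run: two from AA, one from ZZ.
pooled = ["ACGTACGTTTGACCA", "TGCATGCAGGCATTA", "ACGTACGTCCGATAA"]
for sample, inserts in demultiplex(pooled).items():
    print(sample, inserts)
```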

Applications of this sort are most frequently encountered in forms such as “cancer diagnostic panels,” and build on the knowledge that mutations in a relatively small number of known oncogenes and tumor suppressor genes can occur in a wide range of cancer types. Case-specific knowledge of which mutations are present can be critically informative in selecting effective treatment—a good example of personalized medicine. Other NGS panel types exist and should be expected to become more common as shared groups of specific genetic targets across larger patient populations are identified. Think, for instance, of a pharmacogenomics panel that queries alleles of the metabolic enzymes responsible for steps in the metabolism of many drugs; by providing information on the kinetics of prodrug activation or drug clearance, such a panel could support more accurate, personalized initial dosing schedules.
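As a purely hypothetical sketch of how panel results might feed a dosing decision (the diplotypes, phenotypes, and dose adjustments below are placeholders for illustration, not clinical guidance):

```python
# Hypothetical mapping from a metabolic-enzyme genotype to a predicted
# metabolizer phenotype and a placeholder starting-dose adjustment.
# Real pharmacogenomic dosing follows curated guidelines (e.g., CPIC),
# not this toy table.
PHENOTYPE = {
    ("*1", "*1"): "normal metabolizer",
    ("*1", "*4"): "intermediate metabolizer",
    ("*4", "*4"): "poor metabolizer",
}
DOSE_FACTOR = {
    "normal metabolizer": 1.0,
    "intermediate metabolizer": 0.5,   # placeholder adjustment
    "poor metabolizer": 0.25,          # placeholder adjustment
}

def suggest_starting_dose(diplotype, standard_dose_mg):
    """Return a predicted phenotype and an illustrative adjusted dose."""
    phenotype = PHENOTYPE.get(tuple(sorted(diplotype)), "unknown")
    factor = DOSE_FACTOR.get(phenotype, 1.0)
    return phenotype, standard_dose_mg * factor

print(suggest_starting_dose(("*1", "*4"), 100))  # -> ('intermediate metabolizer', 50.0)
```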

Detecting structural variations

Detection of structural variations such as insertions, deletions, inversions, loss of heterozygosity (LOH), and translocations is another application where NGS methods are becoming more widely applied. Other molecular methods of the cytogenetics lab, including FISH, array comparative genomic hybridization (aCGH), and SNP arrays, remain popular and, for some of these applications, arguably better suited at present. Multiple publications in the past few years have examined the application of NGS to this space and shown it to be feasible, although the current challenges lie more on the bioinformatic processing side than on the technical data collection side. As NGS hardware becomes increasingly affordable and thus more common in core MDx labs, improvements in the bioinformatics tools available for these uses should be expected. As with other instruments in the laboratory, cost (and bench space) pressures encourage the adoption of multiuse platforms such as NGS, as opposed to multiple single-use systems.
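As one simplified illustration of the kind of signal such bioinformatic tools look for, the sketch below flags candidate deletions from paired-end reads that map much farther apart than the expected library insert size; the sizes and threshold are arbitrary assumptions, and real structural-variant callers combine several lines of evidence.

```python
# Simplified deletion signal from paired-end mapping distances.
# The expected insert size and tolerance are arbitrary illustrative values;
# real SV callers combine discordant pairs, split reads, and depth of coverage.
EXPECTED_INSERT = 350      # bases between read-pair ends in a normal library
TOLERANCE = 150            # allowed spread around the expected insert size

def candidate_deletions(read_pairs):
    """Flag read pairs whose mapped span is far larger than expected.

    read_pairs: list of (pair_id, left_position, right_position) on one chromosome.
    """
    flagged = []
    for pair_id, left, right in read_pairs:
        span = right - left
        if span > EXPECTED_INSERT + TOLERANCE:
            flagged.append((pair_id, span - EXPECTED_INSERT))  # rough deletion size
    return flagged

pairs = [("p1", 10_000, 10_340), ("p2", 12_000, 14_360), ("p3", 12_050, 14_400)]
print(candidate_deletions(pairs))  # p2 and p3 suggest a ~2 kb deletion
```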

Detecting low-abundance targets

Another good application of NGS is the detection of low-abundance genetic targets in a background of other DNA. This is the counter-example mentioned earlier of how to handle multiple NGS reads over one region that show differences: instead of the software assuming such differences to be method errors and “hiding” them under a consensus result, the differences are taken as real and serve as a marker for the presence and relative abundance of the different sequence targets. Examples include the detection of fetal DNA sequences in maternal blood, the measurement of minimal residual disease in leukemias (that is, the small fraction of cancer cells persisting in normal blood during treatment and remission), and the identification of previously unknown associations of a pathogen with a pathology. Similar uses relate to the examination of complex polymicrobial diversity in the gut or airway; NGS provides a direct method to capture snapshots, as it were, of both the identity and relative number of organisms present in a sample, to a depth of information not feasible with classical microbiology methods.
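A minimal sketch of this idea, counting every allele observed at a position rather than collapsing to a consensus (the read counts are made up):

```python
from collections import Counter

def allele_fractions(bases_at_position):
    """Report the relative abundance of each base seen at one position.

    Instead of collapsing to a consensus, every observed allele is counted,
    so a low-abundance target (fetal DNA, a residual leukemic clone, a minor
    organism) shows up as a small but real fraction of the reads.
    """
    counts = Counter(bases_at_position)
    total = sum(counts.values())
    return {base: count / total for base, count in counts.items()}

# Made-up pileup: 1,000 reads over one locus, 3% carrying a variant "T".
pileup = ["C"] * 970 + ["T"] * 30
print(allele_fractions(pileup))  # -> {'C': 0.97, 'T': 0.03}
```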

RNA samples

Although we have been referring to “DNA sequencing” through most of the preceding, and the methods themselves work directly on DNA, it is possible to turn the lens of NGS onto RNA samples as well. The way to do this is to use reverse transcriptase enzymes to make complementary DNA (cDNA) copies of the sample RNA, and then proceed with a more familiar and more easily handled substrate. Utilizing this technical twist, another common NGS application (although one currently more of a research application than a front-line clinical tool) is to examine the transcriptome of a sample—that is, the identity and relative abundance of the mRNA transcripts present. Commonly referred to as transcriptome sequencing or “RNA-Seq,” this approach is efficient in that it applies resources only to that small portion of the genome which is functionally expressed. Of course, not all significant genetic aberrations occur within coding regions; but by observing levels (or even presence/absence) of transcripts in comparison to reference “normal” conditions, important mutations in non-coding regions such as gene promoters or splice site regulators can be inferred. When such findings are plausibly related to a disease condition, more directed studies to confirm the root cause can then be undertaken if needed.
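As a hedged sketch of the comparison step, the toy code below normalizes made-up transcript counts and flags transcripts whose abundance departs sharply from a reference “normal” profile; the gene names, counts, and two-fold cutoff are all hypothetical, and real RNA-Seq analyses use dedicated statistical tools for normalization and testing.

```python
# Toy comparison of transcript abundance against a reference profile.
# Counts, gene names, and the two-fold cutoff are hypothetical.
def normalize(counts):
    """Convert raw read counts to fractions of the sample total."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

def flag_outliers(sample_counts, reference_counts, fold_cutoff=2.0):
    """Return genes whose relative abundance differs from reference by >= fold_cutoff."""
    sample, reference = normalize(sample_counts), normalize(reference_counts)
    flagged = {}
    for gene in reference:
        ratio = sample.get(gene, 0.0) / reference[gene]
        if ratio >= fold_cutoff or ratio <= 1.0 / fold_cutoff:
            flagged[gene] = round(ratio, 2)
    return flagged

reference = {"GENE_A": 500, "GENE_B": 300, "GENE_C": 200}
patient   = {"GENE_A": 480, "GENE_B": 40,  "GENE_C": 210}   # GENE_B looks suppressed
print(flag_outliers(patient, reference))  # -> {'GENE_B': 0.18}
```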

A key to the ongoing increased application of NGS methods in the clinical lab will be the wider availability of curated reference data against which to compare results—to tell what’s a meaningful variation and what’s not. The complex interplay between genes and pathways means that this is not something easily addressed by considering single loci in isolation; for instance, a known mutation at locus A may indeed be deleterious when the B locus is “normal” or “wild-type,” but there may exist compensatory B locus mutations such that a normal phenotype occurs when both mutations are present. As NGS data on greater numbers of people are obtained and collated at an ever-increasing rate, analysis of results from a single case should become progressively easier.
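A toy illustration of why loci cannot be interpreted in isolation (the locus names and the compensation rule are invented purely for illustration):

```python
# Invented illustration: a variant at locus A is called deleterious only when
# locus B is wild-type, because a hypothetical compensatory B variant restores
# the normal phenotype. Real interpretation relies on curated reference data.
def interpret(variants):
    """variants: set of observed variant loci, e.g. {"A"} or {"A", "B"}."""
    if "A" in variants and "B" not in variants:
        return "likely deleterious (A variant with wild-type B)"
    if "A" in variants and "B" in variants:
        return "likely benign (compensatory B variant present)"
    return "no known interpretation"

print(interpret({"A"}))        # deleterious in isolation
print(interpret({"A", "B"}))   # compensated; normal phenotype expected
```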

The Holy Grail, as it were, of NGS will be reached on the day when the technology, on both the data collection side and the bioinformatics side, is advanced enough in speed, utility, and cost that a full exome sequence (or perhaps even a full genome sequence) is simply a normal part of every patient examination. That this is technically possible, and that applied to the appropriate specimen types it could replace multiple other distinct laboratory tests, is not speculative; however, until turnaround times, costs, and well-powered bioinformatic filters to make sense of the data are all in hand, it will remain a tantalizing goal. Achievement of that goal may come sooner than expected, though, and when it does, the practice of laboratory medicine will be dramatically transformed.

John Brunstein, PhD, is a member of the MLO Editorial Advisory Board. He serves as President and Chief Science Officer for British Columbia-based PathoID, Inc., which provides consulting for development and validation of molecular assays.