Target enrichment strategies for next generation sequencing

June 1, 2012

The elucidation of the human genome provided the potential to determine the genetic component of virtually any disease. However, many diseases may be linked to multiple genes, many mutations may be heterozygous, and a mutation may be present in only a small percentage of the cells in an isolated tissue sample. Sanger sequencing provides a linear readout of one gene at a time, so using it to analyze such mutations across multiple genes is very time-consuming, and the results can be nearly impossible to interpret.

NGS enables clinical research

Next generation sequencing (NGS) technologies developed in the last ten years overcome the limitations of Sanger sequencing by sequencing in a highly parallel fashion and therefore providing a separate sequence result for every sequence of interest. NGS can thus discriminate homozygous from heterozygous mutations and detect a small number of mutated cells in a tissue sample. This has positioned NGS as the method of choice for targeted re-sequencing of regions of the human genome identified by linkage analyses and genome-wide association studies.
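To make the point about per-read results concrete, the fraction of reads carrying a variant at one position can be tallied directly. The sketch below is purely illustrative: the function name, the cutoff values, and the example base calls are assumptions made for this example, not validated thresholds or data from the studies discussed here.

```python
from collections import Counter

def classify_site(base_calls, ref_base, het_range=(0.35, 0.65), hom_min=0.90):
    """Classify one genomic position from the bases seen in individual reads.

    base_calls: iterable of single-letter base calls, one per read.
    The cutoffs are illustrative assumptions, not validated values.
    Returns (variant_base, variant_allele_fraction, label).
    """
    counts = Counter(base_calls)
    depth = sum(counts.values())
    if depth == 0:
        return None, 0.0, "no coverage"

    # Keep only the non-reference bases and pick the most frequent one.
    alt_counts = {base: n for base, n in counts.items() if base != ref_base}
    if not alt_counts:
        return None, 0.0, "reference"
    alt_base = max(alt_counts, key=alt_counts.get)
    vaf = alt_counts[alt_base] / depth

    if vaf >= hom_min:
        label = "homozygous variant"
    elif het_range[0] <= vaf <= het_range[1]:
        label = "heterozygous variant"
    elif vaf < het_range[0]:
        label = "low-frequency variant"
    else:
        label = "ambiguous"
    return alt_base, vaf, label

# Hypothetical pileups: 50 of 100 reads vs. 5 of 100 reads carry the variant.
print(classify_site("A" * 50 + "G" * 50, "A"))
print(classify_site("A" * 95 + "T" * 5, "A"))
```

A Sanger trace would show both situations as overlapping peaks of different heights, whereas read counts give a direct estimate of the variant fraction.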

Traditional target enrichment technologies

The success of NGS in clinical research depends heavily on the ability to enrich for all sequence regions of interest, enabling efficient identification of all mutations across a large number of genes. Two main technologies have been used for target enrichment: PCR and hybridization.

PCR (polymerase chain reaction) has been essential to the rapid progress of the study of molecular origins of disease and the success of NGS, providing the ability to greatly amplify DNA sequences that come from minute samples or that are present at a low frequency in the sample. This well-established technology offers the advantages of providing highly specific amplification of the desired sequence regions very quickly and cost effectively.

While many PCR approaches are commercially available for target enrichment with NGS, they all have disadvantages that limit their usefulness in NGS studies of large numbers of mutations across a large number of genes. The number of parallel PCR reactions, and thus the number of target sequence regions that can be analyzed simultaneously, is severely limited by the possibility of primer cross-reactivity, dimer formation, and non-specific priming. Some priming reactions may fail, leading to dropouts, and changing the target regions can be difficult because the multiplex PCR reaction must then be re-optimized, which is a long and tedious process.
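The scaling problem can be illustrated with a simple check for primer pairs whose 3' ends could anneal to one another. This is a minimal sketch under stated assumptions: the exact-match rule, the tail length of five bases, and the three primer sequences are all invented for illustration, and real primer design tools use thermodynamic models instead.

```python
from itertools import combinations_with_replacement

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def three_prime_clashes(primers, k=5):
    """Return primer pairs in which the last k bases of one primer are
    exactly complementary to some stretch of the other.

    A primer is also compared with itself to catch self-dimers.
    Exact matching with a fixed k is a simplifying assumption.
    """
    clashes = []
    for a, b in combinations_with_replacement(primers, 2):
        a_tail_site = reverse_complement(a[-k:])  # site the 3' end of a binds
        b_tail_site = reverse_complement(b[-k:])  # site the 3' end of b binds
        if a_tail_site in b or b_tail_site in a:
            clashes.append((a, b))
    return clashes

def pair_count(n):
    """Number of primer pairs to check, self-pairs included."""
    return n * (n + 1) // 2

# Hypothetical primers, made up for this example.
primers = ["GATTACAGGCTGAAC", "CCGTTCATAGGCTAA", "ATGGCGTACCTTGAG"]
print(three_prime_clashes(primers))
print(pair_count(len(primers)), pair_count(1000))
```

The number of pairs grows roughly with the square of the number of primers, which is one way to see why conventional multiplex PCR becomes hard to optimize as more targets are added.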

The second major technology used for target enrichment with NGS is hybridization. Very specific oligonucleotide probes can be designed to pull out a large number of target regions of interest, across a large number of genes. An efficient embodiment of the hybridization approach to enrichment is the Solution Hybrid Selection (SHS) approach developed by scientists at the Broad Institute.1

This approach utilizes very long (>120 bases) oligonucleotides complementary to all of the exons in the human genome to generate biotinylated RNA “baits” that are used to pull out the exon sequences from a sheared genomic DNA library that already contains sequencing adaptors. These enriched targets are then ready for NGS.

The advantages of this hybridization approach are that it can be used to capture as much as 100 megabases (Mb) of DNA sequence and that it is highly automatable. Its one serious drawback is that it requires the construction of a DNA library, involving shearing, end repair, A-tailing, and adaptor ligation, which can be very time-consuming.

A new enrichment technology

A third technology for target enrichment has recently emerged that combines the advantages of both hybridization and PCR, without requiring library construction.2 This approach utilizes a combination of multiple restriction enzyme digestions and hybridization to a double-stranded oligonucleotide cassette that “tethers” together two sequence regions complementary to both ends of one of the digested fragments from the target region (Figure 1).

Figure 1. Workflow schematic of the new enrichment technology. Step 1: Digest and denature DNA sample containing targets of interest. Step 2: Hybridize oligonucleotide probe library. Step 3: Purify target fragments with streptavidin and ligate closed circles. Step 4: Amplify targeted fragments with PCR, using barcoded primers.

The result is a set of as many as eight circularized hybrids for each target region. Each probe is also biotinylated, so that the hybrids can be separated from non-target sequences by binding to magnetic streptavidin beads. Target-probe complexes are closed by ligation to ensure that only perfectly hybridized fragments are circularized.
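The digestion and probe-design idea can be sketched in a few lines. Everything specific here is an assumption for illustration: the article does not name the restriction enzymes, so EcoRI (recognition site GAATTC, cutting after the first G) stands in as an example, the arm length is arbitrary, the toy sequence is invented, and only one DNA strand is modeled.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def digest(seq, site="GAATTC", cut_offset=1):
    """Cut seq at every occurrence of a recognition site.

    cut_offset is the number of bases of the site left on the upstream
    fragment. The defaults model EcoRI on a single strand (an example only).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

def probe_arms(fragment, arm_len=8):
    """Return the two probe arms complementary to the ends of a fragment.

    In the scheme described above, the two arms sit on one probe, so a
    fragment that hybridizes to both is held in a circle for ligation.
    """
    left_arm = reverse_complement(fragment[:arm_len])
    right_arm = reverse_complement(fragment[-arm_len:])
    return left_arm, right_arm

# Toy sequence containing two recognition sites.
toy = "AAAAGAATTCCCCCGAATTCTTTT"
pieces = digest(toy)
print(pieces)
print(probe_arms(pieces[1], arm_len=4))
```

Because the fragment ends are fixed by the restriction sites, the sequences of both probe arms can be computed ahead of time from the reference sequence.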

PCR is then performed with a primer that is complementary to all of the hybridized fragments from one target, so that only circularized DNA targets are amplified. A barcode sequence that tracks the source of the amplified fragments to one target is also added during the process. Since a very large number of primers can be used in parallel, the result is a highly amplified collection of a very large number of specific target sequences in one tube, with one PCR reaction.
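Once barcodes have been added, reads can be sorted back to their source after sequencing. The following is a minimal sketch, assuming the barcode sits at the start of each read, all barcodes have the same length, and only exact matches are accepted; the barcode sequences, labels, and reads are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Group reads by an exact-match barcode at the start of each read.

    barcodes: dict mapping barcode sequence -> source label.
    Returns a dict mapping label -> list of reads with the barcode trimmed.
    Reads with no matching barcode are kept whole under "unassigned".
    """
    lengths = {len(bc) for bc in barcodes}
    if len(lengths) != 1:
        raise ValueError("all barcodes must have the same length")
    n = lengths.pop()

    bins = defaultdict(list)
    for read in reads:
        label = barcodes.get(read[:n])
        if label is None:
            bins["unassigned"].append(read)
        else:
            bins[label].append(read[n:])
    return dict(bins)

# Hypothetical barcodes and reads.
barcodes = {"ACGT": "source_1", "TTAG": "source_2"}
reads = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTTTTTTT", "GGGGCCCCAA"]
print(demultiplex(reads, barcodes))
```

Production pipelines usually tolerate one or more sequencing errors in the barcode; exact matching is used here only to keep the example short.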

Because the two hybridization probes are tethered together, the primers have high specificity and low cross-reactivity, which gives this emerging technology the advantage of massively parallel PCR (up to 244,000 separate amplicons in one tube). Thus, tens of thousands of target regions can be analyzed simultaneously, using as little as 50 ng of input DNA, and the restriction enzyme digestion approach provides very high coverage. The process is also very fast: it is completed in six hours, allowing same-day sequencing of 96 samples at once.

While offering massively parallel yet specific PCR, the technology also complements SHS, which can screen vast regions of sequence to identify targets that can then be looked at in more detail.

Case studies

The new enrichment technology has been used in several large clinical research studies to link disease to genetic mutations. For example, researchers at the Ontario Institute for Cancer Research and the University of Toronto have used it to characterize individual tumors using formalin-fixed, paraffin-embedded (FFPE) samples.3 They obtained nearly 100% coverage that was uniform across 321 genes, and the results were nearly identical to those obtained from fresh-frozen samples.

A genome-wide association study (GWAS) of hypertension at Ullevaal University Hospital in Oslo, Norway, analyzed four genome regions encompassing 19 known oncogenes (~61 kb), many of which contain actionable mutations that have known significance in treatment susceptibility.4 A total of 97.4% of the sequencing reads were on amplicons (comprehensive coverage), 96.6% of the reads were on-target (high specificity), and >90% of the target regions were sequenced (high coverage). All of this was accomplished even though repeat sequences made up 38% of the sequence.
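Figures of this kind are simple ratios over aligned reads and target intervals. The sketch below shows one plausible way to compute an on-target fraction and a covered fraction; the definitions used (a read counts as on-target if it overlaps any target, and a base counts as covered at a depth of at least one read), the half-open coordinate convention, and the example intervals are all assumptions, and the cited study may have defined its metrics differently.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end

def enrichment_metrics(reads, targets, min_depth=1):
    """Compute simple enrichment metrics on a single contig.

    reads, targets: lists of (start, end) half-open intervals.
    Targets are assumed not to overlap one another.
    Returns the fraction of reads overlapping any target and the fraction
    of target bases covered by at least min_depth reads.
    """
    on_target = sum(
        any(overlaps(r_start, r_end, t_start, t_end) for t_start, t_end in targets)
        for r_start, r_end in reads
    )

    covered = 0
    total = 0
    for t_start, t_end in targets:
        for pos in range(t_start, t_end):
            total += 1
            depth = sum(r_start <= pos < r_end for r_start, r_end in reads)
            if depth >= min_depth:
                covered += 1

    return {
        "on_target_fraction": on_target / len(reads) if reads else 0.0,
        "covered_fraction": covered / total if total else 0.0,
    }

# Hypothetical intervals, invented for this example.
targets = [(100, 200), (300, 400)]
reads = [(90, 150), (140, 210), (500, 550), (350, 380)]
print(enrichment_metrics(reads, targets))
```

The per-base loop is written for clarity and is only suitable for small examples; real data sets call for interval trees or sorted sweeps.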

Many hybridization and PCR approaches are commercially available for target enrichment for NGS. This new technology offers advantages of both.

References

  1. Fisher S, Barry A, Abreu J, et al. A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries. Genome Biol. 2011;12(1):R1. [Epub 2011 Jan 4.]
  2. Haloplex technology: a genome capture system for next generation sequencing, University of Cambridge Website. http://talks.cam.ac.uk/talk/index/31305. Accessed April 18, 2012.
  3. Brown AMK, Ng K, Denroche RE, Johns J, Timms L, Ericsson O, Isaksson M, Dahl F, McPherson JD. Targeted sequencing in tumor samples using HaloPlex PCR. Poster presented at the Advanced Genome Biology and Technology Meeting, February 15-18, 2012.
  4. Personal communication.

Dr. Emily Leproust joined the Genomics division of Agilent Technologies in 2000 and has held several technical and management positions in R&D and Manufacturing focusing on the development and deployment of chemical processes for the synthesis of Microarrays and Oligo Libraries. Most recently, Dr. Leproust has been directing the Applications and Chemistry R&D team developing quantitative and structural Genomic applications powered by Microarray and Next Generation Sequencing technologies.