The promises of the Human Genome Project and of personalized medicine based on individual genomes are predicated on technologies to read out those individual genomes. The barriers to application have always been cost and throughput, with the history of DNA sequencing technology routinely punctuated by relatively abrupt methodological and technological advances. Each advance in turn (from Maxam and Gilbert chemical sequencing, to isotopically labeled Sanger sequencing in slab gels, to fluorescently labeled Sanger sequencing in capillary electrophoresis instruments, to massively parallel short-read sequencing-by-synthesis approaches read out via pyrosequencing, ionic pulses, or fluorescent methods) has appeared, gone through rounds of optimization, and yielded more information more easily, faster, and more cheaply per base pair than prior methods. (Note that these metrics are per base pair, not necessarily per reaction, which is why in today's market of sub-$1,000 full human exome NGS services there is still an active role for older methods such as capillary sequencing when you just want the sequence of a particular small region.)
None of these new methods appeared complete and perfect overnight; each had to make the arduous trek from the research lab bench through rounds of engineering, chemistry, and bioinformatics development to become a reliable, mature method suitable for clinical application. Similarly, what will probably be the next significant change in sequencing technology has been working its way down this path for a few years now. So it's an opportune time to take a closer look at what that new method is and what benefits it might bring to clinical sequencing. Nanopores are the core of this approach, but before we look at the state of a few competing applications of this technology to sequencing, let's pause to consider the drawbacks of the most widely used NGS methods today.
Drawbacks to NGS
NGS instruments are expensive, requiring significant capital investment. Regardless of platform, they’re pretty complex machines with high manufacturing costs that have to be recouped through a mix of base instrument purchase price and cost of any associated consumables.
NGS short-read sequencing-by-synthesis platforms require significant upstream handling of each sample for library preparation: the genomic DNA must be sheared into billions of short pieces within a narrow size window suited for analysis, then tagged with various linkers, adapters, and barcodes so that each piece can be tracked and read through the process. Library preparation remains relatively slow and painstaking, despite ongoing improvements in kits, reagents, and automation of various workflow steps.
NGS instruments are, by and large, well… large! They’re suited for use in a core lab environment, with samples (and subsequent libraries) coming to them. That is, of course, the successful common model for many lab instruments, but if you could imagine some application where you wish you could have NGS technology with you in your backpack and deploy it in some remote or resource-limited setting—that’s not going to happen with any of the common NGS platforms.
NGS workflows require heavy bioinformatics servers just to handle the tiling steps. Tiling, as long-time readers of this series may recall, is where you take all of your short random-fragment DNA (or RNA-derived cDNA) reads and look for ones that overlap. By chaining these overlaps together, longer and longer "contigs" (contiguous reads) can be assembled, placing each short bit in its critical larger genomic (or transcriptional) context. It's a computationally intensive task, though, and one that can struggle to read through the longer stretches of repetitive DNA dotted about the average eukaryotic genome.
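To make the tiling idea concrete, here is a minimal sketch of greedy overlap assembly in Python. The function names and toy reads are invented for illustration; real assemblers use far more efficient index structures, but the all-against-all comparison below is exactly why this step demands heavy compute.

```python
# A minimal sketch of greedy overlap ("tiling") assembly. Hypothetical toy
# code for illustration only; production assemblers avoid this brute force.

def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(reads: list[str]) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):      # all-against-all comparison is
            for j, b in enumerate(contigs):  # the core computational cost
                if i != j:
                    n = overlap(a, b)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if not n:
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)

# Toy example: three overlapping short reads tiled back into one contig.
print(assemble(["ATTAGACCTG", "CCTGCCGGAA", "GCCGGAATAC"]))
# -> ['ATTAGACCTGCCGGAATAC']
```

Note how identical repeats longer than a read would make the overlap step ambiguous; that is precisely the repetitive-DNA problem mentioned above.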
Nanopore sequencing to the rescue?
The family of methods we examine today is known as nanopore sequencing, and it promises to avoid each of the downsides just listed. As the name suggests, nanopore sequencing is based on a barrier of some sort with very tiny holes: the nanopores. Essentially, the concept is that one dissolves DNA from a test sample, as very long individual molecules, into an electrically conductive buffer. "Long" here is something of a relative term; we're talking on the order of a million base pairs, which is pretty long compared to the 150-300 base pairs per readable fragment of most by-synthesis NGS systems, but pretty short compared to an intact human chromosome (Chromosome 11, near the middle of the cytogenetic pack, is ~135 million base pairs long). Routine handling such as pipetting of an average DNA extract induces random chromosome fragmentation, though, which conveniently means that a sample such as human cellular DNA is pretty much in the right size range by the time it's collected and purified. This conductive solution with DNA is applied to one side of our barrier with pores, with conductive solution (without DNA) on the other. A voltage is then applied, with positive polarity on the DNA-free side, and the DNA (being intrinsically negatively charged, due to the phosphate groups in its backbone) will try to migrate through the only available channels: the nanopores.
What's critical here is, first, that the pore sizes be very uniform and small enough that only one strand of DNA can fit in and start spooling through at a time. Second, the pore must produce some observable, differentiable change (usually electrical in nature) as each nucleotide slides through. Such pores do exist, both in the form of naturally occurring or slightly modified protein porins (such as Staphylococcus aureus α-hemolysin or Mycobacterium smegmatis porin A) and as various solid-state materials made by controlled synthesis, including metals, metal alloys, and carbon nanotubes. Each of these pore types has its own strengths and weaknesses in terms of uniformity of size, ease of manufacture, accuracy of base discrimination, and working lifespan; a significant part of nanopore sequencing becoming mainstream will likely be a convergence on one, or at most a handful, of pore types and pore cell production methods that effectively balance these issues.
In any event, we should now be picturing two wells of conductive buffer, a nanopore-perforated membrane in between, our bulk DNA sample on one side, and single DNA strands starting to poke through the pores. Under the influence of the electric field, these single strands extrude into the positive buffer well at quite amazing speeds; it has been estimated that each individual base spends only a few microseconds within the sensing region of a pore, meaning our 1-megabase fragment is completely through (and, hopefully, read out) in a matter of seconds.
Once a pore is clear, another waiting DNA molecule engages and starts translocating and being read. Assuming, for the sake of argument, that we could accurately read and record a whole fragment in one pass like that, a single pore would be able to sequence our Chromosome 11 example in its entirety in something like nine minutes. If your barrier had 135 pores and each different Chromosome 11 fragment magically went to a different pore, you'd get the entire sequence of someone's Chromosome 11 in something like five seconds.
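As a sanity check on that arithmetic, here is a quick back-of-envelope calculation. The 4 microseconds per base dwell time is an assumption standing in for "a few microseconds" above, not a published specification:

```python
# Back-of-envelope check of the translocation-speed arithmetic above.
SEC_PER_BASE = 4e-6        # assumed dwell time per base ("a few microseconds")
CHR11_BP = 135_000_000     # approximate length of human Chromosome 11
PORES = 135                # pores working in parallel, fragments split evenly

one_pore = CHR11_BP * SEC_PER_BASE   # one pore reading end to end
many_pores = one_pore / PORES        # ideal parallel case

print(f"1 pore:    {one_pore:.0f} s (~{one_pore / 60:.0f} min)")
print(f"135 pores: {many_pores:.0f} s")
# 1 pore:    540 s (~9 min)
# 135 pores: 4 s
```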
In current reality, most pore systems cannot accurately read every base as it spools through at full speed. Strategies to address this include modifying the pores to slow and restrict strand translocation, and/or requiring multiple reads with consensus building to yield a final accepted output sequence. While these approaches slow the process down, we can compensate by putting many more pores (hundreds to low thousands) on each barrier, so speeds achievable with current nanopore systems are still very fast; more on that below.
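As one illustration of the consensus idea, here is a deliberately simplified per-position majority vote over pre-aligned, equal-length reads. Real nanopore pipelines work from the raw electrical signal and use sophisticated aligners and polishing models; this toy version only shows why repeat reads buy back accuracy:

```python
# A minimal sketch of majority-vote consensus across repeat reads.
# Assumes toy, pre-aligned, equal-length reads; illustration only.
from collections import Counter

def consensus(aligned_reads: list[str]) -> str:
    """Take the most common base at each position across the reads."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*aligned_reads)
    )

# Three noisy reads of the same fragment; each has an error at a
# different position, so the errors cancel out in the vote.
reads = ["ACGTTGCA",
         "ACGATGCA",
         "ACGTTGGA"]
print(consensus(reads))  # ACGTTGCA
```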
Where we’re at today
So where are we now with this technology, and how does it address the NGS issues raised above? Several companies are active in this space and have either prototypes or available-for-purchase devices, so let’s look at how these stack up at present.
Size, portability, and cost. The smallest and least expensive commercially available nanopore sequencer at present is just a bit larger than your average USB thumb drive; in fact, it even looks like a thumb drive, with a USB plug sticking out to port its data back to your data collection computer. Costing around $1,000, it's essentially a disposable, single-use device capable of providing several gigabases of sequence from a sample over 24 to 48 hours. While that's not enough to sequence a human genome, it's more than adequate for things like in-depth environmental microbial sampling, or even capturing the smaller whole genomes of some organisms. And while that device is the extreme in portability and can be used in field settings, larger core-facility benchtop versions of the same platform have much higher throughput capacities, into the terabase range.
Sample prep. No complex library preparation is required. Basically, the purified DNA or RNA sample of interest is mixed with conductive buffer at an appropriate concentration and applied to the device. Note "RNA" there: that's right, no intermediate conversion to cDNA is required (although the usual caveats about RNA instability apply). Sequence data collects in real time on the attached computer.
Read length. We've used a 1-million-base length in our examples above; in reality, typical read lengths range from a few hundred kilobases up to the current published record of 2.2 megabases. With 1 Mb as a reasonable midpoint, it's easy to see how this approach can read through many repetitive regions in a single pass, and why much less bioinformatics work is needed for tiling than on short-read systems.
Nanopores in the future
So, with all of these great features, why are sequencing-by-synthesis NGS instruments still the standard in most core labs? There are a number of factors at play, with probably no single answer. A major point, however, is that nanopore systems at present don't equal NGS instruments' per-read accuracy, and in a clinical setting accuracy remains a paramount concern. Sheer read depth (the number of repeat reads) can be used to increase confidence in nanopore-based data, but this comes at the expense of longer run times and higher costs. Meanwhile, the high capital costs of mainstream NGS instruments can be amortized over truly massive data collections, making them cost-effective on a per-base metric. And since most molecular methods are already restricted to clean, well-equipped core facilities, we're used to adapting experimental workflows to remote sample collection and in-facility analysis; for many purposes, lack of portability in an NGS sequencer is simply not a significant detriment.
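To see why read depth raises confidence (and why that costs run time), consider a toy model in which each read calls a base correctly with some fixed probability and errors are independent across reads. Both assumptions are idealizations, since real nanopore error modes are systematic and correlated, so treat these numbers as illustrative only:

```python
# Toy model: probability that a majority vote over n independent reads
# calls a base wrong, given per-read accuracy p. Illustration only; real
# errors are correlated, so actual gains from depth are smaller.
from math import comb

def majority_error(p: float, n: int) -> float:
    """P(at most n//2 of n reads are correct), i.e. the vote is wrong."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(0, n // 2 + 1))

for n in (1, 5, 15, 31):  # n odd, to avoid ties
    print(f"depth {n:>2}: error ~ {majority_error(0.90, n):.2e}")
# depth  1: error ~ 1.00e-01
# depth  5: error ~ 8.56e-03
# ... and falling rapidly with depth, at the cost of n-fold more reading.
```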
Nanopore-based methods continue to show signs of accuracy improvement, however, and within the last few months they've been demonstrated to be capable of generating entire human genome sequences1 at acceptable accuracy levels, albeit with some contribution from traditional short-read methods. In fact, in the near future the best approach for NGS may well be a hybrid: short-read, high-accuracy methods combined with nanopore long reads, the latter providing a scaffold on which to place the shorter reads and a way to work through long repetitive stretches. And if nanopore-based methods can be further refined to improve accuracy and extend pore system lifespans, they will probably start to play an increasingly primary role in human sequencing applications of clinical significance. If we're to reach a point where part of every routine medical visit is a "full genome workup" done in a few hours, we'll have to make a quantum jump beyond sequencing-by-synthesis methods. For now, nanopore approaches look like the most promising way to make the leap.
Reference
- Jain M, Koren S, Miga KH, et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nature Biotechnology 2018;36(4):338-345.