For infectious disease, long-read sequencing provides in-depth information

June 5, 2018

With advances in infectious disease science, clinical laboratory teams have discovered that there are many situations for which tried-and-true microbial tests are insufficient. Whether it’s accurate strain identification or tracking the spread of an infectious disease in a hospital, more and more applications require microbial whole genome sequencing to generate the most reliable and clinically actionable information possible.

Historically, many clinical labs have used short-read sequencing technology to produce whole genome assemblies. Unfortunately, even for these smaller genomes, the short reads produced cannot span repetitive regions, structural variants, and other important elements. Bioinformatics tools designed to stitch the data together for an assembly are often flummoxed by reads that cannot be unambiguously ordered or aligned.
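
The ambiguity described above can be illustrated with a toy example. This is a minimal sketch, not an actual assembler or aligner: the genome string, the repeat, the two reads, and the helper `find_placements` are all hypothetical and chosen only to show why a read shorter than a repeat cannot be placed uniquely, while a read that spans the repeat and its unique flanking sequence can.

```python
def find_placements(genome: str, read: str) -> list[int]:
    """Return every 0-based position where the read matches the genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Hypothetical toy genome: two copies of a 12-base repeat separated by unique sequence.
REPEAT = "ACGTACGGTCCA"
genome = "TTGACA" + REPEAT + "GGATCCTA" + REPEAT + "CATGAA"

# A "short read" that lies entirely inside the repeat.
short_read = REPEAT[2:10]
# A "long read" that covers the first repeat copy plus unique sequence on both sides.
long_read = "ACA" + REPEAT + "GGA"

short_hits = find_placements(genome, short_read)
long_hits = find_placements(genome, long_read)

print("short read placements:", short_hits)  # two equally good placements
print("long read placements:", long_hits)    # a single placement
```

In this toy case the short read matches both repeat copies, so nothing in the read itself says which copy it came from. The long read carries unique flanking bases and matches in only one place.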

Experts are now embracing long-read sequencing technologies that span even the largest genomic elements to produce gapless, complete microbial assemblies with excellent accuracy. Such tools also capture elements in the accessory genome, such as plasmids, which are often essential to understanding clinically relevant traits such as virulence and antibiotic resistance. This information is being deployed in clinical settings to improve diagnosis, inform treatment selection, and trace transmission paths of outbreaks.

Accurate assemblies

Infections from even closely related microbial strains can require different treatment. In these cases, information more comprehensive than conventional microbial assays can generate may be critical to ensuring optimal treatment for a patient.

For example, a recent analysis of two strains of Mycobacterium tuberculosis found that previous analyses of these microbes had significantly overestimated the number of genetic variants between the strains.1 That led to the inaccurate categorization of many so-called variants as pathogenic when in fact they represented sequencing errors, primarily due to GC bias and repetitive DNA. By identifying those variants as mistakes through a type of long-read sequencing known as single molecule, real-time (SMRT) sequencing, the scientists made it possible for others interested in these strains to focus on the real variants most likely to have clinical significance. In their paper reporting these results, the authors urged others in the community to question draft or even reference genomes produced with short-read or Sanger technologies. “As de novo assembly can be routinely performed for microbes using single-molecule sequencing,” they wrote, “we strongly recommend this for mycobacteria.”

In addition to accurately identifying microbial strains, long-read sequencing is also important for making clinically relevant discoveries about the vectors that carry pathogens. A new publication from a large collaboration of scientists revealed the genome sequence of Aedes aegypti, the mosquito that harbors Zika, dengue, and other viruses.2 The new assembly closed or addressed many of the gaps in previous attempts, with a 93 percent decrease in the number of contigs. Key findings included novel insight into the mosquito’s sex-determination locus, a major target for efforts to curb the spread of these viruses by shifting populations toward harmless males.

Accessory genome

A longtime challenge in the clinical analysis of microbes is that they tend to carry genes associated with antibiotic resistance and virulence in plasmids or other elements of the accessory genome. This makes them, in effect, hidden in plain sight, since most sequencing technologies are unable to resolve those separate elements. With its long reads, though, SMRT sequencing can assemble accessory genomes and give clinical labs important insight.

A good example of this comes from scientists at the Houston Methodist Research Institute and collaborators, who analyzed strains of Klebsiella pneumoniae found throughout the population in a region of Texas.3 K. pneumoniae, a dangerous microbe that often shows high levels of drug resistance, has been known as a cause of healthcare-associated infections (HAIs). This study showed that the microbe was more abundant among the general population than expected, and an in-depth analysis of plasmids revealed the emergence of a virulent, highly drug-resistant strain in the area. As part of the work, the team used whole genome data to identify classifiers of drug resistance for the majority of antibiotics used to fight K. pneumoniae.

The challenge of antibiotic resistance—including multidrug resistance—in K. pneumoniae infections is felt around the world. In Germany, scientists used long-read SMRT sequencing to generate closed genomes for 16 strains of this organism, which they intend to use not only for strain identification but also for evaluating drug-resistance status.4 The scientists called for a public genomic database so this kind of data could be available to everyone charged with battling the nasty infection.

Many K. pneumoniae strains carry various forms of carbapenemase in plasmids, giving the microbe resistance to the carbapenem class of antibiotics. The analysis of strains in Germany found that the core genome varied little, with strains differing by no more than 25 single nucleotide polymorphisms (SNPs), but identified several types of plasmids, including a novel prophage. The plasmids had a great deal of divergent sequence, making them key elements for a genomic analysis of the microbe’s likely clinical impact.
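
To make the idea of a core-genome SNP difference concrete, here is a minimal sketch that counts differing positions between pre-aligned sequences. It is not the method used in the German study: the strain names, the ten-base sequences, and the helper `snp_distance` are hypothetical, and real analyses work on alignments of millions of bases produced by dedicated tools.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Positions with a gap or ambiguous base in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


# Hypothetical aligned core-genome fragments for three strains.
core = {
    "strain_A": "ACGTTGCAAC",
    "strain_B": "ACGTTGCAAT",
    "strain_C": "ACCTTGCAAT",
}

# Pairwise SNP distances between every pair of strains.
distances = {
    (s, t): snp_distance(core[s], core[t])
    for s, t in combinations(sorted(core), 2)
}
max_distance = max(distances.values())

for pair, d in distances.items():
    print(pair, d)
print("largest pairwise difference:", max_distance)
```

A small maximum pairwise distance across the core genome, as in the study described above, means the strains cannot be told apart well by their core sequence alone, which is why the divergent plasmid content matters.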

Outbreak transmission

For hospital teams striving to rein in the spread of infections among patients, one of the most important roles of the clinical laboratory is to trace the path of transmission. Did an outbreak start inside the hospital, or within the community? Is there a department where infections are more likely to spread?

The detective work needed to answer such questions can be aided by long-read sequencing, as illustrated by a major study at the NIH Clinical Center a few years ago.5 Scientists and clinicians determined that HAIs were less common than believed; more problems were caused by already colonized patients who received false-negative results from basic screening.

In an analysis of 20 isolates collected from the clinic and characterized with whole-genome SMRT sequencing, scientists successfully tracked the spread of carbapenem-resistant Klebsiella from an index patient to 17 others who became infected. They also demonstrated that typical approaches for trying to generate this information, including PCR, pulsed-field gel electrophoresis, and multilocus sequence typing, could not accurately distinguish among strains, which would have made it impossible for clinical teams to trace the transmission as it happened.

In reporting these results, scientists noted that the cost of whole genome sequencing should be considered justifiable for cases like this. “The cost of whole-genome sequencing is dwarfed by [other] costs associated with outbreaks and their investigations, including the human and financial toll and the loss of patient confidence in the health care facility,” they wrote.

Looking ahead

As the technology behind long-read sequencing matures, capacity has increased substantially while costs have fallen significantly. It is now feasible to use SMRT sequencing for real-time microbial surveillance, outbreak analysis, and infection strain identification in clinical labs, generating results affordably and rapidly enough to have an impact on patient care.

The more comprehensive information produced gives clinical lab teams unique insight into antibiotic resistance, virulence, transmission, and the emergence of new strains. This can enable direct clinical benefit: alerting medical teams when a quarantine is needed, informing treatment choice, and much more. Most important, it can lead to improved health outcomes for patients and lower healthcare costs, since the right treatment can be administered more quickly, reducing both the duration of hospital stays and readmission rates.

1. Elghraoui A, Modlin S, Valafar F. SMRT genome assembly corrects reference errors, resolving the genetic basis of virulence in Mycobacterium tuberculosis. BMC Genomics. 2017;18:302.
2. Matthews BJ, Dudchenko O, Kingan S, et al. Improved Aedes aegypti mosquito reference genome assembly enables biological discovery and vector control. BioRxiv website.
3. Long S, Olsen R, Eagar T, et al. Population genomic analysis of 1,777 extended-spectrum beta-lactamase-producing Klebsiella pneumoniae isolates, Houston, TX: unexpected abundance of clonal group 307. mBio. 2017;8(3):e00489-17.
4. Zautner AE, Bunk B, Pfeifer Y, et al. Monitoring microevolution of OXA-48-producing Klebsiella pneumoniae ST147 in a hospital setting by SMRT sequencing. Journal of Antimicrobial Chemotherapy. 2017;72(10):2737-2744.
5. Conlan S, Thomas P, Deming C, et al. Single-molecule sequencing to track plasmid diversity of hospital-associated carbapenemase-producing Enterobacteriaceae. Science Translational Medicine. 2014;6(254):254ra126.