Integrating clinical genomics into the standard of care for medical providers

June 20, 2013

Every human disease has a genetic component, and, thanks to the decreasing cost of genome sequencing, this diagnostic information is finally becoming available to healthcare practitioners at scale. The federal government has invested heavily in mapping and understanding the human genome over the past two decades through the Human Genome Project (HGP) and its large offshoot initiatives. While much discovery work remains to understand the genome's full contribution to disease, prognosis, and drug response, in certain clinical areas we already have a deep, and immensely useful, understanding of the relationships between variants in the human genome and disease.

That being said, applying genomic information to improve patient management is challenging because of the complexity of the new sequencing technologies, the sheer number and variable quality of correlations between the genome and disease, and the interpretive complexity of the resulting information. To address these challenges, a number of genomic interpretation platforms have been developed for diagnosticians and providers, some of which can be incorporated easily into existing workflows and deliver accessible, actionable data to aid clinical decision-making. Purveyors of diagnostic information can now take advantage of the decreases in cost and increases in sensitivity to maintain or grow their market share in a hyper-competitive industry.

From consortia to clinic

More than a decade ago, the draft human genome sequence was published by both a federally funded initiative and a private-sector effort.1,2 Since then, the public sector has made additional massive investments to determine the function of this complex blueprint for life, including the International HapMap Project, which mapped the inherited blocks of the genome across global populations, and the Encyclopedia of DNA Elements (ENCODE) project, which set out to define the “functional elements” within the genome.3,4 The ENCODE project, in particular, reported that more than 80% of the human genome shows biochemical activity rather than being inert “junk DNA”; many of these sequences appear to help regulate normal gene function. The ramifications of this work for understanding disease at the molecular level are staggering, and it provides a foundation for developing new and better clinical laboratory tests. Parallel to these large consortium efforts, scientists around the world have spent the past three decades building correlations between precise positions in the human genome and disease. These data are being incorporated into diagnostic testing at a rapid rate; for example, ENCODE annotations are now being used to assess the pathogenicity of intragenic variants in routine testing.

The correlations between positions in the genome and disease states can be divided into five areas: 1) Mendelian (monogenic) disease; 2) chronic non-communicable disease (complex genetic disease); 3) drug metabolism and response (pharmacogenomics and companion diagnostics); 4) tumor prognosis and drug response; and 5) host-pathogen interactions largely mediated by the immune system and its variation. From a genetics perspective, the clinical area that we as a diagnostics community understand most deeply is that of Mendelian disease.

Monogenic disease and mystery diagnoses

DNA diagnostic tests have been used pervasively to diagnose single-gene Mendelian diseases, which affect 5% to 10% of all births and represent a mature, multibillion-dollar-per-year industry.5 The testing lifecycle usually begins with a family-based study that uncovers a severe DNA alteration that segregates with the disease through the pedigree; thereafter, several hundred chromosomes from unaffected individuals are tested to ensure the variant is never seen in the unaffected population. Following these studies, the test can be implemented as a Laboratory Developed Test (LDT) in a CLIA-certified laboratory. The laboratory workflow is usually based on Sanger sequencing of the gene's entire coding region and flanking regions; manual curation of the sequencer's chromatographic output then identifies variants relative to a reference sequence derived from unaffected individuals. The methodology is inefficient from a workflow perspective because each gene must be assayed with unique reagents, and it is costly because of the technology employed and the manual inspection required.
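
At its core, that curation step is a comparison of the patient's consensus sequence against a reference. The following minimal sketch illustrates the idea; the sequences and coordinates are hypothetical, and a real pipeline works from chromatograms and must also handle heterozygous peaks, indels, and exon coordinates.

```python
# Minimal sketch: reporting substitutions in a patient's consensus
# sequence relative to a reference coding sequence. Both sequences are
# hypothetical placeholders, not real gene sequences.

REFERENCE = "ATGGCCACTGAAGTC"   # hypothetical reference coding sequence
PATIENT   = "ATGGCCACTGGAGTC"   # hypothetical patient consensus

def call_substitutions(reference: str, patient: str, offset: int = 1):
    """Return (position, ref_base, alt_base) for each mismatch."""
    variants = []
    for i, (ref_base, alt_base) in enumerate(zip(reference, patient)):
        if ref_base != alt_base:
            variants.append((i + offset, ref_base, alt_base))
    return variants

if __name__ == "__main__":
    for pos, ref, alt in call_substitutions(REFERENCE, PATIENT):
        print(f"c.{pos}{ref}>{alt}")   # prints c.11A>G for this example
```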

Despite the thousands of single-gene tests on the market, diagnosticians estimate that clear mutations are not found in 30% to 50% of patients tested who present with a single-gene disorder. This low sensitivity can be explained partly by an incomplete understanding of all of the genes that cause monogenic disease (we understand the genetic basis of only the most prevalent ~2,000 of ~4,000 diseases) and partly by a failure to recognize a patient's variant, when one is identified, as causal for the disease.

In the former case, discovery efforts continue globally, and over time we will understand the primary genetic lesions that cause all monogenic disease. Because we can now sequence the entire coding region of the human genome (all ~23,000 genes) plus the flanking regions of those genes, or the “exome,” in a single assay for the same price as Sanger sequencing of only one or two genes, we now have the means, in principle, to correctly diagnose virtually every patient who has a monogenic disease. This viewpoint is predicated on a tiered interpretive strategy whereby genes known to cause the phenotype are examined first; if the patient is mutation-negative in those genes, the remaining genes in the exome are examined. This type of test is now available through multiple diagnostic companies and is usually offered as an “idiopathic exome test” or as an exome for “mystery disease that is clearly genetic.”
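
A minimal sketch of that tiered strategy follows, assuming a hypothetical list of classified variant calls and a hypothetical gene panel for the patient's phenotype; real interpretation involves far more evidence than a single classification label.

```python
# Tiered review: examine genes already linked to the phenotype first,
# and widen to the rest of the exome only if that tier is negative.
# Gene names and variant records here are illustrative only.

def tiered_review(variants, phenotype_genes):
    """variants: list of dicts with 'gene' and 'classification' keys."""
    tier1 = [v for v in variants if v["gene"] in phenotype_genes]
    causal = [v for v in tier1 if v["classification"] == "pathogenic"]
    if causal:
        return "tier 1 (known phenotype genes)", causal
    # Mutation-negative in known genes: widen to the remaining exome.
    tier2 = [v for v in variants if v["gene"] not in phenotype_genes]
    causal = [v for v in tier2 if v["classification"] == "pathogenic"]
    return "tier 2 (rest of exome)", causal

example_variants = [
    {"gene": "GJB2", "classification": "benign"},
    {"gene": "OTOF", "classification": "pathogenic"},
]
print(tiered_review(example_variants, phenotype_genes={"GJB2", "MYO7A"}))
```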

There are challenges with the new technology. These include: 1) assembling billions of short DNA sequences (the output of the machines) into a representation of the patient's exome using complex algorithms termed “aligners”; 2) comparing the patient's exome to an unaffected reference sequence; 3) deciding which variants are disease-causing by comparing them to a multitude of unaffected genomes as well as variant databases containing known mutations and polymorphisms; and 4) defining the pathogenicity of variants that have never been seen before (“private mutations”). Most laboratories lack the in-house expertise to transition from Sanger sequencing to next-generation sequencing (NGS) and thereby capture the workflow efficiencies, (promised) cost efficiencies, and increased sensitivity. Most diagnostic exomes on the market today have not been optimized and have sensitivities of between 20% and 50% for monogenic disease. Continued improvements in sequencers over the coming years (primarily longer DNA sequence reads), coupled with rigorous clinical-grade analytics and interpretive strategies, will improve these sensitivities and testing products dramatically.
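
To make step 3 concrete, the sketch below triages variant calls against a population frequency table and a curated mutation database. The coordinates, frequency table, database contents, and 1% frequency cutoff are all hypothetical placeholders, not real curated data.

```python
# Illustrative triage of (chrom, pos, ref, alt) variant calls against
# population frequencies and a known-mutation database.

POPULATION_FREQUENCY = {("chr13", 32914438, "T", "C"): 0.12}  # common polymorphism
KNOWN_PATHOGENIC = {("chr7", 117199644, "ATCT", "A")}          # placeholder entry

def triage(variant, max_frequency=0.01):
    """Classify a single variant call for downstream review."""
    if variant in KNOWN_PATHOGENIC:
        return "known pathogenic"
    if POPULATION_FREQUENCY.get(variant, 0.0) > max_frequency:
        return "common polymorphism: filter out"
    return "rare variant: assess pathogenicity (possible VUS)"

print(triage(("chr7", 117199644, "ATCT", "A")))   # known pathogenic
print(triage(("chr13", 32914438, "T", "C")))       # common polymorphism
print(triage(("chr2", 47702181, "G", "A")))        # rare variant, possible VUS
```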

Variant of uncertain significance

A variant that has been identified in a patient but not previously known to cause the disease in question is termed a variant of uncertain significance (VUS). VUSs are typically examined with rudimentary algorithmic tools such as SIFT (Sorting Intolerant From Tolerant) and PolyPhen (Polymorphism Phenotyping) to better understand whether they might be pathogenic. These algorithms primarily look for sequence conservation across evolution, or for other attributes suggesting that a variant is important for gene function. The tools available to determine the pathogenicity of VUSs have evolved substantially, and we can now define whether a variant is causative or merely polymorphic with greater than 99.5% accuracy. One factor that improves our ability to assign pathogenicity to a VUS is whether the variant lies in a region annotated as functional by ENCODE.
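
As a rough illustration of how such evidence is aggregated, the sketch below combines prediction scores with an ENCODE annotation flag. The thresholds (SIFT below 0.05 called “deleterious,” PolyPhen above roughly 0.85 called “damaging”) follow common conventions but are used here only for illustration; real classification weighs segregation, frequency, and functional data as well.

```python
# Aggregating in silico evidence for a VUS (illustrative only).

def vus_evidence(sift_score, polyphen_score, in_encode_functional_region):
    evidence = []
    if sift_score < 0.05:
        evidence.append("SIFT: predicted deleterious")
    if polyphen_score > 0.85:
        evidence.append("PolyPhen: predicted damaging")
    if in_encode_functional_region:
        evidence.append("lies in an ENCODE-annotated functional region")
    return evidence or ["no supporting in silico evidence"]

# Hypothetical variant: low SIFT score, high PolyPhen score, and located
# inside an annotated regulatory element.
print(vus_evidence(sift_score=0.01, polyphen_score=0.97,
                   in_encode_functional_region=True))
```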

Even when a mutation for a single-gene disorder is present, it does not always predict disease in every patient. This phenomenon has been termed “reduced penetrance.” Reduced penetrance is probably the result of modifier genes located elsewhere in the genome, which can now be identified more easily because of the HGP and ENCODE; in turn, these modifier genes can be used to increase predictive precision in the form of modifier-gene-complemented monogenic diagnostic tests. The interaction of genes and variants within the human genome is likely to remain allowable intellectual property moving forward and presents a significant opportunity to provide the highest-accuracy tests, particularly in clinical scenarios where prediction of disease matters most, including reproductive counseling and decision making.

Expansion of clinical offerings

In addition to monogenic testing, the clinical areas that will benefit immediately from NGS technologies and a new wave of analytics (as the tests are transitioned during the coming 12 to 24 months) are pharmacogenomics, host-pathogen interactions, and oncology drug selection. The adoption of pharmacogenomics (also called companion diagnostics; PGx/CDx) has suffered over the past decade due to the small number of DNA-drug correlations, the paucity of actuarial (cost-benefit) data to support reimbursement, and the disruption to the clinical workflow caused by the need to test a gene quickly and concurrently with prescribing. Now, with the majority of pharmaceutical companies developing targeted therapies, and with the ability to sequence genes at very low cost and archive those data for whenever a patient and physician need to query relevant portions of the genome, PGx/CDx is on track to become a major service line within clinical laboratories. Complementing these practical advances, and compressing the adoption timeline for PGx/CDx testing, is the fact that the new healthcare legislation makes providers financially responsible for re-admissions within 30 days, and a major cause of re-admissions is preventable adverse drug reactions.
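
The sketch below shows what querying archived genotype data at prescribing time might look like. The diplotype-to-phenotype table is a simplified, illustrative subset of CYP2C19 alleles and is not a complete clinical rule set; real PGx/CDx services draw on curated guideline content (for example, CPIC) covering many genes and drugs.

```python
# Illustrative prescribing-time query of archived CYP2C19 genotype data.

CYP2C19_PHENOTYPE = {
    ("*1", "*1"): "normal metabolizer",
    ("*1", "*2"): "intermediate metabolizer",
    ("*2", "*2"): "poor metabolizer",
}

def clopidogrel_flag(diplotype):
    """Map a diplotype to a metabolizer phenotype and a simple flag."""
    phenotype = CYP2C19_PHENOTYPE.get(tuple(sorted(diplotype)), "indeterminate")
    if phenotype == "poor metabolizer":
        return phenotype, "reduced activation expected: consider alternative therapy"
    return phenotype, "no genotype-based change indicated"

print(clopidogrel_flag(("*2", "*2")))
```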

Pathogen sequencing is transitioning rapidly to NGS technologies and analytics because of the small size of the genomes and the cost advantages of the technology. Of particular importance in pathogen sequencing is the ability to “phase” the pathogen, that is, to determine which of the variants that distinguish different strains lie on the same DNA molecule, so the strains can be characterized correctly. Phasing is more easily achieved with single-molecule long-read sequencers, but algorithmic strategies can compensate for shorter reads.
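
The basic idea behind read-backed phasing is sketched below for two heterogeneous sites: reads that span both positions vote for which alleles travel together on the same molecule. The read data and positions are hypothetical.

```python
# Read-backed phasing of two variant sites (illustrative only).

from collections import Counter

def phase_two_sites(reads, pos_a, pos_b):
    """reads: list of dicts mapping position -> observed base."""
    votes = Counter(
        (read[pos_a], read[pos_b])
        for read in reads
        if pos_a in read and pos_b in read
    )
    return votes.most_common()

spanning_reads = [
    {1001: "A", 1045: "T"},
    {1001: "A", 1045: "T"},
    {1001: "G", 1045: "C"},
    {1001: "A"},              # does not span both sites: contributes no vote
]
print(phase_two_sites(spanning_reads, 1001, 1045))
# [(('A', 'T'), 2), (('G', 'C'), 1)] -> A-T and G-C define the two haplotypes
```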

Finally, tumor testing is becoming possible to identify the specific variants in a multiclonal tumor that define druggable targets. Technically, this is the most complex clinical scenario in which NGS and new analytics are being applied today. Tumors are heterogeneous and usually contaminated with normal tissue, so they must be sequenced very “deeply,” meaning with ultra-high redundancy, to detect rare cellular subpopulations. In addition, tumors undergo many chromosomal and nucleotide aberrations that challenge the NGS technologies to perform with clinical-grade sensitivity and specificity. That being said, these challenges are surmountable with deep expertise in the technology and analytics, and such testing can provide utility today in assigning salvage therapies to end-stage patients. To make cancer drug selection truly effective and lifesaving, the healthcare community must become comfortable with molecular targeting and combination therapies at the time of diagnostic biopsy. This is no small feat when each patient might be unique with respect to the predicted therapeutic regimen and front-line standards of care might not be followed.
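
The need for deep sequencing comes down to arithmetic: a variant carried by a small subclone, diluted by normal tissue, appears in only a small fraction of reads. The sketch below illustrates this with a simple variant allele fraction (VAF) check; the read counts and thresholds are illustrative, not validated cutoffs.

```python
# Why depth matters: detecting a low-fraction subclonal variant.

def variant_allele_fraction(alt_reads, total_reads):
    return alt_reads / total_reads

def detectable(alt_reads, total_reads, min_alt_reads=5, min_vaf=0.02):
    """Simple evidence threshold: enough supporting reads and VAF."""
    vaf = variant_allele_fraction(alt_reads, total_reads)
    return alt_reads >= min_alt_reads and vaf >= min_vaf

# A subclone in ~5% of cells (heterozygous variant, ~2.5% VAF):
print(detectable(alt_reads=2,  total_reads=80))    # False at ~80x coverage
print(detectable(alt_reads=25, total_reads=1000))  # True at ~1,000x coverage
```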

New technologies and very complex analysis paradigms, coupled with large-scale research driven by inexpensive, pervasive sequencing and phenotyping of millions of individuals across the globe, are breathing life into diagnostics for common, chronic, non-communicable diseases.6 Diagnostic products based on this idea (the more risk factors a person carries, the higher the absolute risk of disease) were commercialized more than five years ago, but there was initial skepticism about these risk factors because the identified genomic regions were often located far from genes and had no known function, despite validated and reproducible correlations with disease. As a result of the ENCODE study, the vast majority of these variants are now known to lie in functional regions of the genome, validating the market opportunity for assessing chronic disease risk. Several options exist for diagnostic companies and provider systems, and their customers, to take advantage of the cost savings and improved patient outcomes that genetic testing using NGS and sophisticated analytics affords.
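
The “more risk factors, higher absolute risk” idea can be made concrete with a simple aggregation, sketched below. This is a deliberately simplified model (it treats odds ratios as relative risks and assumes independent loci), and the effect sizes and baseline risk are illustrative numbers, not published values.

```python
# Combining per-variant effect sizes into an absolute risk estimate
# (simplified, illustrative model).

def combined_risk(baseline_lifetime_risk, per_variant_odds_ratios):
    relative_risk = 1.0
    for odds_ratio in per_variant_odds_ratios:
        relative_risk *= odds_ratio
    return min(1.0, baseline_lifetime_risk * relative_risk)

# Hypothetical: 8% baseline lifetime risk, carrier of three risk variants.
print(round(combined_risk(0.08, [1.3, 1.2, 1.4]), 3))  # ~0.175
```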

Dietrich A. Stephan, PhD, is founder, president and CEO of Silicon Valley Biosystems. Dr. Stephan is a geneticist and recognized leader in the field of personalized medicine. Prior to founding SV Bio, he was Executive Director of the Gene Partnership at Harvard Medical School and Children’s Hospital Boston, where he guided a groundbreaking genomic medicine pediatric program.

References

  1. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860-921.
  2. Venter JC, Adams MD, Myers EW, et al. Sequence of the human genome. Science. 2001;291(5507):1304-1351.
  3. International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437(7063):1299-1320.
  4. Dunham I, Kundaje A, Aldred SF, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57-74.
  5. Proffitt A. Batten Disease finding ends a diagnostic odyssey for California family. Bio-IT World, Jan 24, 2013. http://www.bio-itworld.com/2013/1/24/batten-disease-finding-ends-diagnostic-odyssey-california-family.html. Accessed May 3, 2013.
  6. Ghadar F, Sviokla J, Stephan DA. Why life science needs its own Silicon Valley. Harv Bus Rev. 2012;90(7-8):25-27.