The recent surge in the number of sequencing-based clinical genetic tests has put a spotlight on the associated challenges in data interpretation. While advances in genomics allow new genetic tests to be developed at a breathtaking pace and with unprecedented complexity, the interpretation of results has remained a largely manual, time-consuming process that is simply not scalable. In this article, we review the current landscape of variant interpretation and the challenges it presents, as well as new developments in the field that indicate significant improvements may be on the horizon.
Today, variant interpretation is conducted by clinical geneticists who have tremendous skill and expertise in their field. It is a testament to their dedication that current interpretations are as reliable as they are. However, this dependence on human judgment, coupled with a laborious process, introduces room for error.
As lab directors know all too well, most variant analysis follows the same formula: run the genetic test, annotate results, investigate the variants detected, weigh evidence, integrate and interpret data, and report final results. It is the middle of the process, variant investigation and interpretation, that prevents the workflow from becoming fully automated and scalable.
Clinical geneticists usually begin this variant interpretation journey with an annotated report from the DNA results of the test, whether that’s a gene panel, exome, or even whole genome. This annotation includes a list of identified high-quality variants that must be pursued to determine which, if any, is causative for disease. Variant interpreters often begin with PubMed and Google, scouring the literature to find mentions of these variants. Next, they have to go through each paper to figure out whether its information about the variant is relevant to the test at hand, tracking details about heterozygosity, disease type, number of subjects, and so forth. Once any available information about the variants has been uncovered, the next stop is databases or websites that predict protein changes based on the DNA variant. This part of the process indicates whether the variant might be affecting a patient’s phenotype. With all of this information, the analysis team draws on its deep clinical expertise to make a judgment call about how to report each variant on the list: pathogenic or likely pathogenic, benign or likely benign, or significance unknown.
Experts say that interpretation takes about 30 minutes for each novel variant, a couple of hours for variants that have been reported in the literature, and many days for the most complicated cases. As genetic tests become increasingly complex, covering more and more of a patient’s genome, the time spent analyzing a growing list of variants for each test is expanding drastically. Challenges in efficient and effective interpretation of genetic test data will soon limit our ability to bring the benefits of these tests to patients, motivating the need for robust clinical decision support solutions aimed at clinical testing labs.
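To make the scaling problem concrete, the per-variant times quoted above can be turned into a rough workload estimate. The sketch below is illustrative only: it assumes "a couple of hours" means two hours, ignores the multi-day complicated cases, and the function name and parameters are hypothetical.

```python
def estimate_review_hours(
    novel_variants: int,
    reported_variants: int,
    hours_per_novel: float = 0.5,     # "about 30 minutes" per novel variant
    hours_per_reported: float = 2.0,  # assumption: "a couple of hours" taken as 2
) -> float:
    """Return a rough estimate of analyst hours needed for one test."""
    if novel_variants < 0 or reported_variants < 0:
        raise ValueError("variant counts must be non-negative")
    return (novel_variants * hours_per_novel
            + reported_variants * hours_per_reported)


if __name__ == "__main__":
    # A hypothetical test that turns up 10 novel and 5 previously reported variants.
    print(estimate_review_hours(10, 5))
```

Even under these simplified assumptions, the estimate grows linearly with the number of variants a test returns, which is why broader tests translate directly into longer analyst time.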
Whether a genetic test is trying to pinpoint the cause of a rare or unknown disease, find evidence of hereditary disease, or suggest an appropriate treatment course, the need for rapid turnaround of results is imperative. Clinical geneticists are well aware of this, but the growing demand for genetic testing and the increasing number of variants turned up by these tests are doubly burdensome for analysis teams. As the range of testing options soars, most clinical labs no longer have a geneticist with expertise in every test indication; under these conditions, variant interpretation may take even longer.
To ensure that results are returned to physicians quickly enough to be clinically useful, it is essential to automate and streamline as much of the process as possible so that clinical geneticists can deploy their expertise where it is needed most. For instance, many large clinical labs maintain their own variant databases, so if a variant has been seen and interpreted before, analysts can avoid the time-consuming process of researching it all over again. The development of proprietary databases by testing labs, and by the healthcare providers that consume their data, is a key emerging trend. Labs derive value by mining these databases to determine variant frequencies and their associations with clinical profiles, outcomes, and ethnicities, which in turn enriches clinical reports.
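The reuse of earlier interpretations can be sketched as a simple lookup keyed by variant. This is a minimal illustration rather than a description of any particular lab's system: the class and function names, the in-memory dictionary, and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
VariantKey = Tuple[str, int, str, str]

# The reporting categories named in this article.
CATEGORIES = (
    "pathogenic",
    "likely pathogenic",
    "significance unknown",
    "likely benign",
    "benign",
)


@dataclass(frozen=True)
class Interpretation:
    classification: str
    note: str = ""


class LabVariantDatabase:
    """Hypothetical in-memory store of a lab's previously interpreted variants."""

    def __init__(self) -> None:
        self._records: Dict[VariantKey, Interpretation] = {}

    def record(self, key: VariantKey, interpretation: Interpretation) -> None:
        if interpretation.classification not in CATEGORIES:
            raise ValueError(f"unknown category: {interpretation.classification}")
        self._records[key] = interpretation

    def lookup(self, key: VariantKey) -> Optional[Interpretation]:
        return self._records.get(key)


def triage(
    variants: Iterable[VariantKey], db: LabVariantDatabase
) -> Tuple[Dict[VariantKey, Interpretation], List[VariantKey]]:
    """Split variants into those already interpreted and those needing manual review."""
    known: Dict[VariantKey, Interpretation] = {}
    needs_review: List[VariantKey] = []
    for key in variants:
        prior = db.lookup(key)
        if prior is None:
            needs_review.append(key)
        else:
            known[key] = prior
    return known, needs_review
```

In this sketch only the variants returned in `needs_review` would go through the literature search and evidence weighing described earlier. A real system would also need to handle variant normalization and re-review of older classifications as evidence changes.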
Another major challenge lies in the murky category of variants of unknown significance. Clinical utility is greatest when a variant falls firmly into either the “pathogenic” or “benign” category, and it weakens as the variant moves toward the center of the spectrum. Emerging needs call for classifying variants into further subcategories such as “likely pathogenic” or “likely benign.” In many cases, however, the downstream effect of a variant is not clear, even when it has already been reported in the literature. Because many of these variants must be interpreted with minimal information, it comes as no surprise that variant interpretations can differ significantly from one lab to another.
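One simple way to quantify how far interpretations differ between labs is to compare their classifications for the same variants. The sketch below assumes a hypothetical input format (variant identifier mapped to each lab's call) and treats any difference in category as a disagreement. It is an illustration, not a published concordance method.

```python
from typing import Dict

# calls[variant_id][lab_name] = classification assigned by that lab
LabCalls = Dict[str, Dict[str, str]]


def discordant_variants(calls: LabCalls) -> LabCalls:
    """Return the variants for which at least two labs reported different categories."""
    return {
        variant: labs
        for variant, labs in calls.items()
        if len(set(labs.values())) > 1
    }


def concordance_rate(calls: LabCalls) -> float:
    """Fraction of variants, among those classified by two or more labs, with full agreement."""
    compared = [labs for labs in calls.values() if len(labs) >= 2]
    if not compared:
        raise ValueError("no variant was classified by two or more labs")
    agreeing = sum(1 for labs in compared if len(set(labs.values())) == 1)
    return agreeing / len(compared)
```

A stricter or looser definition of agreement (for example, treating “pathogenic” and “likely pathogenic” as equivalent) would change the resulting rate, so the choice should be stated alongside any figure reported.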
In combination, these long-term drivers of inefficiency, together with the need for testing labs to expand their test menus, put pressure on operational efficiency and turnaround time. The community needs better resources that will help definitively classify variants, pulling more of them out of the “unknown” category and increasing the likelihood of consistent interpretation across labs. In fact, the lack of commercial-grade information solutions is compounding these inefficiencies and slowing the development of the market.
Signs of improvement
There are a number of reasons to believe that the genomics and clinical communities are well on their way to addressing these challenges. For example, ClinVar [1] is a publicly available database hosted by the National Center for Biotechnology Information. Users submit genetic variations and their associated phenotypes, with supporting evidence, to help others in the community increase their confidence in their own variant interpretations. Other, commercial efforts perform large-scale curation and data integration from the clinical literature and other sources to power clinical test interpretation pipelines.
Meanwhile, the Clinical Sequencing Exploratory Research (CSER) program [2], funded by the National Cancer Institute and National Human Genome Research Institute, has established a consortium of laboratories conducting studies that are helping to explain variant classification differences among labs. These data will be quite useful for suggesting standardized approaches to improve lab-to-lab reproducibility of results.
Separately, the Allele Frequency Community (AFC) [3] was founded by a number of organizations to share aggregated information about how often variants are seen in various populations, allowing analysts to factor in data for groups that may be underrepresented in existing public databases, including the Exome Aggregation Consortium (ExAC) [4]. The AFC operates on a share-and-share-alike model, so all members increase the value of the repository by contributing their own data about allele frequency.
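As an illustration of why pooled contributions are valuable, an aggregate allele frequency can be computed by summing allele counts across contributors and dividing by the total number of alleles observed. The sketch below is a generic calculation under that assumption. It is not a description of how the AFC or ExAC actually aggregate data, and the function name and input format are hypothetical.

```python
from typing import Iterable, Tuple


def pooled_allele_frequency(contributions: Iterable[Tuple[int, int]]) -> float:
    """Combine per-contributor (allele_count, allele_number) pairs into one frequency.

    allele_count  : times the variant allele was observed in a contributor's cohort
    allele_number : total alleles genotyped at that site in the same cohort
    """
    pairs = list(contributions)
    for allele_count, allele_number in pairs:
        if allele_count < 0 or allele_number < 0 or allele_count > allele_number:
            raise ValueError("each pair must satisfy 0 <= allele_count <= allele_number")
    total_count = sum(allele_count for allele_count, _ in pairs)
    total_number = sum(allele_number for _, allele_number in pairs)
    if total_number == 0:
        raise ValueError("no alleles were genotyped at this site")
    return total_count / total_number
```

Summing counts before dividing weights each contributor by cohort size, so a small cohort from an underrepresented population still adds evidence without being swamped by a simple average of per-cohort frequencies.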
Clearly, the challenge of variant analysis and interpretation affects the entire clinical community, and it will take a community-wide effort to overcome this obstacle. But we believe that clearing this hurdle is possible, and that eventually the variant interpretation process will be faster, more automated, and more definitive, giving clinical geneticists an even greater role in patient diagnosis and care.
The growth in sequencing-based clinical testing is expected to continue and to drive the use of high-complexity genetic tests toward an industrialized scale. As in other industrialized technology markets, demand will shift from open-source and homebrew software to highly scalable, commercial-grade informatics solutions built on rigorously curated knowledge bases.
1. ClinVar: National Center for Biotechnology Information. http://www.ncbi.nlm.nih.gov/clinvar/. Accessed December 30, 2015.
2. CSER: Clinical Sequencing Exploratory Research. National Institutes of Health. Moving the genome into the clinic. https://cser-consortium.org/. Accessed December 30, 2015.
3. Allele Frequency Community. Imagine human genome interpretation…minus the false positives. http://www.allelefrequencycommunity.org/. Accessed December 2015.
4. ExAC Browser (Beta) | Exome Aggregation Consortium (ExAC):