The basic premise of the Human Genome Project, and now the increasingly affordable and technically viable whole genome sequencing (WGS) technologies, is straightforward: if the DNA contains the entire genetic code for an organism (or person), then knowing the entire genomic sequence should lay bare their entire clinical genetic picture. When tied to suitable bioinformatics consisting of reference sequences and linked metadata such as sequence variation effects on enzyme kinetics, this would in theory enable a clinician to tailor drug selection and dosing to someone’s genetics in order to maximize efficacy and reduce unwanted side effects. Similarly, mining this trove of genetic information might be of use in genetic counseling contexts such as infertility.
This sort of application of WGS data must rely on large pools of reference data for its interpretation. For monogenic traits—phenotypic expressions clearly based on a single gene—smaller data sets are needed than for complex polygenic traits, where an interplay of multiple gene alleles leads to phenotype. Teasing out the interactions of all genes contributing to a polygenic trait, and the impact of different allele combinations, can only come over time and with ever-increasing reference data sets.
The collection, annotation, and publication of this sort of data is happening at an increasing rate as technical and price hurdles drop, and there is every reason to believe that this potential for personalized genetic medicine will come closer and closer to realization in the next few years. In this month’s installment of The Primer, though, we’re going to address one of the challenges to this straightforward concept, and thereby underscore the fact that there remain confounding factors—ripe fields for further research that will help us more fully understand how the genotype ends up as the phenotype.
Breakdown in the Central Dogma! (News at 11)
The particular confounding factor we’ll examine here strikes right at the heart of what’s referred to as the Central Dogma of molecular biology. While not all of you may recall that term from the classroom, I am sure you’re all familiar with the underlying immutable concept it describes. That is, DNA genomes get transcribed to RNA transcripts, which get translated to polypeptide proteins, which in turn build the cells, organs, and organisms (directly as structural elements or through enzymatic and regulatory functions for non-protein components). Many of you may remember the triplet code “lookup tables” from some textbook, telling you how each of the 64 possible three-base codons either codes for one of the 20 amino acids or is one of the three STOP codons: TTT is phenylalanine, ATC is isoleucine, and so on. Actually, most of these tables are based on the RNA sequence, which would give those two examples as UUU and AUC, but some texts give them in the DNA form to make it easier to just read the genomic coding sequence of an open reading frame and know the resulting amino acid sequence.
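For readers who like to see the lookup table in action, here is a minimal Python sketch of a DNA-form codon table and a naive translation of an open reading frame. The table is the standard genetic code; the names `DNA_CODON_TABLE` and `translate_dna` and the toy sequence are illustrative assumptions, not part of any particular bioinformatics package.

```python
from itertools import product

# Standard genetic code in DNA form, with codons ordered T, C, A, G
# at each position. "*" marks the three STOP codons.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
DNA_CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_dna(orf):
    """Translate a coding-strand DNA reading frame (hypothetical helper).

    Reads three bases at a time from position 0 and stops at the
    first STOP codon. Returns one-letter amino acid codes.
    """
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = DNA_CODON_TABLE[orf[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy reading frame: ATG (Met), TTT (Phe), ATC (Ile), TAA (STOP)
print(translate_dna("ATGTTTATCTAA"))  # MFI
```

This is exactly the "read the genome, look up the protein" shortcut that the rest of this column calls into question.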
Therein lies today’s heresy. Put simply, the RNA-based codon table is true, and the DNA-based one is too, but if you sequence genomic DNA and read from the DNA codon table you may not get the right answer. That’s right, there’s a breakdown in the Central Dogma: following transcription and processing, the RNA sequence that gets to the ribosome and directs production of a protein may not always match the DNA sequence one finds. And we’re talking about exons here, never mind introns and alternative splicing. (We’ll briefly revisit that below.)
The culprit here is a process called RNA editing. As the name implies, this is a cellular process whereby individual nucleotides in a primary RNA transcript can be altered. Several forms of this are known, but the most common is conversion of adenosines to inosines (an A to I transition). This process occurs in a specific and reproducible manner, under the control of a class of enzymes known as adenosine deaminases acting on RNA (ADARs). Because the ribosome reads inosine as if it were guanosine, changing an A to an I within a codon can alter its “readout” (translation to amino acid), leading to missense changes (substitution of one amino acid for another) or STOP suppressions (substitution of a sense codon for a STOP codon, leading to a product with an unintended C-terminal extension). Other forms of editing, such as conversion of cytidine to uridine, can also produce nonsense changes (substitution of a STOP codon for an amino acid codon). Each of these can of course be deleterious, but in other instances, proper functional translation of the gene may require the RNA editing event. On top of all the other cellular mechanisms for modulation of expression levels and post-translational modifications, it’s yet another way to exert epigenetic influence over the end result of what looks like a fixed element, the DNA coded gene. What genomics has discovered is that it’s not so fixed.
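The effect of a single edit on the protein readout can be sketched in a few lines of Python. This is an illustration only: it assumes inosine is decoded as guanosine (which is also how it appears in cDNA sequence), works in the DNA alphabet for convenience, and uses made-up toy sequences and hypothetical helper names (`translate`, `apply_a_to_i`).

```python
from itertools import product

# Standard genetic code in DNA form; "*" marks STOP codons.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate from position 0 until the first STOP codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_a_to_i(seq, position):
    """Model an A-to-I edit at a 0-based position.

    Assumption: inosine is read as G, so the edit is written as A -> G.
    """
    if seq[position] != "A":
        raise ValueError("A-to-I editing requires an adenosine at the site")
    return seq[:position] + "G" + seq[position + 1:]


# Missense: CAG (Gln) becomes CGG (Arg)
missense = "ATGCAGTTTTAG"
print(translate(missense), translate(apply_a_to_i(missense, 4)))  # MQF MRF

# STOP suppression: TAG (STOP) becomes TGG (Trp), so translation runs on
stop_loss = "ATGTTTTAGGGCTGA"
print(translate(stop_loss), translate(apply_a_to_i(stop_loss, 7)))  # MF MFWG
```

In both toy cases the genomic sequence is unchanged; only the transcript differs, which is why reading the DNA alone gives the wrong protein.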
What RNA editing does, and why we care
Since the phenomenon of RNA editing came to light, advances have been made in using NGS technology to start identifying the sites in the human genome where this process occurs. Far from being rare, over one million such loci were reported with high confidence by 2014, and the number continues to grow. It’s notable, however, that many (in fact, most) of these editing sites do not occur in coding sequences, but in noncoding transcribed regions. It’s also notable that while the editing sites are reproducible, not every transcript is edited at a given site; usually, only a small fraction of transcripts will be edited. What, then, is the net biological effect of RNA editing, and more importantly, do we need to concern ourselves with it?
Preliminary data from a number of sources suggested that modifications in normal RNA editing might play a part in tumor biology. Han et al¹ sought to address this in more detail, with an in-depth examination of approximately 1.4 million RNA editing sites as compared between putatively normal tissue and several different tumor types. In a nutshell, this study reports observing individual cancer types being associated with instances of both over-editing (higher frequency of A to I conversion) and under-editing (lower than normal rates of editing) at distinct potential editing sites. The researchers assessed these further for signs of clinical relevance and found evidence that differential editing at these sites (and presence of the corresponding changed polypeptide products) contributed to enhanced tumor cell survival rates compared to wild-type cells. Similarly, changes in RNA editing were observed to correlate with altered (decreased) sensitivity to chemotherapeutic agents. Clearly then, if nothing else, these edited sites may be of value as biomarkers in selecting therapies and dosages in oncology settings.
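To make the idea of an editing frequency concrete, here is a small Python sketch. It assumes the editing level at a site is estimated as the fraction of sequencing reads showing G (the cDNA readout of inosine) rather than A, and it labels a site over- or under-edited using an arbitrary cutoff. The function names, the read counts, and the `min_delta` threshold are all hypothetical; this is not the statistical method used in the study cited above.

```python
def editing_level(a_reads, g_reads):
    """Fraction of reads at a site that show G (edited) rather than A."""
    total = a_reads + g_reads
    if total == 0:
        return None  # no coverage, so the level is undefined
    return g_reads / total


def compare_editing(normal, tumor, min_delta=0.05):
    """Label a site by the tumor-minus-normal difference in editing level.

    min_delta is an arbitrary illustrative cutoff, not a validated one.
    """
    delta = tumor - normal
    if delta >= min_delta:
        return "over-edited"
    if delta <= -min_delta:
        return "under-edited"
    return "unchanged"


# Made-up read counts at one site in matched normal and tumor samples
normal = editing_level(a_reads=90, g_reads=10)
tumor = editing_level(a_reads=60, g_reads=40)
print(normal, tumor, compare_editing(normal, tumor))
```

A real analysis would also need adequate read depth at each site and a proper statistical test across many samples before calling a difference meaningful.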
Other studies in non-cancer settings have also suggested that RNA editing is an important cellular activity, not to be ignored. For example, it now appears that mutations in ADAR1 that reduce RNA editing are one cause of Aicardi-Goutières syndrome, a rare pediatric illness that otherwise resembles viral infection or severe systemic lupus erythematosus (SLE), and mutations in ADAR2 have been linked to a number of neurological disorders.²
Implications for WGS and WES
A clear take-away message from this is that WGS (DNA) data must be taken with a grain of salt, as it were. The DNA sequence we read may not be completely indicative of the final gene product. Of course, clinical sequencing applications already have to deal with this in forms such as alternative splicing (where differential selection of exons can lead to multiple mRNA and protein isoforms of a single gene) as well as various effects on expression levels. Instead of using WGS, these changes as well as RNA editing can all be captured by sequencing the expressed exome, that is, the RNA transcripts themselves (called WES in this article; elsewhere this approach is usually termed transcriptome sequencing or RNA-seq, with “whole exome sequencing” reserved for capture of exons from genomic DNA). This approach relies on capturing and sequencing RNA transcripts via cDNA rather than the underlying genomic DNA.
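The comparison implied here, genomic sequence versus transcript-derived cDNA, can be sketched as a simple position-by-position check for sites where the genome shows A but the cDNA shows G. This Python sketch assumes the two sequences are already aligned, of equal length, and on the same strand; `candidate_editing_sites` and the toy sequences are hypothetical.

```python
def candidate_editing_sites(genomic, cdna):
    """Return 0-based positions where genomic DNA has A and cDNA has G.

    Assumes both sequences are pre-aligned, equal in length, and given
    for the same strand.
    """
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be pre-aligned and of equal length")
    pairs = zip(genomic.upper(), cdna.upper())
    return [i for i, (dna_base, rna_base) in enumerate(pairs)
            if dna_base == "A" and rna_base == "G"]


genomic = "ATGCAGTTTTAG"  # toy genomic coding sequence
cdna = "ATGCGGTTTTAG"     # toy cDNA read from the edited transcript
print(candidate_editing_sites(genomic, cdna))  # [4]
```

In practice an A/G mismatch is only a candidate: sequencing errors, misalignment, and heterozygous variants in the genome itself would all need to be ruled out.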
Before the reader breathes too much of a sigh of relief that we’ve neatly sidestepped this problem, though, stop to consider that the RNA transcriptional pool will (must!) differ between samples from different tissues and/or environmental conditions even in the same organism (or patient, as the case may be). Tissue-specific sampling for production of WES cDNA samples is therefore important, and no one such sample would be expected to provide a comprehensive genetic snapshot of a person.
So how will the genetic wonder lab of the future, where a person’s entire innate and microbial genetics is unlocked in mere hours for perusal by the clinician and use in personalized medicine, work? At this juncture it seems likely that both WGS and WES data (the latter, likely from multiple specimens) will be needed to approach this utopian goal. Certainly, WGS alone does not give the full picture. What you se(quence) isn’t always what you get.
1. Han L, Diao L, Yu S, et al. The genomic landscape and clinical relevance of A-to-I RNA editing in human cancers. Cancer Cell. 2015;28(4):515–528.
2. Gallo A, Vukic D, Michalik D, O’Connell MA, Keegan LP. ADAR RNA editing in human disease; more to it than meets the I. Hum Genet. 2017;136(9):1265–1278.
John Brunstein, PhD, is a member of the MLO Editorial Advisory Board. He serves as President and Chief Science Officer for British Columbia-based PathoID, Inc., which provides consulting for development and validation of molecular assays.