Jumping genes: Alu elements in human disease

July 23, 2019

There are probably few readers, if any, for whom the name Barbara McClintock doesn’t ring a bell. While all Nobel Prize laureates gain widespread recognition, in her case it was compounded by the uphill battle she’d faced for acceptance of her work. A cytogeneticist working on corn as a model system, she had come to the conclusion that not all genes were static, fixed loci at defined points in the genome. To say that her conclusion that there were “jumping genes” (coding DNA elements capable of moving from one chromosomal location to another) was met with generalized disbelief is a polite understatement. Time and weight of data proved her right, and her 1983 Nobel Prize in Physiology or Medicine, at the age of 81, was as much a testament to her perseverance as it was to good science.

The DNA features she discovered are properly referred to as transposable elements or transposons. Structurally, they share a number of features similar to some types of viruses (retroviruses) and can in a way be thought of as akin to a virus, in that they can replicate themselves semi-autonomously by use of host cell machinery. Unlike true viruses though, transposons don’t leave the cell, and progeny simply move to a new genomic location where they take up residence. They are in effect the simplest example of what’s termed “the selfish gene,” a postulate that genetic elements merely seek to replicate themselves. While most have “chosen” to do this through cooperative association with other genes to create viable replicating organisms, transposons do this purely on their own behalf and more as a parasite on the host cell than as a productive component of a larger whole. Our interest in them today stems first from the fact that they’re not just limited to existing in corn but are in fact found in most organisms including humans, and second with regard to this rogue, every-gene-for-itself intracellular lifestyle.


Humans don’t just have one type of transposon; there are actually a number of types, which are loosely grouped by physical size into Long Interspersed Elements (LINEs) and Short Interspersed Elements (SINEs). As you’d expect, the larger these are physically, the more genetic information they can code for. The one known as LINE-1, at a size of ~6000 base pairs, contains two open reading frames (regions that can be transcribed to mRNA and then translated to protein). One of these proteins has RNA binding activity but an unclear biological function; the second has endonuclease (DNA cutting) and reverse transcriptase (generation of DNA sequences based on RNA templates) activities. Essentially, after a LINE-1 element is transcribed (driven in part by transcription factor binding sites in its 5’ end), the expressed second protein makes cuts in the host DNA via its endonuclease function. It then makes a DNA copy of the full LINE-1 transcript via its reverse transcriptase function. This DNA copy gets inserted into the cut host genome, and host cell DNA repair machinery ligates it into place. The host chromosome has now gained a new copy of LINE-1, and every subsequent cellular replication cycle replicates this as part of its “normal, innate” nuclear DNA. This is considered autonomous retrotransposition, as LINE-1 supplies its own key enzyme functions for the process. Although the process itself occurs rarely, it’s easy to see how over long biological timescales this can lead to accumulation of multiple replicate copies of the LINE-1 element. LINE-1 is thought to be the only fully autonomous transposable element of the human genome, and it’s proven an effective biological strategy, with nearly 17 percent of the human genome made up of this sequence (roughly 170,000 copies per cell)!
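
For readers who think better in code, the “copy and paste” nature of retrotransposition can be caricatured in a few lines of Python. This is only a toy sketch: the genome is a plain string, the element sequence and cut position are made-up placeholders, and the function names are invented for illustration. The real process involves proteins, both DNA strands, and host repair machinery, none of which is modeled here.

```python
def transcribe(dna: str) -> str:
    """Toy transcription: copy the DNA sense strand into RNA (T becomes U)."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """Toy reverse transcription: copy RNA back into DNA (U becomes T)."""
    return rna.replace("U", "T")


def retrotranspose(genome: str, element: str, cut_site: int) -> str:
    """Return a genome with a new copy of `element` pasted in at `cut_site`.

    The original copy is left where it was: this is copy-and-paste,
    not cut-and-paste.
    """
    if element not in genome:
        raise ValueError("element not present in genome")
    rna = transcribe(element)          # step 1: element is transcribed
    cdna = reverse_transcribe(rna)     # step 2: RNA is copied back into DNA
    # step 3: the DNA copy is inserted at the endonuclease cut
    return genome[:cut_site] + cdna + genome[cut_site:]


# Hypothetical stand-in sequences, far shorter than any real element.
ELEMENT = "GGCCGGGCGC"
genome = "AAAA" + ELEMENT + "TTTTCCCC"

new_genome = retrotranspose(genome, ELEMENT, cut_site=len(genome) - 4)
print(genome.count(ELEMENT), "->", new_genome.count(ELEMENT))  # 1 -> 2
```

Run the function repeatedly and the copy number only ever goes up, which is the point of the accumulation argument above.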

Our focus today, however, is on a SINE, and in particular the one (really, the one family) known as Alu elements. Named after a restriction endonuclease site (Alu I) they characteristically contain, they’re much shorter than LINE-1, at only about 280 base pairs long. This means they haven’t got much coding capacity of their own beyond some transcriptional start signals and are thus not autonomous. In fact, Alu elements require both cellular factors and the second protein product of LINE-1 for their replication, being in a sense parasitic both to the host cell and to the LINE-1 elements. This parasite of a parasite approach is apparently an even more effective selfish gene strategy, as Alu elements make up about 11 percent of the human genome (about 2 million copies per cell).
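
As a small illustration of where the name comes from: the Alu I restriction enzyme recognizes the four-base sequence AGCT and cuts it in the middle, between the G and the C. A minimal Python sketch for locating such sites and simulating the digest might look like the following. The example sequence is invented, not a real Alu element, and the function names are just illustrative.

```python
ALU_I_SITE = "AGCT"  # Alu I recognition sequence; the enzyme cuts AG^CT


def find_alu_i_sites(seq: str) -> list[int]:
    """Return the 0-based start position of every AGCT in `seq`."""
    seq = seq.upper()
    sites = []
    pos = seq.find(ALU_I_SITE)
    while pos != -1:
        sites.append(pos)
        pos = seq.find(ALU_I_SITE, pos + 1)
    return sites


def alu_i_digest(seq: str) -> list[str]:
    """Split `seq` into the fragments an Alu I digest would leave."""
    cuts = [pos + 2 for pos in find_alu_i_sites(seq)]  # cut between G and C
    fragments, start = [], 0
    for cut in cuts:
        fragments.append(seq[start:cut])
        start = cut
    fragments.append(seq[start:])
    return fragments


example = "TTAGCTGGGAGCTAA"  # hypothetical sequence for illustration only
print(find_alu_i_sites(example))  # [2, 9]
print(alu_i_digest(example))      # ['TTAG', 'CTGGGAG', 'CTAA']
```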

Biological impacts

Not surprisingly, there are some very real impacts from having so many genetic freeloaders in our genome, and unstable ones at that. Particularly through the transcriptional and other genetic signals they carry, an Alu element may influence many aspects of proximal host gene expression including basal gene expression levels, intron splicing and polyadenylation, and RNA editing. Evolutionary pressure on the cell as a whole would generally lead the host genome to adapt to these: to accommodate, compensate for, or perhaps in some cases even derive a benefit from the impact of a particular Alu element in context. Such host adaptations take time, however, and clinical pathologies can arise when a novel Alu transposition event occurs, leading to an abrupt genetic change (the insertion of a new Alu copy) at what’s essentially a random locus.

One thing to know about this is that, since it’s a transcription (RNA) initiated replication process, replication is error prone. Unlike DNA polymerases, many of which have what’s called a proofreading function whereby each nucleotide added to the nascent copy is given a second look to confirm a true complementary match (as opposed to one based on a transient tautomeric shift), RNA polymerases are biologically optimized for speed and processivity. Once a nucleotide is added to a growing transcript, the polymerase rushes ahead to the next base. Since a proportion of all of the bases making up DNA and RNA can and do exist in tautomeric forms, where there are brief rearrangements of hydrogens and double bonds compared to the forms we see in textbooks, RNA transcripts tend to have low but significant mis-copy rates from their DNA template.

I sense some readers suddenly panicking: if this is so, why aren’t we all a mess due to errors in regular mRNA transcripts? It’s because we make multiple copies of transcripts from active genes, and on average they’re OK. Whether they’re OK or not, they have a short life before degradation and replacement with new transcripts as needed. Rare sporadic errors in mRNAs are thus not likely to be of significance.

If, however, you now take this not-quite-perfect RNA copy of a DNA sequence, then reverse transcribe it back into DNA, you’ve fixed that genetic change for the long term. A consequence of this is that only a small proportion of the Alu elements in our genomes are actually competent to replicate and insert new copies of themselves. In all, it’s estimated that there’s only about one novel Alu insertion per 200 live births. That’s a very good thing, because these insertional events are potentially problematic.

Recall that about one percent or a bit more of the human genome is coding for host proteins (roughly 21,000 genes). If we go making cuts and stuffing unrelated DNA willy-nilly in the genome, it stands to reason that about one percent of these would be in genes and the result would be an insertional inactivation of the gene. Because the Alu element carries transcriptional signals and potentially other regulatory elements, it’s also quite possible for it to exert unwanted influences on gene expression of things it’s merely near to. In either case the result is dysregulation of a gene or genes, almost certainly with deleterious results.
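
To put a rough number on that reasoning, here is a back-of-the-envelope calculation in Python. It combines the one-percent coding fraction above with the estimate, quoted in the real-life examples section, of about one novel Alu insertion per 200 live births. The big assumption, flagged in the code as well, is that insertions land uniformly at random across the genome; that is a simplification for illustration, and the variable names are made up for this sketch.

```python
from fractions import Fraction

# Figures taken from the article text; exact fractions avoid rounding noise.
insertions_per_birth = Fraction(1, 200)  # ~1 novel Alu insertion per 200 live births
coding_fraction = Fraction(1, 100)       # ~1 percent of the genome codes for protein

# ASSUMPTION: insertion sites are uniformly random across the genome.
coding_hits_per_birth = insertions_per_birth * coding_fraction

print(f"about 1 in {coding_hits_per_birth.denominator:,} births")
# about 1 in 20,000 births
```

This counts only direct hits on coding sequence. Insertions that merely land near a gene and disturb its regulation, as described above, would add to the total.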

Another aside: exactly this process is used in some model organisms to identify genes relating to a phenotypic trait. Simplistically, transposons endogenous to the organism can be encouraged to activate, and progeny organisms with changes to the phenotype of interest are examined for any new transposon insertion sites, on the assumption they may be in or near genes related to the phenotype. It’s called transposon tagging.

Besides novel retrotransposition events causing insertional inactivation, the high total number of Alu elements in and of itself can lead to other genetic problems. Specifically, these local islands of sequence similarity can be points for unequal homologous recombination events, where the chromosomal context around each Alu element isn’t the same. These can occur both interchromosomally (leading to exchange of nonhomologous chromosomal segments) and intrachromosomally (where they tend to lead to deletion or duplication of regions, depending on whether the two Alu elements are in the same or inverse polarity orientations).
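
The deletion and duplication outcomes are easier to see with a toy model. In the sketch below a chromosome is just a list of labeled segments, and two copies of it mispair so that the first Alu on one copy lines up with the second Alu on the other. The segment names and the `unequal_crossover` function are invented for illustration, and the model only covers the case of two Alu elements in the same orientation.

```python
def unequal_crossover(chrom_a, chrom_b, i, j):
    """Cross over between the repeat at chrom_a[i] and the repeat at chrom_b[j].

    Returns the two reciprocal products. If i and j point at different
    copies of the same repeat, one product loses the material between
    them and the other gains an extra copy of it.
    """
    if chrom_a[i] != chrom_b[j]:
        raise ValueError("crossover requires matching repeat segments")
    product_1 = chrom_a[:i + 1] + chrom_b[j + 1:]
    product_2 = chrom_b[:j + 1] + chrom_a[i + 1:]
    return product_1, product_2


# Hypothetical chromosome: three genes separated by two Alu copies.
chromosome = ["gene1", "Alu", "gene2", "Alu", "gene3"]

# Misaligned pairing: Alu at index 1 on one copy, Alu at index 3 on the other.
deleted, duplicated = unequal_crossover(chromosome, chromosome, 1, 3)
print(deleted)     # ['gene1', 'Alu', 'gene3']
print(duplicated)  # ['gene1', 'Alu', 'gene2', 'Alu', 'gene2', 'Alu', 'gene3']
```

One product has lost gene2 entirely and the other carries two copies of it, which is the deletion and duplication pairing described above.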

Real-life examples

So now that we’ve covered the theory (there really are mobile genetic elements in humans, they sometimes activate and insert new copies of themselves, and that can have bad consequences for the cell), what about real-life examples? Do people show up in clinical settings with problems attributable to novel Alu insertions? Absolutely; as far back as 1999 [1] it was estimated that novel Alu insertions were detectable in approximately one of every 200 live births, and were responsible for 0.1 percent of known genetic disorders. Particular reports from the literature include spontaneous occurrences of hemophilia [2-4], Apert syndrome [5], neurofibromatosis type 1 [6], and optic atrophy [7]. Readers looking for a longer list are directed to a review from 2012 and its references, listed as reference eight below.

Clinical presentations relating to Alu-influenced recombinational events are likely harder to identify with certainty than ones from insertional events, but cases have been reported (see reference nine for an example) and are likely more frequent than we know.

From a treatment perspective, each Alu-induced mutation, whether insertional or recombinational, is unique, and treatment (if any) would likely have to involve direct biochemical intervention in the impacted pathway(s) where possible, or perhaps genetic engineering tools as envisioned for other innate genetic disorders. For the clinician, they therefore remain more a curiosity than a type of condition with a common treatment or prevention, but likely one of not insignificant frequency at the root of novel genetic presentations.


  1. Alu Repeats and Human Disease. Deininger P, Batzer M. Molecular Genetics and Metabolism 1999; 67(3):183-193.
  2. An Alu insert as the cause of a severe form of hemophilia A. Sukarova E, Dimovski AJ, Tchacarova P, et al. Acta Haematologica 2001; 106(3):126-129.
  3. Haemophilia B due to a de novo insertion of a human-specific Alu subfamily member within the coding region of the factor IX gene. Vidaud D, Vidaud M, Bahnak BR, et al. European Journal of Human Genetics 1993; 1(1):30-36.
  4. Exon skipping caused by an intronic insertion of a young Alu Yb9 element leads to severe hemophilia A. Ganguly A, Dunbar T, Chen P, et al. Human Genetics 2003; 113(4):348-352.
  5. De novo Alu-element insertions in FGFR2 identify a distinct pathological basis for Apert syndrome. Oldridge M, Zackai EH, McDonald-McGinn DM, et al. American Journal of Human Genetics 1999; 64(2):446-461.
  6. A de novo Alu insertion results in neurofibromatosis type 1. Wallace MR, Andersen LB, Saulino AM, et al. Nature 1991; 353(6347):864-866.
  7. Alu-element insertion in an OPA1 intron sequence associated with autosomal dominant optic atrophy. Gallus GN, Cardaioli E, Rufa A, et al. Molecular Vision 2010; 16:178-183.
  8. Alu Mobile Elements: From Junk DNA to Genomic Gems. Dridi S. Scientifica 2012; Article ID 545328, 11 pages.
  9. Mutation in LDL Receptor: Alu-Alu Recombination Deletes Exons Encoding Transmembrane and Cytoplasmic Domains. Lehrman MA, Schneider WJ, Südhof TC, et al. Science 1985; 227(4683):140-146.