Emerging viruses – The molecular biology behind what, how, when and why

March 25, 2020

The Covid-19 coronavirus, which emerged at the end of 2019 and gripped media attention as it swept across the world in early 2020, was a new actor playing a recurring role – SARS (2002), “swine flu” (pandemic H1N1/09, 2009) and MERS (2012) are just three recent examples. These four share several characteristics:

  • They’re likely of animal origin;
  • They have RNA genomes;
  • They infect the respiratory system; and
  • They have relatively high mortality rates compared to closely related viruses that have been circulating in human populations for a longer time (consider, for instance, coronaviruses HKU1, NL63, OC43, and 229E).

None of this is coincidental; these attributes arise from basic considerations of molecular and evolutionary biology combined with statistics. In this month’s episode, we’re going to look at why that is (and what it might mean for future vigilance).


For all of their simplicity – a minuscule nucleic acid genome coding for a few key proteins, covered with a protein capsid and in some cases a lipid envelope – viruses are actually quite biologically complex. The tiny genome size means there’s no space to be “wasted” in the way more complex organisms can afford, with things such as pseudogenes slowly evolving over vast time scales to create new biological functionalities of potential benefit for the organism. The virus that will outcompete its siblings is the one that can replicate and disseminate the fastest, to the most new hosts, with the most efficient use of host cell resources. This means the smallest, fastest replicating genomes “win” the evolutionary battle.

The virus that will be successful codes for a minimal set of proteins providing functions not available from the host cell; these must interact correctly with both the viral genome and host cell machinery. This requirement for interaction is not trivial, and it’s why, if you were to randomly take genes for different core functions from different viruses – a capsid protein here, an RNA-dependent RNA polymerase (RDRP) there, a transcription factor from somewhere else – and engineer them together, you wouldn’t get a working virus. The capsid protein wouldn’t bind properly to the genome, the RDRP wouldn’t find its appropriate regulatory signals to drive genome replication, and/or the transcription factor would activate host genes unhelpful to the viral life cycle.

Viruses must evolve as replicatively functional entities within a host cell, by an iterative process of mutation and selection. Simply put, novel viruses don’t pop into existence fully formed. Nor, in most cases, would engineering a gene from one virus into the genome of an unrelated virus yield anything useful. (So much for the Covid-19 conspiracy theories this author was asked about at the beginning of the outbreak.) Note the word “unrelated” here; swapping of gene functions or forms between closely related viruses is a different matter. Influenza A, for example, has a segmented genome, with each of its eight segments carrying one or a few genes; reassortment of these segments between strains leads to novel but complete genome combinations with a reasonable likelihood of the parts all working together.
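To put a rough number on the reassortment idea, the sketch below enumerates the segment combinations that could in principle arise when two influenza A strains co-infect the same cell. The eight segment names are the standard ones; the strain labels "A" and "B" and the helper name `reassortants` are hypothetical, chosen only for illustration. The count says nothing about how many of these combinations would actually be viable.

```python
from itertools import product

# The eight genome segments of influenza A.
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]


def reassortants(parents):
    """Return every way of drawing each segment from one of the parent strains.

    `parents` is a sequence of hypothetical strain labels, e.g. ("A", "B").
    Each result maps a segment name to the parent strain that supplied it.
    """
    return [
        dict(zip(SEGMENTS, combo))
        for combo in product(parents, repeat=len(SEGMENTS))
    ]


all_combos = reassortants(("A", "B"))

# A combination is "novel" if it mixes segments from more than one parent.
novel = [genome for genome in all_combos if len(set(genome.values())) > 1]

print(f"possible segment combinations: {len(all_combos)}")
print(f"combinations mixing both parents: {len(novel)}")
```

With two parents and eight segments this gives 2^8 = 256 combinations, of which 254 mix segments from both parents. Each of those is a complete genome built from parts that already work in a closely related virus, which is the point the paragraph above makes.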

So, if new viruses don’t just materialize, where do they come from? Most likely, from long-standing evolution in one host organism, followed by some small change that allows crossing the species barrier to a new host organism; a zoonotic origin, to use its proper term. For a virus to have the opportunity to make the species jump, humans have to come into reasonably close contact with the host of origin; and the more new species humans interact with, the more reservoirs of possible novel pathogens we as a species are exposed to. Emerging viruses are generally zoonotic in origin, and come from a host species with which we haven’t had a long and close association.

Why RNA genomes?

The four examples we began with here all have RNA genomes. While there’s nothing that says DNA-based viruses can’t be emerging pathogens (they absolutely can be), there’s a simple reason rooted in molecular biology as to why RNA viruses emerge more frequently. DNA replication – as carried out by most DNA polymerases – is what’s known as “proofreading.”

As an incoming free nucleotide is base paired to template and then added to the growing copy strand, transient changes in its molecular form (tautomeric shifts of hydrogen atoms and double bonds) can allow for mispairing and thus, misincorporation of the “wrong” base. Proofreading polymerases can sense this in almost all cases, and have a specialized “backwards” (3′-to-5′) exonuclease function, which hydrolyzes off the wrong residue so that the polymerase can proceed anew to incorporate the right one. DNA replication by proofreading polymerases is thus highly accurate.

RDRPs generally lack this function (coronaviruses, which encode a separate proofreading exonuclease, are a partial exception), and thus, it’s simple physics and statistics that RNA virus genome replication is error prone. From the standpoint of evolutionary biology, this is advantageous to the virus; when a single one gets into a host cell, it replicates not just exact copies of itself, but what’s referred to as a “quasispecies swarm” of virus progeny, which, in effect, sample a large amount of genetic diversity centered around the parent. This allows for rapid selection of mutations adaptive to the current host cell, on a time frame much faster than a DNA virus with its higher-fidelity replication method. The rapid sampling of genetic space allows RNA viruses to adapt from one host cell environment to another – such as in another species – more quickly than DNA viruses. This statistically favors more frequent emergence of RNA viruses.
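The statistics behind the “quasispecies swarm” can be sketched with a few lines of arithmetic. If each base is miscopied independently with probability μ, a genome of length L picks up about μ·L mutations per copy, and the fraction of copies identical to the parent is (1 − μ)^L. The error rates and the genome length below are illustrative order-of-magnitude assumptions, not measured values for any particular virus, and the function names are hypothetical.

```python
def expected_mutations(error_rate: float, genome_length: int) -> float:
    """Mean number of miscopied bases per genome copy."""
    return error_rate * genome_length


def fraction_error_free(error_rate: float, genome_length: int) -> float:
    """Probability that a copy is identical to its template,
    assuming each base is miscopied independently."""
    return (1.0 - error_rate) ** genome_length


# Illustrative assumptions only (per base, per replication):
RNA_ERROR_RATE = 1e-4   # assumed: polymerase without proofreading
DNA_ERROR_RATE = 1e-8   # assumed: proofreading DNA polymerase
GENOME_LENGTH = 10_000  # assumed round figure, in bases

for label, rate in [("RNA, no proofreading", RNA_ERROR_RATE),
                    ("DNA, proofreading", DNA_ERROR_RATE)]:
    mutations = expected_mutations(rate, GENOME_LENGTH)
    exact = fraction_error_free(rate, GENOME_LENGTH)
    print(f"{label}: {mutations:.4f} mutations per copy, "
          f"{exact:.1%} of copies identical to parent")
```

Under these assumed numbers the RNA genome averages about one mutation per copy, so only roughly 37% of progeny match the parent exactly, while nearly every copy of the DNA genome does. That difference is what lets an RNA virus sample nearby genetic space so quickly.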

Why are they so often respiratory pathogens?

That’s easy to understand. Among gregarious social animals (an accurate description for the majority of humans), it seems evident that any one individual will have more contacts in “breathing proximity” – the sort of distance respiratory droplets from a sneeze or cough can travel – than in direct physical contact (touching, biting, sexual or otherwise).

Respiratory tissues are topologically an exterior surface, so infecting particles can reach them without any breakdown of the skin barrier. Infection of the respiratory tract also leads to responses such as mucus secretion and coughing, which naturally provide an easy route of onward transmission from an infected host. Viruses could emerge with any target organ system, but respiratory organs are accessible, provide their own means of onward dissemination, and afford the largest number of possible host contacts among transmission methods.

Why are emerging viruses often associated with high mortality rates?

To answer this, we must again consider evolutionary biology. The most “successful” virus is the one that can make the most copies of its genome in its quest for world domination. (Viruses aren’t even classified as living, let alone as sentient beings with evil master plans; this is, however, the simple biological imperative of every gene, as most clearly expressed in Richard Dawkins’ phrase and book title “The Selfish Gene.”) To achieve this goal, a virus would “like” its host to produce the maximal number of viral progeny and spread them onward to the largest possible number of new naive host contacts. This requires making the host sick (see ‘coughing and sneezing’, above), but hopefully not too sick – the host that feels good enough to move around its society spreads the virus more.

Similarly, the host that dies off is a dead end from the viral perspective; it’s much better to keep a host alive longer to make more contacts. This optimal balance between making the host sick and keeping them healthy enough to maximize transmission is a challenging one to achieve, and takes years of selection and incrementally improved dissemination of the best adapted pathogen form to establish. (Although not a virus, a well publicized example of this the reader may be familiar with is Treponema pallidum, the bacterium that causes syphilis.)

The flip side of this is that when a virus jumps the species barrier to a new host species, it hasn’t had the time to adapt. In the earliest stages of spreading through a new host species, prior to the existence of any herd immunity, a short-term, high-yield replication strategy is likely beneficial to maximize dissemination. Selective pressure toward a longer-term, better host-adapted, less-virulent pathogen form will act later and over considerable time. Although it’s not always the case, newly emerged pathogens are often more virulent than ones that have had time to adapt and (co-)evolve with host species.

Implications for the future?

If you’ve been keeping score and doing some simple math, that works out to approximately one new significant respiratory pathogen emerging every 4.5 years – not a number with much statistical validity, but at least a crude estimate. It’s far from a rare occurrence, in any case.
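One plausible reading of that arithmetic, offered here as an assumption rather than the author’s stated method, is four emergence events spread over the 18 years from SARS (2002) to the date of this article (2020). The variable names below are hypothetical.

```python
# Assumed inputs: the four examples named at the top of the article,
# counted over the window from the first of them to the article's date.
FIRST_EVENT_YEAR = 2002   # SARS
ARTICLE_YEAR = 2020       # year of writing, taken as the end of the window
EVENT_COUNT = 4           # SARS, pandemic H1N1/09, MERS, Covid-19

window_years = ARTICLE_YEAR - FIRST_EVENT_YEAR

# Years per event: the crude estimate quoted in the text.
years_per_event = window_years / EVENT_COUNT

# Alternative reading: four events bound only three gaps between them.
years_per_gap = window_years / (EVENT_COUNT - 1)

print(f"window: {window_years} years")
print(f"years per event: {years_per_event}")
print(f"years per gap between events: {years_per_gap}")
```

Dividing by events gives the 4.5 years quoted above; dividing by the gaps between events gives 6 years. With only four data points, either figure is a crude estimate, as the text says.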

Factors that are likely to increase risk of these events include increased numbers of contacts between people and novel species and environments; improved transportation and human movement to and from these interaction sites (a species jump to a human who then doesn’t see other people or leave some remote site isn’t the basis for a new pandemic); lack of observational vigilance on population health statistics (is there evidence of a symptomatic outbreak in one location, and is the etiological agent known?); and failure to report such suspicious observations onward promptly. There are wide disparities in the extent to which each of these risk factors exists across different geographic, political and socioeconomic regions; unfortunately, most of the human/novel environment interactions tend to occur in places where these risks are least contained.

Study of viral species in animal populations is of some use in creating basic knowledge, but arguably little direct use in preventing novel viral emergence. It has little to no predictive value on where the next, rare, successful species jump will occur, nor is the virus studied in the animal host absolutely identical in its makeup or behavior to the variant that makes this leap. It may be nearly asymptomatic in its reservoir species yet have high mortality in humans. Each emergence is its own unique challenge to human health on a global, species-wide scale.

Containment methods, such as temperature screening of airline travelers and denying travel to those with evidence of infection, are currently only employed in limited locations and during outbreaks. They’re also of less than perfect utility, as people being screened may be in early incubation stages of an infection. In fact, in many viral diseases of the past (and presumably the future), some members of the population may be asymptomatic spreaders, with few or no symptoms even as they shed infectious virus.

This lack of absolute ability to stop disease spread, coupled with obvious issues of cost, complexity and nuisance, points to the impracticality and futility of enacting such screening as routine practice at all times. One approach might work in theory: whole-genome screening of all international travelers, with an algorithm that discards all human DNA and looks for evidence of novel (or specific known, targeted) pathogens. For this to be feasible in a socially acceptable manner, at a cost and speed that make it practicable, a score of ethical and technical hurdles would have to be surmounted. Such an approach is nowhere in the near future, if it arrives at all.

If there’s a final lesson from all of this, it’s that we should expect the emergence of novel pathogens without prior warning every few years. Each will be different. Robust, well-funded public health surveillance, testing and reporting will offer an adaptable response, and perhaps the only one, to these as and when they occur.