SARS-CoV-2 Evolution and the Potential Impact of Mutations
The data suggest a degree of variability within the SARS-CoV-2 genome, but direct evidence of functional adaptation has yet to be presented. As more genomic information becomes available, the impact of mutations can be studied in more detail and potential correlations between genotype and epidemiology resolved. Because it requires shorter timeframes for development and realization than traditional vaccine platforms, the attenuated viral vaccine approach is well suited to providing solutions even in the face of potential functional adaptations of the SARS-CoV-2 genome.
As the global population is increasingly affected by the COVID-19 pandemic, scientists are continuously working to improve their understanding of the virus's evolution and epidemiology. While the exact genetic origin of SARS-CoV-2 is still under debate to some extent, some scientists are also questioning the validity of applying knowledge gained from the study of other coronavirus strains. As more data become available, accumulating mutations in the viral genome are being discovered, showing the potential for diversification and evolutionary adaptation of SARS-CoV-2 in the human population.
SARS-CoV-2 belongs to a group of 40 Severe Acute Respiratory Syndrome (SARS)-like coronaviruses (genus Betacoronavirus) containing a linear single-stranded positive-sense RNA genome. The approximately 30 000 nucleotide genome comprises 14 open reading frames (ORFs), predicted to encode 27 proteins. These include replicase, ORF1ab (encoding sixteen non-structural proteins (nsps)), S (spike glycoprotein), ORF3a (ORF3a protein), E (structural envelope protein), M (structural membrane glycoprotein), ORF6 (ORF6 protein), ORF7a (ORF7a protein), ORF7b (ORF7b protein), ORF8 (ORF8 protein), N (structural nucleocapsid phosphoprotein), and ORF10 (ORF10 protein) 1,2.
Although coronaviruses are unusual for RNA viruses in that they contain an RNA-proofreading mechanism and are thought to be genetically stable, significant single nucleotide polymorphisms, or single base mutations, have been found in several of the open reading frames, notably in the ORF1a and b, ORF3a, ORF8 and in the S protein, which is an important determinant of antigenicity and pathogenicity.
Comparative studies of SARS-CoV-2 genome sequences suggest that the population studied thus far exhibited generally low heterogeneity and was evolving at a moderate pace, as expected for coronaviruses, of 1.15 × 10⁻³ substitutions/site/year. Analyses of 56 and 95 full-length SARS-CoV-2 genomes showed low variability, with a sequence identity of over 99% 3,4, indicating that the virus represents a relatively genetically stable species after acquiring the ability to infect and spread among human beings.
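As a rough illustration of what these figures imply, the Python sketch below converts the quoted substitution rate into an expected number of substitutions per genome per year, and the quoted identity threshold into an upper bound on differing sites. It assumes a genome of exactly 30,000 nucleotides and a uniform rate across all sites, both of which are simplifications. The function and variable names are illustrative and are not taken from the cited studies.

```python
# Back-of-the-envelope arithmetic using figures quoted in the text.
# Assumptions: genome of exactly 30,000 nt, uniform rate across all sites.
GENOME_LENGTH_NT = 30_000
SUBSTITUTION_RATE = 1.15e-3  # substitutions per site per year
MIN_IDENTITY = 0.99          # "over 99%" sequence identity


def substitutions_per_genome_per_year(rate: float, length: int) -> float:
    """Expected number of substitutions across the whole genome in one year."""
    return rate * length


def max_differing_sites(identity: float, length: int) -> float:
    """Upper bound on differing sites between two genomes at a given identity."""
    return (1.0 - identity) * length


if __name__ == "__main__":
    per_year = substitutions_per_genome_per_year(SUBSTITUTION_RATE, GENOME_LENGTH_NT)
    print(f"about {per_year:.1f} substitutions per genome per year")
    print(f"about {per_year / 12:.1f} substitutions per genome per month")
    bound = max_differing_sites(MIN_IDENTITY, GENOME_LENGTH_NT)
    print(f"fewer than {bound:.0f} differing sites at over 99% identity")
```

Under these assumptions the quoted rate corresponds to roughly 34.5 substitutions per genome per year, or just under three per month.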
This idea, however, is debated by some scientists and potentially underestimates the evolutionary selective pressure that has been placed on the genotype even shortly after the introduction to the human population. As the genetic variability of strains in separate geographic regions is compared, it appears that the SARS-CoV-2 genome contains abundant single nucleotide polymorphisms (SNPs), representing significant basic mutational diversity upon which natural selection can act. This mutational diversity has important implications for disease management as well as treatment and vaccine development 5.
A peer-reviewed study published in February of 2020 6 suggested that the overall difference of 4% observed between the genome sequences of SARS-CoV-2 and a bat-derived SARS strain (RaTG13, believed to be ancestral to SARS-CoV-2) underestimates the actual evolutionary divergence of SARS-CoV-2. The authors presented data showing that different types of mutations are under differential selective pressure and accumulate at disparate rates.
The two basic types of mutations observed during viral replication are, first, a base change that modifies the amino acid sequence (a missense mutation or nonsynonymous substitution) and can thereby change the characteristics and function of the resultant protein and, second, a ‘neutral’ or synonymous substitution, in which the change in the genetic code does not alter the amino acid sequence. The latter does not lead to any change in protein structure or function. This is due to the degeneracy of the genetic code: several different codons can encode the same amino acid and thus give rise to an identical protein.
According to Tang et al., nonsynonymous, or change-inducing, mutations are subject to stronger stabilizing selection pressure to maintain the functionality of the virus, especially in viruses with proofreading mechanisms. They found that 87.6% to 95.6% of such mutations were removed during viral evolution. As a result, the divergence observed at neutral, or silent, sites was much higher, at up to 17%, than the overall genomic difference of 4%. To accurately evaluate the viral mutation rate, the researchers suggested that the different types of mutations need to be considered separately in order not to underestimate the frequency of base changes taking place.
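The distinction between the two mutation types can be sketched in a few lines of Python. The example below builds the standard genetic code (NCBI translation table 1) and classifies a codon change as synonymous or nonsynonymous. The codons used are hypothetical examples rather than observed SARS-CoV-2 variants, and the function name is illustrative. Sequences are written in the DNA alphabet, as genome databases usually report them, with U converted to T.

```python
from itertools import product

# Standard genetic code in TCAG order (NCBI translation table 1); "*" is a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as 'identical', 'synonymous' or 'nonsynonymous'."""
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    if ref == alt:
        return "identical"
    # Same amino acid from a different codon: the protein is unchanged.
    if CODON_TABLE[ref] == CODON_TABLE[alt]:
        return "synonymous"
    # Different amino acid (or a gained/lost stop codon): the protein changes.
    return "nonsynonymous"


if __name__ == "__main__":
    # Hypothetical codon changes chosen only to illustrate both outcomes.
    print(classify_substitution("TTA", "TTG"))  # Leu -> Leu
    print(classify_substitution("TTA", "TCA"))  # Leu -> Ser
    leucine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "L")
    print(f"{len(leucine_codons)} codons encode leucine: {leucine_codons}")
```

Leucine, for instance, is encoded by six different codons, which is why many single-base changes leave the protein sequence untouched.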
Further studies published throughout the beginning of 2020 shed more light on mutational hotspots in the SARS-CoV-2 genome and discuss potential functional consequences.
S (Spike) Protein – Receptor Binding Domain
The spike protein of the virus is responsible for binding to the angiotensin-converting enzyme 2 (ACE2) receptor found in human airway epithelia and lung parenchyma and contains important antigenic epitopes for recognition by the human immune system 7. It has been shown that the spike protein of the SARS-CoV-2 virus has a 10- to 20-fold higher affinity for the human ACE2 receptor than the spike protein of SARS-CoV, increasing its infectivity 8.
Also, current infection and immunity testing are based on the Spike sequence from the original strain isolated in Wuhan 8. A peer-reviewed study published in February of 2020 found three sites located on the receptor-binding domain (RBD) of the viral spike (S) protein to show characteristics of positive or diversifying selection, over time potentially leading to diversification of several disparate viral lineages varying in their virulence 6.
A study published on March 17 7 analyzed 1609 whole-genome sequences from the public domain and found 32 mutations in the S protein receptor-binding domain (RBD), which clustered into 10 types that appeared to be under positive selection pressure. The authors performed in silico molecular dynamics simulations to assess the binding affinity of the mutant types to ACE2. Three mutation types (derived from strains collected from patients in France/Hong Kong, Wuhan, and Shenzhen) exhibited an increased binding affinity to the human receptor, possibly implying a higher infection potential among humans. Structurally, all mutations exhibiting higher binding affinities were located near a region of amino acids (residues 510-524) in the RBD that stabilizes the three-dimensional shape of the area directly interacting with the ACE2 surface. In this area, the addition of a positively charged arginine is inferred to complement binding to host cells through better interaction with the highly negatively charged ACE2 surface.
In a more recent pre-print, Chinese researchers again found several mutations in S protein sequences after comparing 11 patient-derived isolates from Hangzhou, Zhejiang Province, China (collected between 19 January and 5 March) with 1111 SARS-CoV-2 sequences from GISAID 5. As described by Tang and colleagues 6, besides an abundance of mutations present at low frequencies, S protein RBD appears as one of the most variable parts of the genome and is assumed to be under positive selection as the virus is replicating within the human population.
Consistent with other reports 7, 33 mutations were found, including 19 novel changes, despite the recent transmission event from animals at the time of sampling. They also identified the same mutations at positions 8782 and 28144 in ORF1ab and ORF8, respectively, as founding mutations for other clades of viruses. Geographical clusters founded by strains with specific mutations were identified, which correlated with in vitro infection studies of viral load and cell death in Vero-E6 cells. Strains found in Europe were associated with much higher viral loads, while the strain containing the ORF8 L84S mutation exhibited lower viral loads 5.
These infection studies were performed in Vero-E6 cells, which are primate kidney epithelial cells but express an ACE2 protein highly similar to that of human cells. Here, the cell death rate was related to viral loads in vitro, which led to the hypothesis that the geographical predominance of different variants could account for varying mortality rates across the globe. Some researchers have postulated that viral loads correlate with the severity of symptoms, as high viral loads have often been documented in elderly patients 9. However, other theories have been put forward for the high mortality observed in the elderly, as low viral loads have been observed as well 10.
In a pre-print published on 30 April 8, researchers from the Los Alamos National Laboratory introduced a real-time tracking platform for Spike mutations. Mutations in the Spike protein are assessed for evidence of positive selection pressure, and their structure is subsequently modeled. The researchers identified a mutation at position 23403, mostly found in combination with a silent mutation at position 3037 and one in ORF1a at position 14409. According to the researchers, the combination of these three mutations founded a clade in Europe during February and spread to become the dominant local form after only a few weeks. It was subsequently introduced to Canada and the US in early March and replaced the original sequence by the end of March. The same pattern was evident for its spread in Australia. It was thus proposed that the mutation confers a selective advantage on the carrier virus over the original Wuhan strain. Structural modeling suggested the variation may improve receptor binding to ACE2 or may even make individuals who had previously been infected with the original strain more susceptible to infection with this new strain than immunologically naïve individuals.
ORF1a – RNA-dependent RNA polymerase
Another group 11 identified the same three mutations as founding mutations for a distinct lineage circulating in Europe in a peer-reviewed paper from late April. They further analyzed the mutation found at position 14408 in ORF1a, which causes an amino acid substitution in RNA-dependent RNA polymerase (RdRp). This might be considered particularly significant, as this protein is involved in proofreading activities during genome replication. Strains carrying this mutation are dominant in Europe and contain a higher number of point mutations compared to genomes from Asia. Additionally, many drugs target the RdRp, and the researchers hypothesized that significant mutations in this region may decrease drug binding and give rise to drug-resistant phenotypes.
ORF1, ORF3, ORF8
The analysis of 103 publicly available SARS-CoV-2 genomes revealed mutations at 149 sites overall, most of which occur only very infrequently across the sampled genomes. It also revealed, however, that the founding viral population diverged into separate groups, which could be delineated by a set of two SNPs present simultaneously in the genome: one in ORF1a at position 8782 (synonymous) and one in ORF8 at position 28144 (nonsynonymous). In one lineage, designated S, position 28144 forms part of a codon for serine; in the other, designated L, it forms part of a codon for leucine. The S lineage was found to be ancestral but was present in only 30% of the genomes analyzed, while the L lineage was present at a frequency of 70%. This, together with a higher number of accumulated mutations in the L lineage, was taken to indicate that the L lineage exhibits a higher replication and transmission rate. These epidemiological features are thought to expose the two lineages to dissimilar selective pressures, possibly promoting diversification of separate viral strains in the future 6.
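The two-marker scheme described above can be sketched as a simple lookup. The Python example below assigns a genome to the L or S lineage from the bases observed at positions 8782 and 28144 and then tallies lineage frequencies. The specific alleles (C/T for L, T/C for S) are stated here as an assumption and should be checked against reference 6. The sample data are hypothetical, sized only to reproduce the 70%/30% split quoted in the text, and all names are illustrative.

```python
from collections import Counter
from typing import Dict, List

# Assumed marker alleles at 1-based reference positions; verify against reference 6.
MARKERS: Dict[str, Dict[int, str]] = {
    "L": {8782: "C", 28144: "T"},  # 28144 within a leucine codon
    "S": {8782: "T", 28144: "C"},  # 28144 within a serine codon
}


def assign_lineage(alleles: Dict[int, str]) -> str:
    """Return 'L' or 'S' if both marker positions match, else 'unassigned'."""
    for lineage, expected in MARKERS.items():
        if all(alleles.get(pos, "").upper() == base for pos, base in expected.items()):
            return lineage
    return "unassigned"


def lineage_frequencies(samples: List[Dict[int, str]]) -> Dict[str, float]:
    """Fraction of samples assigned to each lineage."""
    counts = Counter(assign_lineage(sample) for sample in samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lineage: n / total for lineage, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical data set: 7 L-type and 3 S-type genomes.
    samples = [{8782: "C", 28144: "T"}] * 7 + [{8782: "T", 28144: "C"}] * 3
    for lineage, freq in sorted(lineage_frequencies(samples).items()):
        print(f"{lineage}: {freq:.0%}")
```

Genomes in which the two markers disagree are reported as unassigned rather than forced into either lineage, since the scheme in the text relies on both SNPs being present together.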
Another peer-reviewed study found 13 nucleotide variations in several ORFs in 95 full-genome sequences from NCBI and GISAID databases, among which the same positions in ORF1a (position 8782) and ORF8 (position 28144) showed a much increased variability of 30.53% and 29.47% respectively 4.
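To make these percentages more tangible, the short Python sketch below back-calculates how many of the 95 genomes each figure implies. The counts are derived purely from the percentages quoted above and are not taken from the cited paper. The function names are illustrative.

```python
TOTAL_GENOMES = 95  # number of full-genome sequences quoted in the text


def implied_count(percent: float, total: int) -> int:
    """Nearest whole number of genomes corresponding to a reported percentage."""
    return round(percent / 100 * total)


def percent_of(count: int, total: int) -> float:
    """Percentage of genomes carrying a variant, rounded to two decimals."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    for position, percent in ((8782, 30.53), (28144, 29.47)):
        count = implied_count(percent, TOTAL_GENOMES)
        check = percent_of(count, TOTAL_GENOMES)
        print(f"position {position}: {percent}% implies {count} of {TOTAL_GENOMES} genomes ({check}%)")
```

The quoted values are consistent with 29 and 28 of the 95 genomes, respectively, which is close to the 30% share reported for the S lineage in the 103-genome analysis.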
In a pre-print from 11 April, Velazquez-Salinas and colleagues 2 identified locations on the viral genome that also appeared to be under diversifying positive selection, including ORF1ab, ORF8a, and ORF3a. They postulated that the virus diverged into three phylogenetic groups soon after acquiring the ability to infect humans through evolutionary selection. Early evolutionary events were hypothesized to have led to different variants of the SARS-CoV-2 virus circulating in the global population that may exhibit different levels of virulence.
Further studies have suggested the link between certain types of mutations and viral epidemiological features, including disease morbidity and mortality, but have not yet been peer-reviewed 12. The results of many of these studies have to be interpreted with caution to avoid unfounded speculations. Functional impacts of certain mutations have been investigated using various techniques, including in silico molecular dynamic simulations and in vitro effects of viral mutations in cell culture infection studies. None of these have, however, been tested in human beings or even animal models to verify the data, so resulting functional and epidemiological effects are speculation only. Despite the limitations imposed by such indirect approaches, they may indicate the potential impact of mutations independent of confounding variables typically present in clinical data.
Other scientists, however, such as Guo Deyin, who researches coronaviruses at Sun Yat-sen University in Guangzhou, describe the SARS-CoV-2 genome as very stable and report not having observed any pathogenicity changes caused by mutations 13. Slow mutation rates would indicate increased vaccine effectiveness, as well as the potential for antibodies raised against previous infections to remain protective for a long time. Generally, the mutational potential of the SARS-CoV-2 genome appears to still be under debate, and new evidence will have to be evaluated as it is presented.
There are, however, some indications that the SARS-CoV-2 genome is dynamic, and selective pressure might be apparent at certain antigenically important positions. This might have significant implications for vaccine development, much of which is based on inducing antibodies against the S protein 14. Potential S protein variability may weaken antibody binding and decrease the effectiveness of vaccines as well as immunity, as antibody cross-reactivity between dissimilar RBDs appears to be limited 15.
Evidence for recombination of separate lineages was found in sequences from Belgium, which requires one individual to have been infected with two different variants. This provides the virus with the opportunity to combine the effects of independently evolved mutations 8.
On the other hand, the genetic diversity present in the SARS-CoV-2 genome also increases the statistical likelihood of finding attenuated viral strains that have the potential to be used as attenuated vaccines or for challenge trials. Studies of other coronaviruses have supported the argument that natural attenuation through mutations would cause SARS-CoV-2 to eventually cause only mild upper respiratory tract symptoms akin to the common cold, rendering vaccines less important 16.
Strains containing genetic deletions that may be better suited to serve as attenuated vaccines have already been identified in databases. The attenuated viral vaccine approach has the advantage of shorter timeframes than are expected for traditional vaccine development platforms. With increasing amounts of genomic data available (the COVID-19 Genomics UK Consortium (COG-UK) has released 10,567 complete SARS-CoV-2 genome sequences to date), the functional impact of mutations can be studied in more detail in the future and potential correlations between genotype and epidemiology resolved. Overall, the data suggest a degree of variability within the SARS-CoV-2 genome, but direct evidence of functional adaptation has yet to be presented.
1. Wu, F. et al. Complete genome characterisation of a novel coronavirus associated with severe human respiratory disease in Wuhan, China. bioRxiv 2020.01.24.919183 (2020). doi:10.1101/2020.01.24.919183
2. Velazquez-Salinas, L. et al. Positive selection of ORF3a and ORF8 genes drives the evolution of SARS-CoV-2 during the 2020 COVID-19 pandemic. bioRxiv 2020.04.10.035964 (2020). doi:10.1101/2020.04.10.035964
3. Ceraolo, C. & Giorgi, F. M. Genomic variance of the 2019‐nCoV coronavirus. J. Med. Virol. 92, 522–528 (2020).
4. Wang, C. et al. The establishment of reference sequence for SARS‐CoV‐2 and variation analysis. J. Med. Virol. 92, 667–674 (2020).
5. Yao, H. et al. Patient-derived mutations impact pathogenicity of SARS-CoV-2. medRxiv 2020.04.14.20060160 (2020). doi:10.1101/2020.04.14.20060160
6. Tang, X. et al. On the origin and continuing evolution of SARS-CoV-2. Natl. Sci. Rev. (2020). doi:10.1093/nsr/nwaa036
7. Ou, J. et al. RBD mutations from circulating SARS-CoV-2 strains enhance the structure stability and infectivity of the spike protein. bioRxiv 2020.03.15.991844 (2020). doi:10.1101/2020.03.15.991844
8. Korber, B. et al. Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2. bioRxiv 2020.04.29.069054 (2020). doi:10.1101/2020.04.29.069054
9. Chen, Y. & Li, L. SARS-CoV-2: virus dynamics and host response. Lancet Infect. Dis. 20, 515–516 (2020).
10. Blanco-Melo, D. et al. Imbalanced host response to SARS-CoV-2 drives development of COVID-19. Cell (2020). doi:10.1016/j.cell.2020.04.026
11. Pachetti, M. et al. Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant. J. Transl. Med. 18, 179 (2020).
12. Banerjee, S., Dhar, S., Bhattacharjee, S. & Bhattacharjee, P. Decoding the lethal effect of SARS-CoV-2 (novel coronavirus) strains from global perspective: molecular pathogenesis and evolutionary divergence. bioRxiv (2020). doi:10.1101/2020.04.06.027854
13. Cyranoski, D. Profile of a killer: the complex biology powering the coronavirus pandemic. Nature 581, 22–26 (2020).
14. Thanh Le, T. et al. The COVID-19 vaccine development landscape. Nat. Rev. Drug Discov. 19, 305–306 (2020).
15. Wrapp, D. et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260–1263 (2020).
16. Vijgen, L. et al. Complete Genomic Sequence of Human Coronavirus OC43: Molecular Clock Analysis Suggests a Relatively Recent Zoonotic Coronavirus Transmission Event. J. Virol. 79, 1595–1604 (2005).