The recent coronavirus outbreak in Wuhan, China, has sparked widespread suspicion that the virus has a laboratory-related origin, or is even part of a Chinese biowarfare program, with Senator Tom Cotton as one of the most prominent figures publicly alluding to this point. We present here a systematic analysis of the available information, based on the viral genomic sequence deposited by Chinese investigators in public databases, to unambiguously rule out the possibility of a laboratory-related origin.
As noted by many believers of the conspiracy theory, scientists at the Wuhan Institute of Virology once participated in a study that created a chimeric virus to test the human pathogenicity of novel coronaviruses detected in the bat population (Menachery et al., 2015, Nature Medicine; note that this study was performed not in Wuhan but at the University of North Carolina at Chapel Hill). To create such a chimeric virus artificially, scientists use recombinant DNA technology to splice together sequences derived from different origins. In the Nature Medicine paper, scientists put the S gene of a bat coronavirus into the backbone of a mouse-adapted coronavirus, which is analogous to installing an Audi engine in a BMW body. Chimeric sequences of this kind are extremely easy to recognize: one only needs to scan the sequence with a bioinformatics tool known as BLAST to see immediately whether it was assembled from pieces of known origin. The 2019-nCoV sequence has been analyzed with BLAST by scientists throughout the world, and the conclusion is simple: there is no detectable sign of a chimera produced by artificial splicing. Instead, the 2019-nCoV sequence shows a remarkable level of similarity across the whole genome (96.2% identical) to RaTG13, a coronavirus strain identified in a wild bat and collected by Dr. Zheng-li Shi’s group in 2013. This almost definitively indicates that 2019-nCoV is a close evolutionary relative of RaTG13 rather than an assembly of separate pieces.
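The logic of the chimera check can be illustrated with a toy sketch. This is not BLAST, which searches a large database using local alignments; it is a simplified stand-in that assumes the query and the candidate parent sequences are already aligned and of equal length. The function names, the window size, and the short sequences are all hypothetical and chosen only for illustration. The idea is that a spliced sequence switches its best-matching parent partway along the genome, while a naturally diverged sequence stays closest to one relative throughout.

```python
def window_identity(query, reference, start, size):
    """Fraction of identical positions in one window of two aligned sequences."""
    seg_q = query[start:start + size]
    seg_r = reference[start:start + size]
    matches = sum(q == r for q, r in zip(seg_q, seg_r))
    return matches / len(seg_q)


def best_reference_per_window(query, references, size):
    """For each non-overlapping window, name the reference closest to the query.

    `references` maps a label to an aligned sequence of the same length as
    `query` (an assumption of this toy example).
    """
    result = []
    for start in range(0, len(query) - size + 1, size):
        best = max(
            references,
            key=lambda name: window_identity(query, references[name], start, size),
        )
        result.append((start, best))
    return result


# Hypothetical toy sequences, not real viral data.
references = {
    "parent_A": "AAAAAAAAAAAA",
    "parent_B": "CCCCCCCCCCCC",
}
spliced = "AAAAAACCCCCC"   # first half from parent_A, second half from parent_B
diverged = "AAAAATAAAAAT"  # parent_A with scattered point differences

print(best_reference_per_window(spliced, references, 6))
print(best_reference_per_window(diverged, references, 6))
```

In the spliced case the best match changes from one parent to the other between windows; in the diverged case the same parent remains the best match along the whole sequence, which is the pattern described above for 2019-nCoV and RaTG13.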
We have now ruled out the possibility that 2019-nCoV is a man-made chimeric virus. What about the possibility that 2019-nCoV was derived from RaTG13 through human modification? Whereas manually introduced mutations are localized and well defined, the differences between the 2019-nCoV and RaTG13 sequences are scattered randomly throughout the genome and the nature of the mutations is highly variable, which is exactly what one would expect for mutations accumulated over the course of natural evolution. Although it is theoretically possible to re-create all those natural-looking mutations artificially, doing so would entail an unreasonably large workload. More importantly, the impact of most of the mutations in 2019-nCoV relative to RaTG13 on viral infectivity or pathogenicity is difficult to predict, so introducing them intentionally would run against experimental logic. Hence the only sensible explanation is that 2019-nCoV is the result of natural evolution from a common ancestor shared with RaTG13.
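The contrast between scattered and localized differences can be made concrete with a small sketch that counts mismatches per fixed-size bin along two aligned sequences. Everything here is an illustrative assumption: the helper names, the bin size, and the 24-letter toy sequences are hypothetical, and a real comparison would start from a proper whole-genome alignment.

```python
def mismatch_counts_per_bin(seq_a, seq_b, bin_size):
    """Count differing positions in each consecutive bin of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    n_bins = (len(seq_a) + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            counts[i // bin_size] += 1
    return counts


def mutate(seq, positions):
    """Return a copy of seq with a different base at each listed position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Hypothetical toy data, not real viral sequences.
reference = "ACGT" * 6                        # 24 positions
scattered = mutate(reference, [1, 9, 14, 22])  # differences spread out
localized = mutate(reference, [6, 7, 8, 9])    # differences in one region

print(mismatch_counts_per_bin(reference, scattered, 6))
print(mismatch_counts_per_bin(reference, localized, 6))
```

Differences spread over the sequence give roughly even counts in every bin, while an engineered change confined to one region shows up as a single bin carrying all of the mismatches.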
Even if one accepts that 2019-nCoV is of purely natural origin, some people may still argue that it could have spread to local animal populations in Wuhan from bats kept at the Wuhan Institute of Virology, perhaps through negligent management. To examine this possibility critically, we need to estimate how long it would take for a bat virus to evolve into 2019-nCoV. As discussed above, the closest relative of 2019-nCoV at the Wuhan Institute of Virology is RaTG13, the strain from a bat captured in 2013. Even if the leak had happened immediately after the bat was transferred to Wuhan in 2013, the RaTG13 virus would have had only 6 years to evolve into the current form of 2019-nCoV. The 3.8% difference between RaTG13 and 2019-nCoV, across a genome of roughly 30,000 nucleotides, translates into approximately 1100 nucleotides. Estimates of the natural mutation rate of coronaviruses vary, but 90 nucleotides per year seems to be an upper limit. At that rate, it would take at least 12 years for the RaTG13 virus to evolve into 2019-nCoV, which certainly does not fit the present scenario.
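The arithmetic behind this estimate can be written out as a short sketch. The 3.8% divergence, the figure of about 1100 nucleotides, the 90 nucleotides per year upper limit, and the 2013 capture date come from the text above; the genome length of about 29,900 nucleotides is an assumption added here, and the constant and function names are hypothetical.

```python
GENOME_LENGTH_NT = 29_900      # assumption: approximate 2019-nCoV genome length
DIVERGENCE = 0.038             # 3.8% difference between RaTG13 and 2019-nCoV
MAX_RATE_NT_PER_YEAR = 90      # upper limit on the mutation rate quoted in the text
YEARS_AVAILABLE = 2019 - 2013  # time since the bat was captured


def differing_nucleotides(genome_length, divergence):
    """Number of positions expected to differ at a given fractional divergence."""
    return genome_length * divergence


def minimum_years(differences, rate_per_year):
    """Shortest time needed to accumulate the differences at the given rate."""
    return differences / rate_per_year


diffs = differing_nucleotides(GENOME_LENGTH_NT, DIVERGENCE)
years_needed = minimum_years(1100, MAX_RATE_NT_PER_YEAR)

print(f"Estimated differing nucleotides: {diffs:.0f}")
print(f"Minimum years at the fastest rate: {years_needed:.1f}")
print(f"Years available since 2013: {YEARS_AVAILABLE}")
print(f"Enough time for direct descent: {years_needed <= YEARS_AVAILABLE}")
```

Under these assumptions the estimate comes out slightly above the rounded figure of 1100 nucleotides, and even at the fastest assumed rate the time required is about twice the 6 years available.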
What is the likely origin of 2019-nCoV? Its striking resemblance to RaTG13 strongly argues that bats were the original host of the virus. However, the host that transmitted the virus directly to humans remains a mystery, because a direct host must carry a virus that is more than 99% identical to 2019-nCoV, and no such species has been identified. On Feb 7, scientists at South China Agricultural University announced the detection of coronavirus fragments retrieved from pangolins that are about 99% identical to 2019-nCoV. The detailed results have not been published and the quality of the data awaits further scrutiny, but the finding certainly hints that pangolins may be part of the chain of potential intermediate hosts linking bats to humans.