III. RaTG13 and SARS-CoV-2—Four Inserts in SARS-CoV-2 in Close Proximity on Genome

SARS-CoV-2 has just under 30,000 nucleotides in its genome. It has 4 insertion points that lie in close proximity in the sequence, between nucleotide positions 21975 and 23653. The 4 insertion points are italicized in red. The Query is SARS-CoV-2 Wuhan-Hu-1, and the Subject is coronavirus RaTG13, the closest relative to SARS-CoV-2 (96.3%):

INSERTION POINT 1

Query  21736  CTTTTCCAATGTTACTTGGTTCCATGCTATACATGTCTCTGGGACCAATGGTACTAAGAG  21795
||| |||||||| || |||||||||||||||||||| || ||||||||||||| ||| ||
Sbjct 21718 CTTCTCCAATGTGACCTGGTTCCATGCTATACATGTTTCAGGGACCAATGGTATTAAAAG 21777

Query 21796 GTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGCTTCCACTGAGAAGTC 21855
|||||||||||| || || ||||| || ||||| || |||||||||||||||||||||||
Sbjct 21778 GTTTGATAACCCAGTTCTGCCATTCAACGATGGCGTCTATTTTGCTTCCACTGAGAAGTC 21837

INSERTION POINT 2

Query 21976 TCCATTTTTGGGTGTTTATTACCACAAAAACAACAAAAGTTGGATGGAAAGTGAGTTCAG 22035
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 21958 TCCATTTTTGGGTGTTTATTACCACAAAAACAACAAAAGTTGGATGGAAAGTGAGTTCAG 22017

INSERTION POINT 3

Query 22276 TCAAACTTTACTTGCTTTACATAGAAGTTATTTGACTCCTGGTGATTCTTCTTCAGGTTG 22335
||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
Sbjct 22258 TCAAACTTTACTTGCTTTACATAGAAGCTATTTGACTCCTGGTGATTCTTCTTCAGGTTG 22317

INSERTION POINT 4

Query 23534 AACTCATATGAGTGTGACATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAG 23593 
||||| ||||||||||||||||| ||||||||||| |||||||| ||||||||||||||
Sbjct 23516 AACTCGTATGAGTGTGACATACCTATTGGTGCAGGAATATGCGCCAGTTATCAGACTCAA 23575

Query 23594 ACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATG 23653
|||||||            |||||||||| || |||||||| || |||||||||||||||
Sbjct 23576 ACTAATT------------CACGTAGTGTGGCCAGTCAATCTATTATTGCCTACACTATG 23623
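As a cross-check on the last alignment row above, the columns can be tallied programmatically. The following is a minimal sketch, not part of the original analysis: the function name `alignment_stats` is hypothetical, and the two strings are copied from the Query 23594 and Sbjct 23576 rows.

```python
from collections import Counter


def alignment_stats(query_row: str, subject_row: str) -> Counter:
    """Classify each alignment column as a match, a mismatch, or a gap."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same number of columns")
    stats = Counter(matches=0, mismatches=0, gaps=0)
    for q, s in zip(query_row.upper(), subject_row.upper()):
        if q == "-" or s == "-":
            stats["gaps"] += 1        # one row has no base in this column
        elif q == s:
            stats["matches"] += 1
        else:
            stats["mismatches"] += 1
    return stats


# Row copied from the alignment above (Query 23594-23653, Sbjct 23576-23623).
QUERY_ROW = "ACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATG"
SBJCT_ROW = "ACTAATT------------CACGTAGTGTGGCCAGTCAATCTATTATTGCCTACACTATG"

stats = alignment_stats(QUERY_ROW, SBJCT_ROW)
print(dict(stats))
```

For this row the tally is 44 matching columns, 4 mismatches, and 12 gap columns, which sums to the 60 columns shown.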

There is a three-nucleotide difference in insert 1. Inserts 2 and 3 are identical in the two viruses. The difference between SARS-CoV-2 and RaTG13 in insert 4 is notable. Here is the sequence for coronavirus RaTG13 without the dashes:

23576 actaa ttcacgtagt gtggccagtc aatctattat tgcctacact atg

Insert 4 in SARS-CoV-2 falls between the t at position 23582 and the c at position 23583 of RaTG13.
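The flanking positions can be derived from the gapped alignment row and its start coordinates. The sketch below assumes the row for Query 23594 and Sbjct 23576 shown earlier; the helper name `subject_gap_runs` and its record keys are hypothetical.

```python
import re


def subject_gap_runs(query_row, subject_row, query_start, subject_start):
    """Return one record per run of '-' in the subject row of a pairwise alignment."""
    runs = []
    for gap in re.finditer(r"-+", subject_row):
        col = gap.start()
        # Bases consumed before the gap: columns so far minus any earlier dashes.
        s_before = col - subject_row[:col].count("-")
        q_before = col - query_row[:col].count("-")
        q_bases = query_row[col:gap.end()].replace("-", "")
        runs.append({
            "subject_left": subject_start + s_before - 1,  # last subject base before the gap
            "subject_right": subject_start + s_before,     # first subject base after the gap
            "query_first": query_start + q_before,         # first query base opposite the gap
            "query_last": query_start + q_before + len(q_bases) - 1,
            "query_bases": q_bases,
        })
    return runs


QUERY_ROW = "ACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATG"
SBJCT_ROW = "ACTAATT------------CACGTAGTGTGGCCAGTCAATCTATTATTGCCTACACTATG"

runs = subject_gap_runs(QUERY_ROW, SBJCT_ROW, query_start=23594, subject_start=23576)
for run in runs:
    print(run)
```

For this row it reports a single gap between subject positions 23582 and 23583, opposite query positions 23601 to 23612. The exact columns of a gap are chosen by the aligner, so a gap can sometimes be shifted by a column or two without changing the alignment score.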

In the following figure, the two images represent the genomes of SARS-CoV-2 Wuhan-Hu-1 and coronavirus RaTG13. Note the genomic region on SARS-CoV-2 where the 4 inserts are in close proximity:

All four inserts are found only in SARS-CoV-2

Compared with older coronaviruses, the 4 insertion points in the spike glycoprotein S are unique to SARS-CoV-2, although inserts 2 and 3 are also present in RaTG13.

RaTG13 and SARS-CoV-2 are single-stranded RNA viruses that have four bases:

The nitrogenous bases in RNA are adenine, guanine, cytosine, and uracil, which replaces thymine in DNA (Encyclopedia Britannica).

A trapezoid (or trapezium) has four sides with two sides parallel which are called bases:

The parallel sides are called the bases of the trapezoid and the other two sides are called the legs or the lateral sides (if they are not parallel; otherwise there are two pairs of bases) (Wikipedia).

A base is attached to the 1′ position, in general, adenine (A), cytosine (C), guanine (G), or uracil (U) (Wikipedia).

SARS-CoV-2 was bioengineered from coronavirus RaTG13 by adding insertion points 1 and 4:

RaTG13 + Trapezium bases → SARS-CoV-2 = COVID-19.

NIH database

Whenever an NCBI BLAST of SARS-CoV-2 Wuhan-Hu-1 is run against older coronaviruses in the NIH database, the region from approximately 21577 to 22539 is missing from the alignment with the older coronaviruses. Ironically, this is where inserts 1, 2, and 3 of SARS-CoV-2 are located (insert 4 is at position 23594). BLAST reports the alignment with each older coronavirus in ranges, and the missing sequences fall between the point where range 1 ends and the point where range 2 begins. The following three BLAST results show where the missing sequences should be (in bold). I searched the database and found the missing sequences of the Subjects (the 3 older coronaviruses) to prove that the 4 inserts are not present and are found only in SARS-CoV-2. As usual, SARS-CoV-2 Wuhan-Hu-1 is the Query:

SUBJECT 1: Bat coronavirus isolate Anlong-103. Missing sequence 21501 to 22879 (1379 nucleotides):

Query  21565  GTTTGTTTTTCTTGTTTT  21582
|||  ||||  |||||||
Sbjct 21483 GTTAATTTTGTTTGTTTT 21500

Range 2: 22880 to 27722

Query 23070 TTGGTTACCAACCATA-CAGAGTAGTAGTACTTTCTTTTGAACTTCTACATGCACCAGCA 23128
||| ||| ||| | || |||||| ||||| ||||| |||||| | || ||||||| ||
Sbjct 22880 TTGATTATCAAGC-TACCAGAGTGGTAGTGCTTTCATTTGAATTGCTGAATGCACCTGCT 22938
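The stretch of the Subject that falls between two reported ranges follows directly from the range boundaries. The sketch below uses the Subject coordinates quoted in this post for the three older coronaviruses; the function name `unaligned_interval` is hypothetical, and the length counts both endpoints.

```python
def unaligned_interval(range1_end: int, range2_start: int):
    """Return (first, last, length) of the subject stretch between two BLAST ranges."""
    first = range1_end + 1
    last = range2_start - 1
    length = max(0, last - first + 1)  # inclusive count; 0 if the ranges touch or overlap
    return first, last, length


# (last Subject position of range 1, first Subject position of range 2)
SUBJECTS = {
    "Anlong-103": (21500, 22880),
    "Frankfurt 1": (21464, 22388),
    "Urbani": (21505, 22429),
}

intervals = {name: unaligned_interval(*bounds) for name, bounds in SUBJECTS.items()}
for name, (first, last, length) in intervals.items():
    print(f"{name}: {first} to {last} ({length} nucleotides)")
```

Counted inclusively, the intervals come to 1379 nucleotides for Anlong-103 and 923 nucleotides each for Frankfurt 1 and Urbani.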

Here is the missing sequence (with inserts 1-3 in red) for Bat coronavirus isolate Anlong-103:

    21501 tcttcctttt attactgcag atacgtgtct caattttact
21541 aatcttgcag cgcctgctta taacatagcc tcctcgtctc gacgtggtgt gtattatcct
21601 gatgacattt ttcggtctga ctttttacat ttggtaaatg attattttct gccatttggt
21661 tccaatgtaa ctcaatttta tactcagggt actaatattg ataaccccac tttgccattt
21721 agagatggtg tgtattttgc tgccacagag aagtctaata tagttagagg ctggattttt
21781 ggttctactt tggactccac ctcccagtct gctataattt taaataattc tacaaatttg
21841 attgtgcggg tttgtaattt tgaattatgt aaagtgccac tatttgtggt ttttaaatct
21901 aataattccc agttatcaca cttgtttagt gatagtttta attgtacctt tgagtatgtt
21961 tctagggctt tctctcttga tatacgcgag cagtcaggta attttgtgga tttaagagag
22021 tttgtttttc gtaataggaa tggcttcctt catatttatg agggttatga ggctatttct
22081 atagttagag gattgcctgc cgggttcaac gtcctcaagc cattattaaa gataccattt
22141 ggccttaatg ttacgtcttt taagactttt cttacagttt atagggtggc agcaggtagt
22201 attagtgtag cgagctctgc ttattttgta ggttatttaa aaccattaac tttcatgctt
22261 agttatgatt taaatggtac tattaataat gctgttgatt gttctcagga tccgctcgct
22321 gagttaaagt gtactattaa gaattttaat gtttctaaag gcatttatca gacttcaaac
22381 ttcagagtgt ctccaactcg ggaggttgtt agatttccta atattacaaa tcgctgtcct
22441 tttgacagca tctttaatgc ttccagattt ccttctgtgt atgcgtggga aaggactaaa
22501 atttctgatt gtgttgcgga ttatactgtt ctctacaact caacctcatt ttcaactttt
22561 aagtgttatg gagtttctcc ctctaagttg attgatttat gctttacaag tgtgtatgct
22621 gatacattct tgataagatt ttctgaagtc aggcaagttg caccgggtga aactggtgtt
22681 attgctgact ataattatag gctacctgat gacttcacag gctgtgtcat agcttggaat
22741 acagctaatc aagatgttgg tagttatttt tatagatctc atcgctccac caaattaaag
22801 ccatttgagc gtgatctttc ttctgacgag aatggtgttc gtacacttag tacatatgac
22861 ttcaaccctt atgtacctc
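The rows above follow the GenBank sequence layout: each row begins with the coordinate of its first base, followed by blocks of ten bases. To look up a base by coordinate or to search for a motif, the rows can be joined into one string. This is a minimal sketch using only the first two rows of the listing; the helper names `parse_origin_rows`, `base_at`, and `find_positions` are hypothetical.

```python
def parse_origin_rows(rows):
    """Join GenBank-style rows ('<start> <blocks of bases>') into (start, sequence)."""
    start = None
    expected = None
    pieces = []
    for row in rows:
        fields = row.replace(":", " ").split()
        if not fields:
            continue
        pos = int(fields[0])
        bases = "".join(fields[1:]).rstrip(".").lower()
        if start is None:
            start = pos
        elif pos != expected:
            # Each row must begin where the previous one ended.
            raise ValueError(f"row starts at {pos}, expected {expected}")
        pieces.append(bases)
        expected = pos + len(bases)
    return start, "".join(pieces)


def base_at(start, sequence, position):
    """Return the base at a genome coordinate."""
    return sequence[position - start]


def find_positions(start, sequence, motif):
    """Return the genome coordinate of every exact occurrence of motif."""
    motif = motif.lower()
    hits = []
    i = sequence.find(motif)
    while i != -1:
        hits.append(start + i)
        i = sequence.find(motif, i + 1)
    return hits


ROWS = [
    "    21501 tcttcctttt attactgcag atacgtgtct caattttact",
    "21541 aatcttgcag cgcctgctta taacatagcc tcctcgtctc gacgtggtgt gtattatcct",
]

start, sequence = parse_origin_rows(ROWS)
print(start, len(sequence), base_at(start, sequence, 21541))
print(find_positions(start, sequence, "AATCTTGCAG"))
```

An exact search of this kind only reports identical stretches; a motif that differs by even one base returns no hits, so an empty result is not by itself evidence about related but divergent sequence.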

By comparing the missing sequence of the Subject with insertion points 1-3, we see that insert 1 is not present in line 21721, insert 2 is not present in line 21961, and insert 3 is not present in line 22261. Let’s look at where insert 4 is located on SARS-CoV-2:

Query  23546  TGTGACATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCT  23605
|||||||| || |||||||| || || || ||||| || || || || | |||| ||
Sbjct 23356 TGTGACATTCCTATTGGTGCTGGCATTTGTGCTAGCTACCATAC--AG-C---TTCTACT 23409

Query 23606 CGGCGGGCACGTAGTGTAGCT-AGTCAATCCATCATTGCCTACACTATGTCACTT-GGTG 23663
| ||||||||||| | || ||||||| | || ||||||||||| ||| ||||
Sbjct 23410 ---C---TACGTAGTGTAGGTCAG-AAATCCATTGTGGCTTACACTATGTC-CTTGGGTG 23461

Here is the sequence for Bat coronavirus isolate Anlong-103 without the dashes:

23356: tgtga cattcctatt ggtgctggca tttgtgctag ctaccataca gcttctactc ta.
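Removing the dashes from an aligned row should give back the raw Subject sequence, and the number of bases left should equal the span of the row's coordinates. The sketch below checks this for the Sbjct 23356 to 23409 row against the listing above; the function name `ungap` is hypothetical.

```python
def ungap(aligned_row: str) -> str:
    """Strip alignment gap characters, leaving only the bases."""
    return aligned_row.replace("-", "")


# Subject row copied from the alignment above, with its printed coordinates.
SBJCT_ROW = "TGTGACATTCCTATTGGTGCTGGCATTTGTGCTAGCTACCATAC--AG-C---TTCTACT"
SBJCT_START, SBJCT_END = 23356, 23409

# Dash-free listing copied from the line above (spaces only separate blocks).
LISTED = "tgtga cattcctatt ggtgctggca tttgtgctag ctaccataca gcttctactc ta"

raw = ungap(SBJCT_ROW).lower()
listed = LISTED.replace(" ", "")

length_ok = len(raw) == SBJCT_END - SBJCT_START + 1  # 54 bases expected
prefix_ok = listed.startswith(raw)                   # listing continues 3 bases past the row
print(length_ok, prefix_ok)
```

Both checks pass for this row: the 54 ungapped bases match the start of the listing, which runs three bases further into the next alignment row.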

Insert 4 is only present in SARS-CoV-2 viruses.

SUBJECT 2: SARS coronavirus Frankfurt 1. Missing sequence 21465 to 22387 (923 nucleotides):

Query  21562  AATGTTTGTTTTTCTT  21577
| ||||| |||| |||
Sbjct 21451 A-TGTTTATTTT-CTT 21464

Range 2: 22388 to 27513

Query 22539 TTGTTAGATTTCCTAATATTACAAACTTGTGCCCTTTTGGTGAAGTTTTTAACGCCACCA 22598
|||| ||||| |||||||||||||||||||| |||||||| || |||||||| || || |
Sbjct 22388 TTGTGAGATTCCCTAATATTACAAACTTGTGTCCTTTTGGAGAGGTTTTTAATGCTACTA 22447

Here is the missing sequence (with inserts 1-3 in red) for SARS coronavirus Frankfurt 1:

    21465 attatt tcttactctc 
21481 actagtggta gtgaccttga ccggtgcacc acttttgatg atgttcaagc tcctaattac
21541 actcaacata cttcatctat gaggggggtt tactatcctg atgaaatttt tagatcagac
21601 actctttatt taactcagga tttatttctt ccattttatt ctaatgttac agggtttcat
21661 actattaatc atacgtttgg caaccctgtc atacctttta aggatggtat ttattttgct
21721 gccacagaga aatcaaatgt tgtccgtggt tgggtttttg gttctaccat gaacaacaag
21781 tcacagtcgg tgattattat taacaattct actaatgttg ttatacgagc atgtaacttt
21841 gaattgtgtg acaacccttt ctttgctgtt tctaaaccca tgggtacaca gacacatact
21901 atgatattcg ataatgcatt taattgcact ttcgagtaca tatctgatgc cttttcgctt
21961 gatgtttcag aaaagtcagg taattttaaa cacttacgag agtttgtgtt taaaaataaa
22021 gatgggtttc tctatgttta taagggctat caacctatag atgtagttcg tgatctacct
22081 tctggtttta acactttgaa acctattttt aagttgcctc ttggtattaa cattacaaat
22141 tttagagcca ttcttacagc cttttcacct gctcaagaca tttggggcac gtcagctgca
22201 gcctattttg ttggctattt aaagccaact acatttatgc tcaagtatga tgaaaatggt
22261 acaatcacag atgctgttga ttgttctcaa aatccacttg ctgaactcaa atgctctgtt
22321 aagagctttg agattgacaa aggaatttac cagacctcta atttcagggt tgttccctca
22381 ggagatg

By comparing the missing sequence of the Subject with insertion points 1-3, we see that insert 1 is not present in line 21721, insert 2 is not present in line 21961, and insert 3 is not present in line 22261. Let’s look at where insert 4 is located on SARS-CoV-2:

Query  23552  ATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGG  23611
|| || ||||| || || || || |||||||| || || || | |||| |
Sbjct 23398 ATTCCTATTGGAGCTGGCATTTGTGCTAGTTACCATAC--AG--T--TTCT--T----TA 23445

Query 23612 GCACGTAGT-GTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTGGTGCAGAAAA 23670
||||||| |||| | |||| || | || || |||||||| | ||||| || |
Sbjct 23446 TTACGTAGTACTAGCCA-AAAATCTATTGTGGCTTATACTATGTCTTTAGGTGCTGATAG 23504

Here is the sequence for SARS coronavirus Frankfurt 1 without the dashes:

23398: att cctattggag ctggcatttg tgctagttac catacagttt ctttattac.

Insert 4 is only present in SARS-CoV-2 viruses.

SUBJECT 3: SARS coronavirus Urbani. Missing sequence 21506 to 22428 (923 nucleotides):

Query  21561  CAATGTTTGTTTTTCTT  21577
|| ||||| |||| |||
Sbjct 21491 CA-TGTTTATTTT-CTT 21505

Range 2: 22429 to 27798

Query 22539 TTGTTAGATTTCCTAATATTACAAACTTGTGCCCTTTTGGTGAAGTTTTTAACGCCACCA 22598
|||| ||||| |||||||||||||||||||| |||||||| || |||||||| || || |
Sbjct 22429 TTGTGAGATTCCCTAATATTACAAACTTGTGTCCTTTTGGAGAGGTTTTTAATGCTACTA 22488

Here is the missing sequence (with inserts 1-3 in red) for SARS coronavirus Urbani:

    21506 attat ttcttactct cactagtggt agtgaccttg
21541 accggtgcac cacttttgat gatgttcaag ctcctaatta cactcaacat acttcatcta
21601 tgaggggggt ttactatcct gatgaaattt ttagatcaga cactctttat ttaactcagg
21661 atttatttct tccattttat tctaatgtta cagggtttca tactattaat catacgtttg
21721 gcaaccctgt catacctttt aaggatggta tttattttgc tgccacagag aaatcaaatg
21781 ttgtccgtgg ttgggttttt ggttctacca tgaacaacaa gtcacagtcg gtgattatta
21841 ttaacaattc tactaatgtt gttatacgag catgtaactt tgaattgtgt gacaaccctt
21901 tctttgctgt ttctaaaccc atgggtacac agacacatac tatgatattc gataatgcat
21961 ttaattgcac tttcgagtac atatctgatg ccttttcgct tgatgtttca gaaaagtcag
22021 gtaattttaa acacttacga gagtttgtgt ttaaaaataa agatgggttt ctctatgttt
22081 ataagggcta tcaacctata gatgtagttc gtgatctacc ttctggtttt aacactttga
22141 aacctatttt taagttgcct cttggtatta acattacaaa ttttagagcc attcttacag
22201 ccttttcacc tgctcaagac atttggggca cgtcagctgc agcctatttt gttggctatt
22261 taaagccaac tacatttatg ctcaagtatg atgaaaatgg tacaatcaca gatgctgttg
22321 attgttctca aaatccactt gctgaactca aatgctctgt taagagcttt gagattgaca
22381 aaggaattta ccagacctct aatttcaggg ttgttccctc aggagatgt

By comparing the missing sequence of the Subject with insertion points 1-3, we see that insert 1 is not present in line 21721, insert 2 is not present in line 21961, and insert 3 is not present in line 22261. Let’s look at where insert 4 is located on SARS-CoV-2:

Query  23553  TACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACTCAGACTAATTCTCCTCGGCGGG  23612
| || ||||| || || || || |||||||| || || || | |||| |
Sbjct 23440 TTCCTATTGGAGCTGGCATTTGTGCTAGTTACCATAC--AG--T--TTCT--T----TAT 23487

Query 23613 CACGTAGT-GTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTGGTGCAGAAAAT 23671
||||||| |||| | |||| || | || || |||||||| | ||||| || | |
Sbjct 23488 TACGTAGTACTAGCCA-AAAATCTATTGTGGCTTATACTATGTCTTTAGGTGCTGATAGT 23546

Here is the sequence for SARS coronavirus Urbani without the dashes:

23440: t tcctattgga gctggcattt gtgctagtta ccatacagtt tctttatta.

These results are expected to be the same for any coronavirus that predates 2019-nCoV. Insert 4 is unique to SARS-CoV-2 viruses.
