AlphaFold2 examples from CASP 14: Difference between revisions
Eric Martz (talk | contribs) No edit summary |
Prediction of protein structures from amino acid sequences, [[theoretical modeling]], has been extremely challenging. In 2020, breakthrough success was achieved by AlphaFold2<ref name="af2">PMID:31942072</ref>, a project of [http://deepmind.com DeepMind]. '''For an overview of this breakthrough''', documented by the biennial prediction competition [[Theoretical_models#CASP|CASP]], please see [[Theoretical_models#2020:_CASP_14|2020: CASP 14]]. Two examples of predictions from that competition are illustrated below.
<StructureSection load='' size='350' side='right' caption='' scene='87/875686/Af2_vs_7jx6_chain_a/1'>
==Free Modeling Results==
[https://predictioncenter.org/casp14/domains_summary.cgi More than 100 domains] were provided as prediction targets in CASP 14. Fourteen of these were in the most difficult category, ''free modeling'' ("FM"), meaning that no informative [[homology modeling]] templates existed. For 8 of these (57%), AlphaFold2's predictions achieved [[Theoretical_modeling#CASP_14_Global_Distance_Test_Results|GDT_TS scores]] of 87-93 (median 88.5). For those 8, the GDT_TS scores of the second best predictions were 43-76 (median 66). Two cases are analyzed below.
First, SARS-CoV-2 ORF8<ref name="7jtl" />, a 92-residue FM domain where '''AlphaFold2's GDT_TS was 87, and the second best was 43''' (by the group of Xian Ming Pan)<ref name="t1064">For SARS-CoV-2 ORF8, at the [https://predictioncenter.org/casp14/results.cgi?view=tb-sel CASP 14 Table Browser], check T1064-D1 and press ''Show Results''.</ref>, the largest difference between 1st and 2nd predictions among the FM targets. It is further unusual because two independently-determined X-ray crystallographic structures were subsequently published. Inspiration for this case came from the discussion by Rubiera<ref name="rubiera">[https://www.blopig.com/blog/2020/12/casp14-what-google-deepminds-alphafold-2-really-achieved-and-what-it-means-for-protein-folding-biology-and-bioinformatics/ CASP14: what Google DeepMind’s AlphaFold 2 really achieved, and what it means for protein folding, biology and bioinformatics], a blog post by Carlos Outeiral Rubiera, December 3, 2020.</ref>.
Second, the '''longest domain in the FM category, 404 residues'''. This domain is part of the 2,180-residue RNA polymerase of a bacteriophage, some of whose group members are prevalent in the human gut<ref name="6vr4">PMID: 33208949</ref>. Eight of the CASP 14 FM target domains are parts of this protein, [[6vr4]]. For the 404-residue domain T1037, AlphaFold2 achieved a GDT_TS of 88, and the second best prediction (by Seok-refine) achieved 63<ref name="t1037">For the phage RNA polymerase target, at the [https://predictioncenter.org/casp14/results.cgi?view=tb-sel CASP 14 Table Browser], check T1037-D1 and press ''Show Results''.</ref>. Among the 14 FM targets, the second-longest has 276 residues, the median 132, and the shortest, 92.
==SARS-CoV-2 ORF8==
Our first example is [[SARS-CoV-2 protein ORF8]], a protein that contributes to virulence in COVID-19<ref name="7jtl">PMID: 33361333</ref>. CASP 14 classified ORF8 as a "free modeling" (FM) target<ref name="casp14domains">[https://predictioncenter.org/casp14/domains_summary.cgi Summary, Definitions and Classifications of Domains for CASP 14].</ref>, meaning that there were no adequate empirical templates for [[homology modeling]]. This was easily confirmed. When the [https://www.uniprot.org/uniprot/P0DTC8 amino acid sequence of ORF8] is submitted to [https://swissmodel.expasy.org/ Swiss Model], it reports the best templates for homology modeling. When the two [[empirical models]] that were not available during CASP 14 are excluded ([[7jtl]] and [[7jx6]]), the best template offered, chain B of [[3afc]], covers only 36% of the length of ORF8 at 13.2% sequence identity, with a 4-residue untemplated gap in the sequence alignment. This template would not be adequate for constructing a useful model.
===X-Ray Structures for ORF8===
The quality of predictions for the structure of ORF8 is judged by comparison with X-ray crystallographic [[empirical models]] which were not available to the groups making predictions. Shortly after the CASP 14 competition (summer 2020), two X-ray crystal structures were reported for ORF8: [[7jtl]] released August 26, 2020, and [[7jx6]], released September 23, 2020. The [[resolution|resolutions]] are 2.0 and 1.6 Å respectively, and both have worse than average [[Rfree]] values.
{{Template:Green links zoom}}
<scene name='87/875686/Chain_a_of_7jx6/1'>Here is one chain of ORF8</scene> from the higher resolution X-ray structure, [[7jx6]]. These chains form [http://firstglance.jmol.org/fg.htm?mol=7jx6 disulfide-linked dimers], and the dimers form higher order multimers<ref name="7jtl" /> (not shown). Notice that the <span class="text-blue"><b>amino</b></span> and <span class="text-red"><b>carboxy</b></span> '''ends of the chain come together''' to form two parallel beta strands of a beta sheet. Also notice that there are '''3 disulfide bonds'''. An accurate prediction would include both of these features.
<scene name='87/875686/Morf_lin_7jx6_imf_7jtl/3'>The two X-ray structures agree very well</scene><ref name="imf">Superposition by Swiss-PdbViewer's ''iterative magic fit''. This starts with a sequence alignment-guided structural superposition, and then superposes subsets of the structures to minimize the RMSD. Eight intermediate structures were generated by the [[Morphs#Linear_Morph_Server|Theis Morph Server]] by linear interpolation.</ref>. The only substantial disagreement is for a large surface loop, sequence range 48-57. See Table I below for [https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions RMSD] values.
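For readers unfamiliar with RMSD, the following is a minimal sketch of how the value is computed once two structures have already been superposed and their atoms paired. It is not the procedure used by Swiss-PdbViewer; the function name <code>rmsd</code> and the toy coordinates are hypothetical and serve only as an illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumption: the two structures are already superposed, and the atoms
    are paired by list index (e.g. alpha carbon i with alpha carbon i).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy data (not real ORF8 coordinates): every atom is displaced by 1 Å.
xray = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(xray, model))
```

Because each deviation is squared before averaging, a few badly placed atoms (such as a mobile surface loop) can dominate the RMSD. This is one reason the tables on this page report both the RMSD and the number of atoms over which it was computed.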
===ORF8 is not a novel fold===
Fewer than 2% of new [[empirically-determined structures]] have novel folds; that is, folds not already represented in the [[PDB]]<ref name="cath2011">PMID: 21097779</ref>. When chain A of [[7jx6]] was submitted to Dali<ref name="dali2020">PMID: 31606894</ref> (February, 2021), the top hit was the N-terminal domain of the two domains in [[5a2f]], the CD166 human cell surface receptor involved in activation of T lymphocytes. The Z-score was 7.1, and 88 alpha carbons superposed with RMSD 3.2 Å. Swiss-PdbViewer obtained RMSD 1.95 Å for 48 alpha carbons<ref name="fitselimprov">Using Swiss-PdbViewer's ''Fit from Selection'' with 102 residues selected from each structure, followed by ''Improve Fit''.</ref>. Dali reported the identity as 6% in its structure-based sequence alignment. Sequence alignment by MAFFT<ref name="mafft">PMID: 23329690</ref> obtained 18% sequence identity using more and larger gaps. <scene name='87/875686/Dali_5a2f_vs_7jx6_yale/2'>The structural similarity between Dali's top hit and 7jx6</scene><ref name="yale">Structural superposition by Dali. Interpolation by the [http://www2.molmovdb.org/wiki/info/index.php/Morph2_Server Yale Morph2 Server]. Homogenization method: homology modeling. No minimization. This produced a 9-model file where model 1 was 7jx6, and models 2-9 were interpolations. 5a2f residues 28-133 were added as model 10 (black in the molecular scene).</ref> is not as close as for AlphaFold2's prediction, but is closer than the 2nd best prediction (see Table I below).
In conclusion, '''ORF8 does not have a novel fold'''<ref name="holm">The interpretation of Dali's result to mean that ORF8 does not have a novel fold was kindly confirmed by Liisa Holm, personal communication to [[User:Eric Martz|Eric Martz]], February, 2021.</ref>.
===AlphaFold2 Prediction for ORF8===
The quality of a prediction in CASP is judged, in large part, by the [[Theoretical_models#CASP_14_Global_Distance_Test_Results|Global Distance Test Total Score, GDT_TS]]. AlphaFold2's predicted structure<ref>Download AlphaFold2's predicted structure for ORF8 from [https://predictioncenter.org/casp14/MODELS_PDB/T1064-D1/T1064TS427_1-D1.pdb T1064TS427_1-D1.pdb].</ref> has a '''GDT_TS score of 87'''. (A score of 0 is meaningless, and a score of 100 means perfect agreement with an X-ray crystal structure.) 87 means <scene name='87/875686/Af2_vs_7jx6_chain_a/1'>the model is close to the accuracy of an X-ray crystal structure</scene><ref name="imf" />. The structure predicted by AlphaFold2 is '''almost as close to the X-ray crystallographic model''' [[7jx6]] as is the independently-determined X-ray structure [[7jtl]]. AlphaFold2 predicted the positions of 92 amino acids. (CASP 14 excluded residues 48-59, a 12-residue surface loop, from the target residues<ref name="casp14domains" />.) See Table I below for [https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions RMSD] values. The prediction was largely accurate regarding salt bridges and cation-pi interactions (see Tables III and IV below).
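To give a feel for what a GDT_TS score measures, here is a simplified sketch. GDT_TS averages the percentages of alpha carbons that lie within 1, 2, 4 and 8 Å of their positions in the reference structure. The official calculation searches over many superpositions to maximize each percentage; the sketch below assumes a single, fixed superposition, so it can only underestimate the official score. The function name <code>gdt_ts_fixed</code> and the toy coordinates are hypothetical.

```python
import math

# Distance cutoffs (in Å) averaged by GDT_TS.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts_fixed(model_ca, ref_ca):
    """Simplified GDT_TS for one fixed superposition.

    Assumption: model_ca and ref_ca are paired lists of (x, y, z) alpha
    carbon positions that have already been superposed. The real GDT_TS
    maximizes each fraction over many superpositions; this sketch does not.
    """
    if len(model_ca) != len(ref_ca) or not ref_ca:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    distances = [math.dist(m, r) for m, r in zip(model_ca, ref_ca)]
    # Fraction of alpha carbons within each cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(CUTOFFS)


# Toy data: four alpha carbons displaced by 0.5, 1.5, 3.0 and 10.0 Å.
reference = [(0.0, 0.0, 0.0)] * 4
prediction = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(gdt_ts_fixed(prediction, reference))  # fractions 1/4, 2/4, 3/4, 3/4 -> 56.25
```

Unlike RMSD, this kind of score is not dominated by a few badly placed residues: an atom that is 10 Å off and one that is 50 Å off contribute equally (zero) at every cutoff.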
{| style="text-align:center;" class="wikitable"
|+ Table I. ORF8 Predictions Superposed With Chain A of [[7jx6]]
|-
! Model || GDT_TS || Disulfide<br>Bonds || Cα [https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions RMSD], Å || Cα Superposed || [https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions RMSD] Including<br>Sidechains, Å || Atoms Superposed
|-
| [[7jtl]]:A || 96<ref name="gdt_ts">See [[#GDT_TS Calculations]].</ref> || 3 || 4.02<br>'''0.66''' || 102/102 (100%)<br>'''87/102 (85%)''' || 4.3<br>'''1.58''' || 829/829 (100%)<br>'''709/829 (86%)'''
|-
| AlphaFold2 || 87 || 3 || 2.58<br>'''1.25''' || 92/92 (100%)<br>'''83/92* (90%)''' || 3.23<br>'''1.91''' || 747/748 (100%)<br>'''679/748 (91%)'''
|-
| Dali top hit<ref name="nnf">See [[#ORF8 is not a novel fold]].</ref> [[5a2f]] || 60<ref name="gdt_ts" /> || na || 3.2<br>'''1.95''' || 92/92 (100%)<br>'''48/92 (52%)''' || na || na
|-
| 2nd Best* || 43 || 0 || 5.33<br>'''<span class="text-gray">1.71</span>''' || 92/92 (100%)<br>'''<span class="text-gray">38/92 (41%)</span>''' || 6.54<br>'''<span class="text-gray">5.86</span>''' || 747/748 (100%)<br>'''<span class="text-gray">324/748 (43%)</span>'''
|-
| 3rd Best§ || 33 || 0 || 13.37<br>† || 92/92 (100%)<br>† || 14.50<br>† || 747/748 (100%)<br>†
|-
| Rosetta<br>Server || 26 || (2‡) || 14.99<br>† || 92/92 (100%)<br>† || 16.07<br>† || 747/748 (100%)<br>†
|}
:Superpositions by "Magic Fit"<ref name="mf">Superposition by Swiss-PdbViewer's ''magic fit''. This is a sequence alignment-guided structural superposition. Eight intermediate structures were generated by the [[Morphs#Linear_Morph_Server|Theis Morph Server]] by linear interpolation.</ref> of Swiss-PdbViewer 4.1.<br>
:'''Superpositions by "Iterative Magic Fit"<ref name="imf" /> of Swiss-PdbViewer 4.1.'''<br>
:<span class="text-gray">Superpositions involving less than 50% of each structure.</span><br>
:na: Not Applicable.
:*Second best: Group of Xian Ming Pan, Tsinghua University, Beijing.<br>
:§Third best: Group of Alberto Perez, University of Florida, Gainesville.<br>
:† Iterative Magic Fit was unable to superpose.<br>
:‡ Neither disulfide bond is correct.
===Second Best Prediction for ORF8===
In CASP 14, 70 research groups and 42 automated servers predicted structures for ORF8. The median GDT_TS score for all 112 predictions was 26. AlphaFold2 made the best prediction (GDT_TS 87). <scene name='87/875686/Second_best_orf8_imf/1'>The second best prediction was by the group of Xian Ming Pan</scene><ref name="imf" />, with GDT_TS 43 (see Table I above). The fold and topology were predicted correctly, but the '''details are far less accurate''' than those in AlphaFold2's prediction. The 2nd best prediction has '''no disulfide bonds'''. This prediction was largely incorrect regarding salt bridges and cation-pi interactions (see Tables III and IV below).
===Third Best Prediction for ORF8===
The third best prediction for ORF8 was by the Perez Lab, with GDT_TS 33 (see Table I above). It '''correctly predicted the parallel beta strands formed by the amino and carboxy terminal ends of the chain'''. <scene name='87/875686/3rd_best_orf8/1'>When the 2-stranded parallel beta strands formed by the ends of the chains are superposed, the remainder superposes poorly</scene><ref name="fragfit">Superposition by Swiss-PdbViewer's ''Explore Fragment Alternate Fits'', which does not use sequence information. Eight intermediate structures were generated by the [[Morphs#Linear_Morph_Server|Theis Morph Server]] by linear interpolation.</ref>. This prediction has '''no disulfide bonds'''. The '''salt bridge''' Arg86:Asp98 is correctly predicted, along with two incorrectly predicted salt bridges.
===Top Prediction by an Automated Server===
Among predictions by automated servers for all ~100 CASP 14 targets, the top ranking server was QUARK from the Yang Zhang group (Univ. Michigan). For ORF8, the Zhang-TBM server made the best server prediction with a '''GDT_TS of 27'''. (The prediction by QUARK was almost as good, GDT_TS 26.) The prediction has the '''two chain termini not parallel, and the amino terminus is not a beta strand''', differing in both respects from the X-ray model. Also, '''no disulfide bonds''' are predicted. The '''salt bridge''' Arg86:Asp98 is correctly predicted, along with several incorrectly predicted salt bridges. The structural superposition is very poor and is not shown.
===Baker Rosetta Server Prediction for ORF8===
Among predictions for all ~100 CASP 14 targets, the group of David Baker [https://predictioncenter.org/casp14/zscores_final.cgi ranked second]. The Rosetta Server of the Baker group ranked 18th overall, but was the 4th ranked server<ref name="serverranks">For all targets in CASP 14, the top two servers were QUARK and Zhang-server (which were not significantly different at a Z-score sum of 62.9), followed by Zhang-CEthreader (55.9) and BAKER-ROSETTASERVER (55.3).</ref>. [https://predictioncenter.org/casp14/results.cgi?view=tables&target=T1064-D1&model=1&groups_id= For ORF8, the Rosetta Server prediction GDT_TS was 26], a bit better than the median of 23. The Rosetta Server's prediction for ORF8 has '''the two termini far apart''' (Cα 13 Å or farther apart), a substantial difference from the X-ray structure (Cα mostly ~5 Å apart). It predicts '''two disulfide bonds, but neither matches''' the pairs of Cys residues in the actual disulfide bonds. The '''salt bridge''' Arg86:Asp98 is correctly predicted, along with one incorrectly predicted salt bridge. The structural superposition is very poor and is not shown.
===ORF8 Sidechain Prediction Accuracy===
Jump below to [[#ORF8 Sidechain Accuracy]]
<!--########################################################-->
==Phage RNA polymerase T1037==
Our second example is a 404-residue domain within a 2,180-residue RNA polymerase, [[6vr4]], gp66 from a bacteriophage of the crAss-like group, some members of which are prevalent in the human gut<ref name="6vr4" />. One known host for the target-containing phage is the gram-negative, aerobic bacterium ''Cellulophaga baltica'', isolated from marine microalgae<ref name="6vr4" /><ref name="cb">PMID: 10425785</ref>. This RNA polymerase is packaged in the virion, and delivered into the host cell, where it transcribes early phage genes<ref name="6vr4" />. CASP 14 classified this domain, coded '''T1037''', as a free modeling ("FM") target<ref name="casp14domains" />, meaning that no informative templates existed in the [[PDB]].
===X-ray crystal structure===
The crystallographic structure, not available to the prediction teams during the CASP competition, is [[6vr4]], with [[resolution]] 3.5 Å, and an Rfree "reliability" of "much better than average for this resolution" (according to [[FirstGlance in Jmol]]). The termini of the chain are far apart from each other (~85 Å), and there are no disulfide bonds. The [[asymmetric unit]] contains 2 chains. The reference structure was taken from '''chain B''' because it has a lower average [[temperature factor]] than chain A. The two chains are nearly identical (see Table II below).
The CASP 14 target T1037 sequence of 404 residues begins at sequence number 337 and ends at 901, a span of 565 residues. The target sequence is 404 residues because it excludes residues 370-530 (length 161), which form a different domain. <scene name='87/875686/6vr4_b_2180_residues/4'>Here is the full 2,180 residue chain with 337-901 (565 residues) opaque</scene>. Here is <span style="color:#d000d0;"><b>the 404-residue target sequence </b></span> with the <span class="text-gray"><b>intervening domain (excluded from the CASP target)</b></span> <scene name='87/875686/6vr4_b_2180_residues/5'>highlighted within the full 2,180-residue chain</scene>.
<scene name='87/875686/T1037_length_404/1'>The X-ray structure of CASP 14 domain T1037</scene> (length 404 residues) consists of residues 337-369 + 531-901 of [[6vr4]] (taken from chain B). It is an <scene name='87/875686/T1037_length_404/2'>alpha/beta domain with secondary structure</scene> <span style="color:#ff0080;font-weight:bold;">45% helices</span>, <span style="color:#ffc800;background-color:black;font-weight:bold;"> 19% beta strands </span>, and 37% loops and turns. The N- and C-termini are 10 Å apart, and there are no cysteines (thus no disulfide bonds).
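The residue counts quoted above can be checked with a few lines of arithmetic. The sketch below uses only the sequence ranges given in the text; the helper name <code>span_length</code> is hypothetical.

```python
def span_length(first, last):
    """Number of residues in an inclusive sequence range first..last."""
    return last - first + 1


# Ranges as stated in the text for target T1037 within 6vr4.
full_span = span_length(337, 901)       # whole region containing the target
excluded = span_length(370, 530)        # intervening domain, not part of T1037
target = span_length(337, 369) + span_length(531, 901)  # the two kept segments

print(full_span, excluded, target)      # 565 161 404
```

Both routes to the target length agree: the full span minus the excluded domain (565 - 161), and the sum of the two retained segments (33 + 371), each give 404 residues.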
===T1037 contains several known fold fragments===
The X-ray structure of T1037 (404 residues from 6vr4) was submitted to Dali<ref name="dali2020" /> in March, 2021. Among the ~1,000 hits with Z ≥ 2.0, there were 152 with lengths ≥ 400 residues, and 224 with lengths ≥ 300, long enough that a superposition with the majority of T1037 would not be precluded. Among all hits, the largest number of aligned residues was 140/404 (35%) with RMSD 11.7 Å. The second largest was 127/404 (31%), RMSD 7.7 Å. Thus, no single structure in the PDB superposed with more than 35% of T1037.
However, several of the Dali hits superposed with non-overlapping core fragments of [[6vr4]]<ref name="lholm">These non-overlapping core fragments were kindly pointed out by Liisa Holm, March, 2021.</ref>:
*[[2j7n]] chain A, RNA-dependent RNA polymerase
**length 934, aligned residues '''115, RMSD 4.3 Å''', Z=4.0, structural alignment 9 %id.
*[[4ncj]] chain A, DNA double-strand break repair RAD50 ATPase
**length 311, aligned residues '''109, RMSD 4.7 Å''', Z=3.4, structural alignment 11 %id.
*[[5vfk]] chain A, Uncharacterized protein
**length 146, aligned residues '''61, RMSD 7.8 Å''', Z=3.3, structural alignment 11 %id.
Liisa Holm<ref name="dali2020" /><ref name="holmquote">Quoted with permission from Liisa Holm, March, 2021.</ref> stated: "T1037 has a homologous template in the PDB. The parent structure of T1037, phage RNA polymerase (6vr4, 2166 amino acids), is homologous to the RNAi polymerase from Neurospora crassa (2j7n chain A, 934 amino acids)<ref name="6vr4" />. Dali aligns them over 564 residues with an RMSD of 4.8 A. 115 residues of the common core are in the T1037 substructure. Several long insertions in T1037/6vr4 relative to 2j7n (chain A) form subdomains, which point outwards from the common core. Similar massive adaptation of the common core is seen, for example, in the glucosyltransferase 1 family<ref>PMID: 7729407</ref>."
The [https://fatcat.godziklab.org/ FATCAT Server] reported that in order to superpose 150 residues (37% of 404) of T1037 with the closest structure in the PDB, 3 twists at hinges were required, after which an RMSD of 3.1 Å was achieved. For a 200-residue superposition (50% of 404), the best results after 3 twists had an RMSD of 5.4 Å.
===AlphaFold2 prediction for T1037===
<scene name='87/875686/Morph_lin_6vr4_to_af2/1'>AlphaFold2 predicted the structure of T1037 with very high accuracy</scene><ref name="imf" />. 91% of the 404 alpha carbons can be aligned with RMSD 1.0 Å (GDT_TS 88; see Table II below for details).
{| style="text-align:center;" class="wikitable"
|+ Table II. T1037 Predictions Superposed With Sub-Domain of [[6vr4]] Chain B.
|-
! Model || GDT_TS || Cα [https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions RMSD], Å || Cα Superposed || [https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions RMSD] Including<br>Sidechains, Å || Atoms Superposed
|-
| T1037 of<br>[[6vr4]]:A || 99.9<ref name="gdt_ts" /> || 0.25<br>'''0.25''' || 404/404 (100%)<br>'''404/404 (100%)''' || 0.58<br>'''0.24''' || 3157/3157 (100%)<br>'''1616/3157 (51%)'''
|-
| AlphaFold2 || 88 || 1.68<br>'''0.98''' || 404/404 (100%)<br>'''368/404 (91%)''' || 2.28<br>'''<span class="text-gray">1.01</span>''' || 3157/3157 (100%)<br>'''<span class="text-gray">1472/3157 (47%)</span>'''
|-
| 2nd Best* || 63 || 12.8<br>'''<span style="color:#b0b0b0;">1.90</span>''' || 404/404 (100%)<br>'''<span style="color:#b0b0b0;">52/404 (13%)</span>''' || 13.4<br>'''<span style="color:#b0b0b0;">2.18</span>''' || 3157/3157 (100%)<br>'''<span style="color:#b0b0b0;">252/3157 (8%)</span>'''
|-
| Seder§ || 53 || 12.3<br>'''<span style="color:#b0b0b0;">1.90</span>''' || 404/404 (100%)<br>'''<span style="color:#b0b0b0;">88/404 (22%)</span>''' || 12.7<br>'''<span style="color:#b0b0b0;">1.64†</span>''' || 3157/3157 (100%)<br>'''<span style="color:#b0b0b0;">804†/3157 (25%)</span>'''
|}
:Superpositions by "Magic Fit"<ref name="mf" /> of Swiss-PdbViewer 4.1.<br>
:'''Superpositions by "Iterative Magic Fit"<ref name="imf" /> of Swiss-PdbViewer 4.1.'''<br>
:<span class="text-gray">Superpositions involving less than 50% of each structure.</span><br>
:<span style="color:#b0b0b0;">Superpositions involving ≤ 25% of each structure.</span><br>
:*Second best by Seok-refine: Group of Chaok Seok, Seoul National University.<br>
:§Prediction by Seder2020 (one of the predictions with GDT_TS 53, chosen arbitrarily as 10 less than the 2nd best GDT_TS of 63): Group of Andrzej Kloczkowski, Columbus, Ohio. '''Superposition not shown'''.<br>
:†Close superposition of the three longest alpha helices.
===Second Best Prediction for T1037===
Despite its impressive GDT_TS of 63, <scene name='87/875686/Morf_t1037_6vr4_to_2nd_cao/2'>the second best prediction for T1037 was far less accurate</scene><ref name="imf" /> than the prediction of AlphaFold2. (The second best prediction was by Seok-refine, from the group of Chaok Seok, Seoul National University.)
==Calculating GDT_TS==
Please see [[#GDT_TS Calculations]].
<!--########################################################-->
</StructureSection>
===ORF8 Sidechain Accuracy===
AlphaFold2's predictions for sidechain positions seem fairly good, while sidechain positions in the 2nd best prediction seem poor. This conclusion is based on three types of observations:
#Table I gives RMSD values for all atoms, which is one indication of sidechain accuracy.
#Prediction of [[salt bridges]] and [[cation-pi interactions]].
#Visualization of the distributions of charges on the surfaces.
====Salt Bridges and Cation-Pi Interactions====
*AlphaFold2's prediction was '''correct for 4/5''' interactions, with '''one incorrect''' interaction.
**AlphaFold2's prediction was '''correct for one of two''' salt bridges, and predicted '''no incorrect''' salt bridges.
**AlphaFold2's prediction was '''correct for three of three''' cation-pi interactions, but predicted '''one incorrect''' interaction.
*The 2nd best prediction was '''correct for 1/5''' interactions, with '''2 incorrect''' interactions.
**The 2nd best prediction was '''correct for one of two''' salt bridges, but predicted '''two incorrect''' salt bridges.
**The 2nd best prediction '''failed to predict any''' of the three cation-pi interactions, predicting zero interactions.
{| style="text-align:center;" class="wikitable"
|+ Table III. Salt Bridge Prediction Accuracy
|-
!7JX6 !! 7JTL !! AlphaFold2 !! 2nd Best
|-
| R101:D112 (AB) || R101:D113 (AB) || R86:D98 || R86:D98
|-
| R115:D119 (AB) || R115:D119 (AB) || – || ''R100:<span class="text-red">E4</span>''
|-
| K44:E59 (A<span class="text-gray">'''B'''</span>) || <span class="text-gray">K44:E59 (AB)</span> || <span class="text-gray">K29:E44</span> || –
|-
| – || – || – || ''K78:E77''
|}
*Bridges in the same row are identical (except for <span class="text-red">red</span> residues). Subtract 15 from the sequence numbers in the X-ray structures for the equivalent sequence numbers in the predictions.
Line 87: | Line 162: | ||
*<span class="text-gray">'''Gray''': Shortest sidechain nitrogen to sidechain oxygen distance 4.4 to 4.8 Å.</span> | *<span class="text-gray">'''Gray''': Shortest sidechain nitrogen to sidechain oxygen distance 4.4 to 4.8 Å.</span> | ||
*–: Shortest sidechain nitrogen to sidechain oxygen distance 6 to 16 Å. | *–: Shortest sidechain nitrogen to sidechain oxygen distance 6 to 16 Å. | ||
*(AB): The two chains in each X-ray model. | |||
*''Italics: erroneous prediction.'' | |||
{| style="text-align:center;" class="wikitable" | {| style="text-align:center;" class="wikitable" | ||
|+ Table | |+ Table IV. Cation-Pi Prediction Accuracy | ||
|- | |- | ||
!7JX6 !! 7JTL !! AlphaFold2 !! 2nd Best | !7JX6 !! 7JTL !! AlphaFold2 !! 2nd Best | ||
Line 100: | Line 177: | ||
|} | |} | ||
*All interactions listed are deemed energetically significant by the [http://capture.caltech.edu CaPTURE Server]. | *All interactions listed are deemed energetically significant by the [http://capture.caltech.edu CaPTURE Server]. | ||
*Interactions in the same row are identical | *Interactions in the same row are identical. Subtract 15 from the sequence numbers in the X-ray structures for the equivalent sequence numbers in the predictions. | ||
*''Italics: erroneous prediction.'' | *''Italics: erroneous prediction.'' | ||
*The 2nd best prediction has no cation-pi interactions. | *The 2nd best prediction has no cation-pi interactions. | ||
*(AB): The two chains in each X-ray model. | |||
====Visualization of Surface Charge Distributions==== | |||
The distributions of surface charges are in good agreement between AlphaFold2's prediction and the two crystal structures, which agree with each other. The distribution in the 2nd best prediction has several discrepancies with the other three models. | |||
[[Image:Orf8-casp14-charges.png]] | |||
==GDT_TS Calculations== | |||
GDT_TS values for predictions are taken from CASP 14 results. The reference structure for the CASP 14 GDT_TS values was 92 alpha carbons of 7JTL<ref name="casp14domains" />, since the CASP 14 target had only 92 residues<ref name="casp14domains" />. | |||
GDT_TS values for 7JTL and 5A2F vs. 7JX6 chain A were calculated using the [http://linum.proteinmodel.org/ AS2TS server] of Adam Zemla<ref name="zemla">PMID: 12824330</ref>. See instructions for [[Calculating GDT_TS]]. GDT_TS values were corrected for 92 residues (not 104) because the CASP 14 target had only 92 residues<ref name="casp14domains" />. | |||
For comparison, CASP 14 reported GDT_TS 86.96 for the AlphaFold2 prediction, while the AS2TS server calculated GDT_TS 85.87 vs. 7jx6 chain A, and 88.32 vs. 7JTL chain A. (These results were corrected for 90/92 and 91/92 residues, respectively.) Thus, there appears to be some unidentified minor discrepancy between the GDT_TS calculations of CASP-14 vs. the method detailed at [[Calculating GDT_TS]]. | |||
==See Also== | |||
*[[AlphaFold/Index]], a list of pages in Proteopedia about AlphaFold. | |||
==References== | ==References & Notes== | ||
<references /> | <references /> |
Latest revision as of 03:08, 29 September 2023
Prediction of protein structures from amino acid sequences, theoretical modeling, has been extremely challenging. In 2020, breakthrough success was achieved by AlphaFold2[1], a project of DeepMind. For an overview of this breakthrough, documented by the biennial prediction competition CASP, please see 2020: CASP 14. Two examples of predictions from that competition are illustrated below.
Free Modeling Results

More than 100 domains were provided as prediction targets in CASP 14. 14 of these were in the most difficult category, free modeling ("FM"), meaning that no informative homology modeling templates existed. For 8 of these (57%), AlphaFold2's predictions achieved GDT_TS scores of 87-93 (median 88.5). For those 8, GDT_TS of the second best predictions were 43-76 (median 66). Two cases will be analyzed below.

First, SARS-CoV-2 ORF8[2], a 92-residue FM domain where AlphaFold2's GDT_TS was 87, and the second best was 43 (by the group of Xian Ming Pan)[3], the largest difference between 1st and 2nd predictions among the FM targets. It is further unusual because two independently-determined X-ray crystallographic structures were subsequently published. Inspiration for this case came from the discussion by Rubiera[4].

Second, the longest domain in the FM category, 404 residues. This domain is part of the 2,180-residue RNA polymerase of a bacteriophage, some of whose group members are prevalent in the human gut[5]. Eight of the CASP 14 FM target domains are parts of this protein, 6vr4. For the 404-residue domain T1037, AlphaFold2 achieved GDT_TS of 88, and the second best prediction, 63 (by Seok-refine)[6]. Among the 14 FM targets, the second-longest has 276 residues, the median 132, and the shortest, 92.

SARS-CoV-2 ORF8

Our first example is SARS-CoV-2 protein ORF8, a protein that contributes to virulence in COVID-19[2]. CASP 14 classified ORF8 as a "free modeling" (FM) target[7], meaning that there were no adequate empirical templates for homology modeling. This was easily confirmed. When the amino acid sequence of ORF8 is submitted to Swiss Model, it reports the best templates for homology modeling.
When the two empirical models that were not available during CASP 14 are excluded (7jtl and 7jx6), the best template offered, chain B of 3afc, covers only 36% of the length of ORF8 at 13.2% sequence identity, with a 4-residue untemplated gap in the sequence alignment. This template would not be adequate for constructing a useful model.

X-Ray Structures for ORF8

The quality of predictions for the structure of ORF8 is judged by comparison with X-ray crystallographic empirical models which were not available to the groups making predictions. Shortly after the CASP 14 competition (summer 2020), two X-ray crystal structures were reported for ORF8: 7jtl, released August 26, 2020, and 7jx6, released September 23, 2020. The resolutions are 2.0 and 1.6 Å respectively, and both have worse than average Rfree values.
from the higher resolution X-ray structure, 7jx6. These chains form disulfide-linked dimers, and the dimers form higher order multimers[2] (not shown). Notice that the amino and carboxy ends of the chain come together to form two parallel beta strands of a beta sheet. Also notice that there are 3 disulfide bonds. An accurate prediction would include both of these features.

[8]. The only substantial disagreement is for a large surface loop, sequence range 48-57. See Table I below for RMSD values.

ORF8 is not a novel fold

Less than 2% of new empirically-determined structures have novel folds; that is, folds not already represented in the PDB[9]. When chain A of 7jx6 was submitted to Dali[10] (February, 2021), the top hit was the N-terminal domain of the two domains in 5a2f, the CD166 human cell surface receptor involved in activation of T lymphocytes. The Z-score was 7.1, and 88 alpha carbons superposed with RMSD 3.2 Å. Swiss-PdbViewer obtained RMSD 1.95 Å for 48 alpha carbons[11]. Dali reported the identity as 6% in its structure-based sequence alignment. Sequence alignment by MAFFT[12] obtained 18% sequence identity using more and larger gaps.

[13] is not as close as for AlphaFold2's prediction, but is closer than the 2nd best prediction (see Table I below). In conclusion, ORF8 does not have a novel fold[14].

AlphaFold2 Prediction for ORF8

The quality of a prediction in CASP is judged, in large part, by the Global Distance Test Total Score, GDT_TS. AlphaFold2's predicted structure[15] has a GDT_TS score of 87. (A score of 0 is meaningless, and a score of 100 means perfect agreement with an X-ray crystal structure.) 87 means [8]. The structure predicted by AlphaFold2 is almost as close to the X-ray crystallographic model 7jx6 as is the independently-determined X-ray structure 7jtl. AlphaFold2 predicted the positions of 92 amino acids. (CASP 14 excluded residues 48-59, a 12-residue surface loop, from the target residues[7].) See Table I below for RMSD values.
The prediction was largely accurate regarding salt bridges and cation-pi interactions (see Tables III and IV below).
Second Best Prediction for ORF8

In CASP 14, 70 research groups and 42 automated servers predicted structures for ORF8. The median GDT_TS score for all 112 predictions was 26. AlphaFold2 made the best prediction (GDT_TS 87). [8], with GDT_TS 43 (see Table I above). The fold and topology were predicted correctly, but the details are far less accurate than those in AlphaFold2's prediction. The 2nd best prediction has no disulfide bonds. This prediction was largely incorrect regarding salt bridges and cation-pi interactions (see Tables III and IV below).

Third Best Prediction for ORF8

The third best prediction for ORF8 was by the Perez Lab, with GDT_TS 33 (see Table I above). It correctly predicted the parallel beta strands formed by the amino and carboxy terminal ends of the chain. [19]. This prediction has no disulfide bonds. The salt bridge Arg86:Asp98 is correctly predicted, along with two incorrectly predicted salt bridges.

Top Prediction by an Automated Server

Among predictions by automated servers for all ~100 CASP 14 targets, the top ranking server was QUARK from the Yang Zhang group (Univ. Michigan). For ORF8, the Zhang-TBM server made the best server prediction with a GDT_TS of 27. (The prediction by QUARK was almost as good, GDT_TS 26.) The prediction has the two chain termini not parallel, and the amino terminus is not a beta strand, differing in both respects from the X-ray model. Also, no disulfide bonds are predicted. The salt bridge Arg86:Asp98 is correctly predicted, along with several incorrectly predicted salt bridges. The structural superposition is very poor and is not shown.

Baker Rosetta Server Prediction for ORF8

Among predictions for all ~100 CASP 14 targets, the group of David Baker ranked second. The Rosetta Server of the Baker group ranked 18th overall, but was the 4th ranked server[20]. For ORF8, the Rosetta Server prediction GDT_TS was 26, a bit better than the median of 23.
The Rosetta Server's prediction for ORF8 has the two termini far apart (Cα 13 Å or farther apart), a substantial difference from the X-ray structure (Cα mostly ~5 Å apart). It predicts two disulfide bonds, but neither matches the pairs of Cys residues in the actual disulfide bonds. The salt bridge Arg86:Asp98 is correctly predicted, along with one incorrectly predicted salt bridge. The structural superposition is very poor and is not shown.

ORF8 Sidechain Prediction Accuracy

Jump below to #ORF8 Sidechain Accuracy

Phage RNA polymerase T1037

Our second example is a 404-residue domain within a 2,180-residue RNA polymerase, 6vr4, gp66 from a bacteriophage of the crAss-like group, some members of which are prevalent in the human gut[5]. One known host for the target-containing phage is the gram-negative, aerobic bacterium Cellulophaga baltica, isolated from marine microalgae[5][21]. This RNA polymerase is packaged in the virion, and delivered into the host cell, where it transcribes early phage genes[5]. CASP 14 classified this domain, coded T1037, as a free modeling ("FM") target[7], meaning that no informative templates existed in the PDB.

X-ray crystal structure

The crystallographic structure, not available to the prediction teams during the CASP competition, is 6vr4, with resolution 3.5 Å, and an Rfree "reliability" of "much better than average for this resolution" (according to FirstGlance in Jmol). The termini of the chain are far apart from each other (~85 Å), and there are no disulfide bonds. The asymmetric unit contains 2 chains. The reference structure was taken from chain B because it has a lower average temperature factor than chain A. The two chains are nearly identical (see Table II below). The CASP 14 target T1037 sequence of 404 residues begins at sequence number 337 and ends at 901, a span of 565 residues. The target sequence is 404 residues because it excludes residues 370-530 (length 161), which form a different domain.
Here is the 404-residue target sequence with the intervening domain (excluded from the CASP target) . (length 404 residues) consists of residues 337-369 + 531-901 of 6vr4 (taken from chain B). It is 45% helices, 19% beta strands, and 37% loops and turns. The N- and C-termini are 10 Å apart, and there are no cysteines (thus no disulfide bonds).

T1037 contains several known fold fragments

The X-ray structure of T1037 (404 residues from 6vr4) was submitted to Dali[10] in March, 2021. Among the ~1,000 hits with Z ≥ 2.0, there were 152 with lengths ≥ 400 residues, and 224 with lengths ≥ 300, long enough that a superposition with the majority of T1037 would not be precluded. Among all hits, the largest number of aligned residues was 140/404 (35%) with RMSD 11.7 Å. The second largest was 127/404 (31%), RMSD 7.7 Å. Thus, no single structure in the PDB superposed with more than 35% of T1037. However, several of the Dali hits superposed with non-overlapping core fragments of 6vr4[22]:
Liisa Holm[10][23] stated: "T1037 has a homologous template in the PDB. The parent structure of T1037, phage RNA polymerase (6vr4, 2166 amino acids), is homologous to the RNAi polymerase from Neurospora crassa (2j7n chain A, 934 amino acids)[5]. Dali aligns them over 564 residues with an RMSD of 4.8 A. 115 residues of the common core are in the T1037 substructure. Several long insertions in T1037/6vr4 relative to 2j7n (chain A) form subdomains, which point outwards from the common core. Similar massive adaptation of the common core is seen, for example, in the glucosyltransferase 1 family[24]."

The FATCAT Server reported that in order to superpose 150 residues (37% of 404) of T1037 with the closest structure in the PDB, 3 twists at hinges were required, after which an RMSD of 3.1 Å was achieved. For a 200-residue superposition (50% of 404), the best results after 3 twists had an RMSD of 5.4 Å.

AlphaFold2 prediction for T1037

[8]. 91% of the 404 alpha carbons can be aligned with RMSD 1.0 Å. (GDT_TS 88; see Table II below for details).
Second Best Prediction for T1037

Despite its impressive GDT_TS of 63, [8] than the prediction of AlphaFold2. (The second best prediction was by Seok-refine, from the group of Chaok Seok, Seoul National University.)

Calculating GDT_TS

Please see #GDT_TS Calculations.
ORF8 Sidechain Accuracy
AlphaFold2's predictions for sidechain positions seem fairly good, while sidechain positions in the 2nd best prediction seem poor. This conclusion is based on three types of observations:
- Table I gives RMSD values for all atoms, which is one indication of sidechain accuracy.
- Prediction of salt bridges and cation-pi interactions.
- Visualization of the distributions of charges on the surfaces.
Salt Bridges and Cation-Pi Interactions
- AlphaFold2's prediction was correct for 4/5 interactions, with one incorrect interaction.
- AlphaFold2's prediction was correct for one of two salt bridges, and predicted no incorrect salt bridges.
- AlphaFold2's prediction was correct for three of three cation-pi interactions, but predicted one incorrect interaction.
- The 2nd best prediction was correct for 1/5 interactions, with 2 incorrect interactions.
- The 2nd best prediction was correct for one of two salt bridges, but predicted two incorrect salt bridges.
- The 2nd best prediction failed to predict any of the three cation-pi interactions, predicting zero interactions.
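The combined 4/5 and 1/5 figures follow from adding the per-type counts in the bullets above. A minimal sketch of that arithmetic, where the dictionary layout and the name `summarize` are illustrative assumptions and only the counts come from this page:

```python
# Counts transcribed from the bullets above.
# Each tuple is (correct, total in the X-ray structures, incorrect extras).
tallies = {
    "AlphaFold2": {"salt_bridge": (1, 2, 0), "cation_pi": (3, 3, 1)},
    "2nd best": {"salt_bridge": (1, 2, 2), "cation_pi": (0, 3, 0)},
}


def summarize(by_type):
    """Add up correct, total and incorrect counts over all interaction types."""
    correct = sum(c for c, _, _ in by_type.values())
    total = sum(t for _, t, _ in by_type.values())
    incorrect = sum(i for _, _, i in by_type.values())
    return correct, total, incorrect


for name, by_type in tallies.items():
    correct, total, incorrect = summarize(by_type)
    print(f"{name}: {correct}/{total} correct, {incorrect} incorrect")
```
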
Table III. Salt Bridge Prediction Accuracy
7JX6 | 7JTL | AlphaFold2 | 2nd Best |
---|---|---|---|
R101:D113 (AB) | R101:D113 (AB) | R86:D98 | R86:D98 |
R115:D119 (AB) | R115:D119 (AB) | – | R100:E4 |
K44:E59 (AB) | K44:E59 (AB) | K29:E44 | – |
– | – | – | K78:E77 |
- Bridges in the same row are identical (except for red residues). Subtract 15 from the sequence numbers in the X-ray structures for the equivalent sequence numbers in the predictions.
- Black: Shortest sidechain nitrogen to sidechain oxygen distance ≤4.0 Å.
- Gray: Shortest sidechain nitrogen to sidechain oxygen distance 4.4 to 4.8 Å.
- –: Shortest sidechain nitrogen to sidechain oxygen distance 6 to 16 Å.
- (AB): The two chains in each X-ray model.
- Italics: erroneous prediction.
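The numbering offset and the distance bins in the notes above can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, and the notes report observed distance ranges (≤4.0, 4.4 to 4.8, 6 to 16 Å) rather than cutoffs, so the boundaries used here for values falling between those ranges are an assumption, not the page's method.

```python
import re

OFFSET = 15  # X-ray sequence number minus prediction sequence number (table note)


def to_prediction_numbering(pair):
    """Shift every residue number in a label such as 'R101:D113' down by OFFSET."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) - OFFSET), pair)


def classify_distance(distance):
    """Bin a shortest sidechain N-to-O distance (in Å) using the table's legend.

    Assumed cutoffs: anything up to 4.0 Å is 'black', anything above that up to
    4.8 Å is 'gray', and anything longer is 'none' (shown as a dash in the table).
    """
    if distance <= 4.0:
        return "black"
    if distance <= 4.8:
        return "gray"
    return "none"


print(to_prediction_numbering("R101:D113"))  # same row as R86:D98 in Table III
print(classify_distance(4.6))
```
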
Table IV. Cation-Pi Prediction Accuracy
7JX6 | 7JTL | AlphaFold2 | 2nd Best |
---|---|---|---|
R101:Y46+Y111 (AB) | R101:Y46+Y111 (AB) | R86:Y31+Y96 | – |
K44:F108 (B) | K44:F108 (AB) | K29:F93 | – |
– | – | K79:F105 | – |
- All interactions listed are deemed energetically significant by the CaPTURE Server.
- Interactions in the same row are identical. Subtract 15 from the sequence numbers in the X-ray structures for the equivalent sequence numbers in the predictions.
- Italics: erroneous prediction.
- The 2nd best prediction has no cation-pi interactions.
- (AB): The two chains in each X-ray model.
Visualization of Surface Charge Distributions
The distributions of surface charges are in good agreement between AlphaFold2's prediction and the two crystal structures, which agree with each other. The distribution in the 2nd best prediction has several discrepancies with the other three models.
GDT_TS Calculations
GDT_TS values for predictions are taken from CASP 14 results. The reference structure for the CASP 14 GDT_TS values was 92 alpha carbons of 7JTL[7], since the CASP 14 target had only 92 residues[7].
GDT_TS values for 7JTL and 5A2F vs. 7JX6 chain A were calculated using the AS2TS server of Adam Zemla[25]. See instructions for Calculating GDT_TS. GDT_TS values were corrected for 92 residues (not 104) because the CASP 14 target had only 92 residues[7].
For comparison, CASP 14 reported GDT_TS 86.96 for the AlphaFold2 prediction, while the AS2TS server calculated GDT_TS 85.87 vs. 7JX6 chain A, and 88.32 vs. 7JTL chain A. (These results were corrected for 90/92 and 91/92 residues, respectively.) Thus, there appears to be a minor, unidentified discrepancy between the GDT_TS calculations of CASP 14 and the method detailed at Calculating GDT_TS.
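For readers unfamiliar with the score, GDT_TS is conventionally the average of the percentages of alpha carbons lying within 1, 2, 4 and 8 Å of their reference positions. The sketch below illustrates only that averaging step and a residue-count rescaling; it is not the LGA/AS2TS algorithm, which also searches over superpositions. The function names and sample distances are hypothetical, and reading "corrected for 90/92" as multiplication by that fraction is an assumption.

```python
def gdt_ts(distances, n_target):
    """Average, over the 1, 2, 4 and 8 Å cutoffs, of the percentage of target
    residues whose alpha carbon lies within the cutoff.

    `distances` holds one Cα-Cα distance (Å) per aligned residue after a single
    fixed superposition; residues without a distance count as misses.
    """
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    fractions = [sum(d <= c for d in distances) / n_target for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def rescale(score, n_aligned, n_target):
    """Rescale a score computed over n_aligned residues to an n_target-residue
    target (assumed reading of 'corrected for 90/92 residues')."""
    return score * n_aligned / n_target


# Hypothetical four-residue example: one residue under each successive cutoff.
print(gdt_ts([0.5, 1.5, 3.0, 7.0], n_target=4))
# A hypothetical score of 92.0 over 90 aligned residues, rescaled to 92 residues.
print(rescale(92.0, n_aligned=90, n_target=92))
```
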
See Also
- AlphaFold/Index, a list of pages in Proteopedia about AlphaFold.
References & Notes
1. Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Zidek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature. 2020 Jan;577(7792):706-710. Epub 2020 Jan 15. PMID:31942072 doi:10.1038/s41586-019-1923-7
2. Flower TG, Buffalo CZ, Hooy RM, Allaire M, Ren X, Hurley JH. Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein. Proc Natl Acad Sci U S A. 2021 Jan 12;118(2). pii: 2021785118. PMID:33361333 doi:10.1073/pnas.2021785118
3. For SARS-CoV-2 ORF8, at the CASP 14 Table Browser, check T1064-D1 and press Show Results.
4. CASP14: what Google DeepMind’s AlphaFold 2 really achieved, and what it means for protein folding, biology and bioinformatics, a blog post by Carlos Outeiral Rubiera, December 3, 2020.
5. Drobysheva AV, Panafidina SA, Kolesnik MV, Klimuk EI, Minakhin L, Yakunina MV, Borukhov S, Nilsson E, Holmfeldt K, Yutin N, Makarova KS, Koonin EV, Severinov KV, Leiman PG, Sokolova ML. Structure and function of virion RNA polymerase of a crAss-like phage. Nature. 2020 Nov 18. PMID:33208949 doi:10.1038/s41586-020-2921-5
6. For the phage RNA polymerase target, at the CASP 14 Table Browser, check T1037-D1 and press Show Results.
7. Summary, Definitions and Classifications of Domains for CASP 14.
8. Superposition by Swiss-PdbViewer's iterative magic fit. This starts with a sequence alignment-guided structural superposition, and then superposes subsets of the structures to minimize the RMSD. Eight intermediate structures were generated by the Theis Morph Server by linear interpolation.
9. Cuff AL, Sillitoe I, Lewis T, Clegg AB, Rentzsch R, Furnham N, Pellegrini-Calace M, Jones D, Thornton J, Orengo CA. Extending CATH: increasing coverage of the protein structure universe and linking structure with function. Nucleic Acids Res. 2011 Jan;39(Database issue):D420-6. Epub 2010 Nov 19. PMID:21097779 doi:10.1093/nar/gkq1001
10. Holm L. DALI and the persistence of protein shape. Protein Sci. 2020 Jan;29(1):128-140. Epub 2019 Nov 5. PMID:31606894 doi:10.1002/pro.3749
11. Using Swiss-PdbViewer's Fit from Selection with 102 residues selected from each structure, followed by Improve Fit.
12. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013 Apr;30(4):772-80. Epub 2013 Jan 16. PMID:23329690 doi:10.1093/molbev/mst010
13. Structural superposition by Dali. Interpolation by the Yale Morph2 Server. Homogenization method: homology modeling. No minimization. This produced a 9-model file where model 1 was 7jx6, and models 2-9 were interpolations. 5a2f residues 28-133 were added as model 10 (black in the molecular scene).
14. The interpretation of Dali's result to mean that ORF8 does not have a novel fold was kindly confirmed by Liisa Holm, personal communication to Eric Martz, February, 2021.
15. Download AlphaFold2's predicted structure for ORF8 from T1064TS427_1-D1.pdb.
16. See #GDT_TS Calculations.
17. See #ORF8 is not a novel fold.
18. Superposition by Swiss-PdbViewer's magic fit. This is a sequence alignment-guided structural superposition. Eight intermediate structures were generated by the Theis Morph Server by linear interpolation.
19. Superposition by Swiss-PdbViewer's Explore Fragment Alternate Fits, which does not use sequence information. Eight intermediate structures were generated by the Theis Morph Server by linear interpolation.
20. For all targets in CASP 14, the top two servers were QUARK and Zhang-server (which were not significantly different at a Z-score sum of 62.9), followed by Zhang-CEthreader (55.9) and BAKER-ROSETTASERVER (55.3).
21. Johansen JE, Nielsen P, Sjoholm C. Description of Cellulophaga baltica gen. nov., sp. nov. and Cellulophaga fucicola gen. nov., sp. nov. and reclassification of [Cytophaga] lytica to Cellulophaga lytica gen. nov., comb. nov. Int J Syst Bacteriol. 1999 Jul;49 Pt 3:1231-40. PMID:10425785 doi:10.1099/00207713-49-3-1231
22. These non-overlapping core fragments were kindly pointed out by Liisa Holm, March, 2021.
23. Quoted with permission from Liisa Holm, March, 2021.
24. Holm L, Sander C. Evolutionary link between glycogen phosphorylase and a DNA modifying enzyme. EMBO J. 1995 Apr 3;14(7):1287-93. PMID:7729407
25. Zemla A. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003 Jul 1;31(13):3370-4. PMID:12824330 doi:10.1093/nar/gkg571