AlphaFold2 examples from CASP 14

Revision as of 23:46, 25 February 2021

This page is under construction. Eric Martz 01:03, 22 February 2021 (UTC)

Predicting protein structures from amino acid sequences (theoretical modeling) has been extremely challenging. In 2020, breakthrough success was achieved by AlphaFold2[1], a project of DeepMind. For an overview of this breakthrough, documented by the biennial prediction competition CASP, please see 2020: CASP 14. Some examples of predictions from that competition are illustrated below.


SARS-CoV-2 ORF8

Following the discussion by Rubiera[2], our first example will be SARS-CoV-2 protein ORF8, a protein that contributes to virulence in COVID-19[3]. CASP 14 classified ORF8 as a "free modeling" (FM) target[4], meaning that there were no adequate empirical templates for homology modeling. This was easily confirmed. When the amino acid sequence of ORF8 is submitted to Swiss Model, it reports the best templates for homology modeling. When the two empirical models that were not available during CASP 14 are excluded (7jtl and 7jx6), the best template offered, chain B of 3afc, covers only 36% of the length of ORF8 at 13.2% sequence identity, with a 4-residue untemplated gap in the sequence alignment. This template would not be adequate for constructing a useful model.

X-Ray Structures for ORF8

The quality of predictions for the structure of ORF8 is judged by comparison with X-ray crystallographic empirical models, which were not available to the groups making predictions. Shortly after the CASP 14 competition (summer 2020), two X-ray crystal structures were reported for ORF8: 7jtl, released August 26, 2020, and 7jx6, released September 23, 2020. The resolutions are 2.0 and 1.6 Å respectively, and both have worse than average Rfree values.

Here is one chain of ORF8 from the higher resolution X-ray structure, 7jx6. These chains form disulfide-linked dimers, and the dimers form higher order multimers[5] (not shown). Notice that the amino and carboxy ends of the chain come together to form two parallel beta strands of a beta sheet. Also notice that there are 3 disulfide bonds. An accurate prediction would include both of these features.

The two X-ray structures agree very well[6]. The only substantial disagreement is for a large surface loop, sequence range 48-57. See the TABLE below for RMSD values.
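
As a rough illustration of what the RMSD values in the table measure, here is a minimal Python sketch. It assumes the two structures are already superposed and that atoms are paired by index. The coordinates are made-up toy data, and the helper `closest_subset` is hypothetical: it only mimics the idea of reporting RMSD over a well-fitting subset, and is not Swiss-PdbViewer's Iterative Magic Fit (which also re-superposes the structures).

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superposed and paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def closest_subset(coords_a, coords_b, cutoff):
    """Hypothetical helper: keep only atom pairs no farther apart than cutoff (Å)."""
    pairs = [(a, b) for a, b in zip(coords_a, coords_b) if math.dist(a, b) <= cutoff]
    return [a for a, _ in pairs], [b for _, b in pairs]

# Toy coordinates (not from 7jx6 or 7jtl): three atoms differ by 1 Å,
# and one "loop" atom differs by 7 Å.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0), (11.4, 7.0, 0.0)]

rmsd_all = rmsd(reference, model)               # all 4 atoms
sub_ref, sub_model = closest_subset(reference, model, cutoff=2.0)
rmsd_subset = rmsd(sub_ref, sub_model)          # 3 of 4 atoms

print(f"all atoms: {rmsd_all:.2f} Å over {len(reference)} atoms")
print(f"subset:    {rmsd_subset:.2f} Å over {len(sub_ref)} atoms")
```

In this toy case a single poorly fitting atom dominates the all-atom RMSD, while the subset RMSD is much lower but covers fewer atoms. This is why RMSD values are best read together with the number of atoms aligned.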

AlphaFold2 Prediction for ORF8

The quality of a prediction in CASP is judged, in large part, by the Global Distance Test Total Score, GDT_TS. AlphaFold2's predicted structure[7] has a GDT_TS score of 87. (A score of 0 is meaningless, and a score of 100 means perfect agreement with an X-ray crystal structure.) 87 means [6]. The structure predicted by AlphaFold2 is almost as close to the X-ray crystallographic model 7jx6 as is the independently determined X-ray structure 7jtl. AlphaFold2 predicted the positions of 92 amino acids. (CASP 14 excluded residues 48-59, a 12-residue surface loop, from the target residues[4].) See TABLE below for RMSD values.
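
For readers unfamiliar with GDT_TS, the following Python sketch shows the arithmetic behind the score: it averages the percentages of Cα atoms that lie within 1, 2, 4 and 8 Å of their positions in the reference structure. This is a simplification under stated assumptions. It takes a single list of per-residue Cα distances from one superposition, whereas the official score searches for the best superposition at each cutoff. The distances below are invented for illustration and are not taken from any CASP 14 model.

```python
def gdt_ts(ca_distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS from per-residue Cα distances (Å) after one superposition.

    Returns the mean, over the cutoffs, of the percentage of residues whose
    Cα atom is within that cutoff of the reference position.
    """
    if not ca_distances:
        raise ValueError("need at least one distance")
    n = len(ca_distances)
    fractions = [sum(d <= c for d in ca_distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical distances for an 8-residue toy model.
toy_distances = [0.5, 0.9, 1.5, 3.0, 6.0, 12.0, 0.7, 1.9]
toy_score = gdt_ts(toy_distances)
print(f"GDT_TS = {toy_score:.1f}")
```

A model with every Cα within 1 Å of the reference scores 100, and a model with every Cα more than 8 Å away scores 0.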

ORF8 Alignments With Chain A of 7jx6

| Model | GDT_TS | Disulfide Bonds | Alignment | RMSD, Å | Cα Aligned | RMSD Including Sidechains, Å | Atoms Aligned |
|---|---|---|---|---|---|---|---|
| 7jtl:A | | 3 | Magic Fit | 4.02 | 102/102 (100%) | 4.3 | 829/829 (100%) |
| | | | Iterative Magic Fit | 0.66 | 87/102 (85%) | 1.58 | 709/829 (86%) |
| AlphaFold2 | 87 | 3 | Magic Fit | 2.58 | 92/92 (100%) | 3.23 | 747/748 (100%) |
| | | | Iterative Magic Fit | 1.25 | 83/92* (90%) | 1.91 | 679/748 (91%) |
| 2nd Best* | 43 | 0 | Magic Fit | 5.33 | 92/92 (100%) | 6.54 | 747/748 (100%) |
| | | | Iterative Magic Fit | 1.71 | 38/92 (41%) | 5.86 | 324/748 (43%) |
| 3rd Best§ | 33 | 0 | Magic Fit | 13.37 | 92/92 (100%) | 14.50 | 747/748 (100%) |
| Rosetta Server | 26 | 0 | Magic Fit | 14.99 | 92/92 (100%) | | |

Alignments by "Magic Fit" and "Iterative Magic Fit" of Swiss-PdbViewer 4.1.
*Second best: Group of Xian Ming Pan.
§Third best: Group of Perez Lab.
† Iterative Magic Fit was unable to align.

Second Best Prediction for ORF8

In CASP 14, 70 research groups and 42 automated servers predicted structures for ORF8. The median GDT_TS score for all 112 predictions was 26. AlphaFold2 made the best prediction (GDT_TS 87). The second best prediction was by the group of Xian Ming Pan, with GDT_TS 43 (see TABLE above). The fold and topology were predicted correctly, but the details are far less accurate than those in AlphaFold2's prediction. The 2nd best prediction does accurately include the three disulfide bonds found in the empirical models (not shown).

Third Best Prediction for ORF8

The third best prediction was by the Perez Lab, with GDT_TS 33 (see TABLE above). It correctly predicted the anti-parallel beta sheet formed by the amino and carboxy terminal ends of the chain. This prediction failed to include any of the three disulfide bonds found in the empirical models, or in the best and 2nd best predictions (not shown).

Baker Rosetta Server Prediction for ORF8

Among predictions for all ~100 CASP 14 targets, the group of David Baker ranked second. The Rosetta Server of the Baker group ranked 18th overall, but was the 4th ranked server[8]. For ORF8, the Rosetta Server prediction GDT_TS was 26, a bit better than the median of 23. The Rosetta Server's prediction for ORF8 has the two termini far apart (Cα 13 Å or farther apart), a substantial difference from the X-ray structure (Cα mostly ~5 Å apart). It predicts two disulfide bonds, but neither matches the pairs of Cys residues in the actual disulfide bonds. The structural alignment is very poor and is not shown.
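
The comparison of disulfide pairings described above can be sketched in Python as follows. This is only an illustration under assumptions: a disulfide bond is inferred when two cysteine Sγ atoms lie within 2.5 Å of each other (a cutoff chosen here, somewhat above the typical S–S bond length of about 2.05 Å), and the residue numbers and coordinates are invented toy data, not the real ORF8 cysteines or any CASP 14 model.

```python
import math
from itertools import combinations

def disulfide_pairs(sg_coords, max_dist=2.5):
    """Infer disulfide bonds from cysteine Sγ coordinates.

    sg_coords maps residue number -> (x, y, z) of the Sγ atom.
    max_dist (Å) is an assumed cutoff, not a value from the source page.
    Returns a set of (residue_i, residue_j) tuples with residue_i < residue_j.
    """
    pairs = set()
    for (res_i, xyz_i), (res_j, xyz_j) in combinations(sorted(sg_coords.items()), 2):
        if math.dist(xyz_i, xyz_j) <= max_dist:
            pairs.add((res_i, res_j))
    return pairs

# Toy data: the same four cysteines, paired differently in the two structures.
reference_sg = {10: (0.0, 0.0, 0.0), 20: (2.0, 0.0, 0.0),
                30: (10.0, 0.0, 0.0), 40: (10.0, 2.1, 0.0)}
predicted_sg = {10: (0.0, 0.0, 0.0), 30: (2.0, 0.0, 0.0),
                20: (10.0, 0.0, 0.0), 40: (10.0, 2.0, 0.0)}

reference_pairs = disulfide_pairs(reference_sg)
predicted_pairs = disulfide_pairs(predicted_sg)
matching = reference_pairs & predicted_pairs

print("reference:", sorted(reference_pairs))
print("predicted:", sorted(predicted_pairs))
print("matching: ", sorted(matching))
```

In this toy example both structures contain two disulfide bonds, yet no predicted pair matches a reference pair. Counting bonds alone would therefore overstate the agreement.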



References

  1. Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Zidek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature. 2020 Jan;577(7792):706-710. Epub 2020 Jan 15. PMID:31942072 doi:http://dx.doi.org/10.1038/s41586-019-1923-7
  2. CASP14: what Google DeepMind’s AlphaFold 2 really achieved, and what it means for protein folding, biology and bioinformatics, a blog post by Carlos Outeiral Rubiera, December 3, 2020.
  3. Flower TG, Buffalo CZ, Hooy RM, Allaire M, Ren X, Hurley JH. Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein. Proc Natl Acad Sci U S A. 2021 Jan 12;118(2). pii: 2021785118. PMID:33361333 doi:http://dx.doi.org/10.1073/pnas.2021785118
  4. Summary and Classifications of Domains for CASP 14.
  5. Flower TG, Buffalo CZ, Hooy RM, Allaire M, Ren X, Hurley JH. Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein. Proc Natl Acad Sci U S A. 2021 Jan 12;118(2). pii: 2021785118. PMID:33361333 doi:http://dx.doi.org/10.1073/pnas.2021785118
  6. Alignment by Swiss-PdbViewer's iterative magic fit. This starts with a sequence alignment-guided structural alignment, and then selects subsets of the structures to minimize the RMSD. Eight intermediate structures were generated by the Theis Morph Server by linear interpolation.
  7. Download AlphaFold2's predicted structure for ORF8 from T1064TS427_1-D1.pdb.
  8. For all targets in CASP 14, the top two servers were QUARK and Zhang-server (which were not significantly different at a Z-score sum of 62.9), followed by Zhang-CEthreader (55.9) and BAKER-ROSETTASERVER (55.3).

Proteopedia Page Contributors and Editors

Eric Martz