AlphaFold2 examples from CASP 14: Difference between revisions

Eric Martz (talk | contribs)
No edit summary
Eric Martz (talk | contribs)
No edit summary
 
(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
<span class="text-red">'''This page is under construction.''' [[User:Eric Martz|Eric Martz]] 01:03, 22 February 2021 (UTC)</span>
Prediction of protein structures from amino acid sequences, [[theoretical modeling]], has been extremely challenging. In 2020, breakthrough success was achieved by AlphaFold2<ref name="af2">PMID:31942072</ref>, a project of [http://deepmind.com DeepMind]. '''For an overview of this breakthrough''', documented by the biennial prediction competition [[Theoretical_models#CASP|CASP]], please see [[Theoretical_models#2020:_CASP_14|2020: CASP 14]]. Two examples of predictions from that competition are illustrated below.


Line 88: Line 86:
The X-ray structure of T1037 (404 residues from 6vr4) was submitted to Dali<ref name="dali2020" /> in March 2021. Among the ~1,000 hits with Z ≥ 2.0, 152 had lengths ≥ 400 residues and 224 had lengths ≥ 300, long enough to permit superposition with the majority of T1037. Among all hits, the largest number of aligned residues was 140/404 (35%) with RMSD 11.7 Å. The second largest was 127/404 (31%), RMSD 7.7 Å. Thus, no single structure in the PDB superposed with more than 35% of T1037.
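The coverage percentages quoted for the two largest Dali alignments can be checked with a few lines of Python. This is only an arithmetic sketch; the function name <code>coverage_percent</code> is hypothetical and is not part of Dali.

```python
# Arithmetic check of the alignment coverage figures quoted in the text.
# coverage_percent is a hypothetical helper, not a Dali function.

T1037_LENGTH = 404  # residues of T1037 taken from 6vr4


def coverage_percent(aligned: int, total: int) -> float:
    """Return the percentage of `total` residues covered by `aligned`."""
    return 100.0 * aligned / total


if __name__ == "__main__":
    for aligned in (140, 127):  # the two largest aligned-residue counts
        pct = coverage_percent(aligned, T1037_LENGTH)
        print(f"{aligned}/{T1037_LENGTH} = {pct:.0f}%")
```

Rounded to whole percentages, this gives 35% and 31%, matching the values above.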


However, several of the Dali hits superposed with non-overlapping core fragments of [[6vr4]]:
However, several of the Dali hits superposed with non-overlapping core fragments of [[6vr4]]<ref name="lholm">These non-overlapping core fragments were kindly pointed out by Liisa Holm, March, 2021.</ref>:
*[[2j7n]] chain A, RNA-dependent RNA polymerase
**length 934, aligned residues '''115, RMSD 4.3 Å''', structural alignment 9 %id.
**length 934, aligned residues '''115, RMSD 4.3 Å''', Z=4.0, structural alignment 9 %id.
*[[4ncj]] chain A, DNA double-strand break repair RAD50 ATPase
**length 311, aligned residues '''109, RMSD 4.7 Å''', structural alignment 11 %id.
**length 311, aligned residues '''109, RMSD 4.7 Å''', Z=3.4, structural alignment 11 %id.
*[[5vfk]] chain A, Uncharacterized protein
**length 146, aligned residues '''61, RMSD 7.8 Å''', structural alignment 11 %id.
**length 146, aligned residues '''61, RMSD 7.8 Å''', Z=3.3, structural alignment 11 %id.
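
The three fragment hits listed above can be tabulated to see how much of T1037 each one covers. The following is a minimal sketch: the <code>DaliHit</code> record and the <code>combined_coverage</code> helper are hypothetical names, and the combined figure is only an upper bound that assumes the aligned fragments do not overlap at all within T1037, as described above.

```python
# Tabulate the three Dali fragment hits listed above.
# DaliHit and combined_coverage are hypothetical names for illustration.
from dataclasses import dataclass
from typing import Iterable

T1037_LENGTH = 404  # residues of T1037 taken from 6vr4


@dataclass(frozen=True)
class DaliHit:
    pdb_id: str
    chain: str
    length: int    # length of the hit chain
    aligned: int   # residues aligned with T1037
    rmsd: float    # in Angstroms
    z: float       # Dali Z-score
    pct_id: int    # sequence identity in the structural alignment


HITS = [
    DaliHit("2j7n", "A", 934, 115, 4.3, 4.0, 9),
    DaliHit("4ncj", "A", 311, 109, 4.7, 3.4, 11),
    DaliHit("5vfk", "A", 146, 61, 7.8, 3.3, 11),
]


def combined_coverage(hits: Iterable[DaliHit], target_length: int) -> float:
    """Percent of the target covered, ASSUMING the fragments are disjoint."""
    return 100.0 * sum(h.aligned for h in hits) / target_length


if __name__ == "__main__":
    for h in HITS:
        pct = 100.0 * h.aligned / T1037_LENGTH
        print(f"{h.pdb_id} chain {h.chain}: {h.aligned} residues ({pct:.1f}%)")
    print(f"combined (upper bound): {combined_coverage(HITS, T1037_LENGTH):.1f}%")
```

Each fragment alone covers well under a third of T1037, consistent with the observation that no single structure superposes with more than 35% of it.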
 
Liisa Holm<ref name="dali2020" /><ref name="holmquote">Quoted with permission from Liisa Holm, March, 2021.</ref> stated: "T1037 has a homologous template in the PDB. The parent structure of T1037, phage RNA polymerase (6vr4, 2166 amino acids), is homologous to the RNAi polymerase from Neurospora crassa (2j7n chain A, 934 amino acids)<ref name="6vr4" />. Dali aligns them over 564 residues with an RMSD of 4.8 A. 115 residues of the common core are in the T1037 substructure. Several long insertions in T1037/6vr4 relative to 2j7n (chain A) form subdomains, which point outwards from the common core. Similar massive adaptation of the common core is seen, for example, in the glucosyltransferase 1 family<ref>PMID: 7729407</ref>."


The [https://fatcat.godziklab.org/ FATCAT Server] reported that superposing 150 residues (37% of 404) of T1037 with the closest structure in the PDB required 3 twists at hinges, after which an RMSD of 3.1 Å was achieved. For a 200-residue superposition (50% of 404), the best result after 3 twists had an RMSD of 5.4 Å.
Line 192: Line 192:


For comparison, CASP 14 reported GDT_TS 86.96 for the AlphaFold2 prediction, while the AS2TS server calculated GDT_TS 85.87 vs. 7jx6 chain A, and 88.32 vs. 7jtl chain A. (These results were corrected for 90/92 and 91/92 residues, respectively.) Thus, there appears to be a minor, unidentified discrepancy between the GDT_TS calculations of CASP 14 and the method detailed at [[Calculating GDT_TS]].
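
For readers unfamiliar with the score, GDT_TS is conventionally the mean of the percentages of target residues whose alpha carbons lie within 1, 2, 4 and 8 Å of the prediction. The sketch below is a simplification, not the CASP or AS2TS implementation: it takes one precomputed list of per-residue distances, whereas real GDT calculations search for the best superposition separately at each cutoff. The function names are hypothetical, and reading "corrected for 90/92" as scaling a score by the fraction of residues compared is an assumption.

```python
# Simplified GDT_TS sketch. NOT the CASP or AS2TS implementation:
# it uses a single set of per-residue CA-CA distances (in Angstroms)
# rather than optimizing the superposition at each cutoff.
from typing import Sequence

CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # standard GDT_TS distance cutoffs


def gdt_ts(distances: Sequence[float], target_length: int) -> float:
    """Mean percent of target residues within each cutoff distance."""
    fractions = [
        sum(d <= cutoff for d in distances) / target_length
        for cutoff in CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(CUTOFFS)


def rescale_to_full_length(score: float, compared: int, target_length: int) -> float:
    """ASSUMED correction: scale a score computed over `compared`
    residues to the full target length (e.g. 90 of 92)."""
    return score * compared / target_length


if __name__ == "__main__":
    toy_distances = [0.5, 1.5, 3.0, 6.0]  # made-up values for illustration
    print(gdt_ts(toy_distances, target_length=4))
    print(rescale_to_full_length(100.0, 90, 92))
```

Dividing by the full target length inside <code>gdt_ts</code> means that residues missing from the comparison count as outside every cutoff, which is one way a difference in the number of residues compared can shift the score.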
==See Also==
*[[AlphaFold/Index]], a list of pages in Proteopedia about AlphaFold.


==References & Notes==
<references />

Proteopedia Page Contributors and Editors

Eric Martz