Calculating GDT TS: Difference between revisions

Eric Martz (talk | contribs)
<span class="text-red">'''This page is under construction.''' [[User:Eric Martz|Eric Martz]] 01:18, 5 March 2021 (UTC)</span>
==What is GDT_TS?==
The ''Global Distance Test - Total Score'' (GDT_TS)<ref name="zemla" /><ref name="gdtcaspshort">[https://predictioncenter.org/casp14/doc/help.html#GDT_TS GDT_TS definition] at the CASP 14 website.</ref><ref name="gdtcasp">[https://predictioncenter.org/casp13/doc/LCS_GDT.README GDT description] at the CASP website.</ref><ref name="gdtwikipedia">[https://en.wikipedia.org/wiki/Global_distance_test Global distance test] at Wikipedia.</ref> is used to quantify the similarity between a predicted protein structure (or any query protein structure) and a reference structure, which is typically an [[empirical model]]. The sequences of the two structures need not be the same. GDT_TS gives an overall average measure of how close each amino acid in the predicted model is to its counterpart in the empirical model, taking into account many different superpositions of the two models. When the two structures differ in detail, GDT_TS is better at detecting similarities in fold than is the [[RMSD|Root Mean Square Deviation]]. "RMSD uses the actual distances between alpha carbons, where GDT works with the percentage of alpha carbons that are found within certain cutoff distances of each other."<ref name="gdtfoldit">[https://foldit.fandom.com/wiki/GDT GDT in the Foldit Wiki].</ref> Both tests compare the positions of only the alpha carbon atoms. GDT_TS values range from 0 (a meaningless prediction) to 100 (a perfect prediction). "Random predictions give around 20; getting the gross topology right gets one to ~50; accurate topology is usually around 70; and when all the little bits and pieces, including side-chain conformations, are correct, GDT_TS begins to climb above 90."<ref name="alquraishi">[https://moalquraishi.wordpress.com/2020/12/08/alphafold2-casp14-it-feels-like-ones-child-has-left-home/ AlphaFold2 @ CASP14: “It feels like one’s child has left home.”] by Mohammed AlQuraishi, December 8, 2020.</ref>
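
The "percentage within cutoff distances" idea can be illustrated with a minimal Python sketch. It assumes the four cutoffs of the CASP definition (1, 2, 4 and 8 Å) and, as a simplification, takes alpha carbon distances from a single superposition that has already been computed. The actual GDT procedure searches many superpositions to maximize the count at each cutoff, so this sketch is not a substitute for the server described below. The function name and the example distances are hypothetical.

```python
from typing import Sequence

# Cutoff distances in angstroms, as in the CASP definition of GDT_TS (assumed here).
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts_single_superposition(distances: Sequence[float], n_reference: int) -> float:
    """Simplified GDT_TS from alpha carbon distances of ONE fixed superposition.

    distances   -- distance (in angstroms) for each aligned alpha carbon pair
    n_reference -- number of residues in the reference structure
    """
    if n_reference <= 0:
        raise ValueError("n_reference must be positive")
    # Count the residue pairs that fall within each cutoff.
    counts = [sum(1 for d in distances if d <= cutoff) for cutoff in CUTOFFS]
    # Average the four percentages, expressed on a 0 to 100 scale.
    return 100.0 * sum(counts) / (len(CUTOFFS) * n_reference)


# Hypothetical distances for a five-residue reference structure.
example_distances = [0.5, 1.5, 3.0, 6.0, 10.0]
example_score = gdt_ts_single_superposition(example_distances, n_reference=5)
print(example_score)
```

Because a single superposition is used, the value from this sketch can only be less than or equal to what a full GDT search would report for the same pair of structures.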


The accuracies of predictions submitted to the [[CASP|biennial CASP competitions]] are judged largely by GDT_TS. Proteopedia pages using GDT_TS include [[Theoretical models]] and [[AlphaFold2 examples from CASP 14]].


==Server for calculating GDT_TS==


GDT_TS can be calculated with the [http://linum.proteinmodel.org/ AS2TS Server] provided by [http://linum.proteinmodel.org/AS2TS/people/Zemla/ Adam Zemla]<ref name="zemla">PMID: 12824330</ref><ref name="as2ts">PMID: 15980437</ref>. Below are detailed instructions kindly provided by Zemla in March 2021.


===Run 1: Superposition===
5. Caveat: If you let your browser auto-fill the email address slot, make sure to clear any other slots that got auto-filled inadvertently. Otherwise you may get an error message.


6. Append '''-d:4.0''' to the default parameters, so that the job is submitted with:
:'''-4 -o2 -gdc -lga_m -stral -d:4.0'''
:(Without this option, LGA defaults to 5.0 Å, whereas CASP uses 4.0 Å.)
 
7. Press the '''START''' button.


In the results for Run 1, you may be interested in the '''RMSD''' and '''Seq_Id''' for the superposition deemed optimal by this server.


The '''LGA_S''' value (range 0 to 100) is a structure similarity score for the number of alpha carbons given under '''N'''. LGA_S values below ~40 indicate that the two structures have different folds. The LGA_S score for our example is 49.46 for 87 alpha carbons, indicating similar folds.


:Caveat: If the first structure is half the length of the second '''reference''' structure, then the maximum possible LGA score is 50%. On the other hand, if the second structure is half the length of the first one, then the maximum possible LGA score is 100%. In our example, the length of 5a2f_A is 218 amino acids, and the length of the reference structure 7jx6_A is 104. Therefore, a score of 100 is not impossible.
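
The length caveat above can be expressed as a small Python sketch. It generalizes the two cases given in the caveat (a first structure half the length of the reference caps the score at 50, while a longer first structure allows 100) by assuming that the ceiling scales linearly with the fraction of the reference that the first structure can cover. That linear rule and the function name are assumptions made for illustration, not something stated by the server documentation.

```python
def max_possible_lga_score(len_first: int, len_reference: int) -> float:
    """Upper bound on the score implied by the chain lengths alone.

    Assumes the ceiling is the fraction of reference residues that the
    first structure could cover at most, on a 0 to 100 scale.
    """
    if len_first < 0 or len_reference <= 0:
        raise ValueError("lengths must be non-negative, and the reference non-empty")
    return 100.0 * min(len_first, len_reference) / len_reference


# Example from the text: 5a2f_A (218 residues) against the reference 7jx6_A (104 residues).
print(max_possible_lga_score(218, 104))
# A first structure half the length of the reference.
print(max_possible_lga_score(52, 104))
```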
===Run 2: GDT_TS===
Run 2 uses the superposition determined in Run 1.
8. Copy the entire output of Run 1 to the clipboard.
9. In a separate tab, get the same form [http://linum.proteinmodel.org/AS2TS/LGA/lga.html LGA = pairwise protein structure comparison].
10. Make sure there are no molecules specified in sections <span class="text-red">'''1, 2, or 3'''</span> of the form. If necessary, press the '''Clear Form''' button. ''If you leave molecules from the previous run in sections <span class="text-red">'''1, 2, or 3'''</span>, what you paste into <span class="text-red">'''Box 4'''</span> will be ignored!''
11. Enter your email address.
12. Paste the entire output of Run 1 into box <span class="text-red">'''4'''</span>.
13. In the parameters slot, change -4 to '''-3''', and add '''-d:4.0 &nbsp;-al''' at the end. So the complete parameters for Run 2 should be
:'''-3 -o2 -gdc -lga_m -stral &nbsp;-d:4.0 &nbsp;-al'''.
14. Press the '''START''' button. The output for our example:
[[Image:As2ts-5a2fA-7jx6A-run2.png|border|800px]]
15. The GDT_TS score reported by the server needs to be adjusted to reflect the similarity for the entire reference structure, namely 104 residues in our example. The final GDT_TS score for structure similarity between 5a2f_A and 7jx6_A can be estimated as follows:
:<span style="font-size:120%;">'''GDT_TS = 63.068 * 88/104 = <span class="text-red">53.37</span>'''</span>
However, the CASP 14 target for ORF8 had only 92 residues<ref name="casp14domains">[https://predictioncenter.org/casp14/domains_summary.cgi CASP 14 Domain Definitions and Classifications].</ref>. Therefore, to calculate GDT_TS values for comparison with CASP 14 ORF8 prediction values, the correction denominator should be 92 instead of 104:
:<span style="font-size:120%;">'''GDT_TS = 63.068 * 88/92 = <span class="text-red">60.33</span>'''</span>
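
The adjustment in step 15 is simple arithmetic and can be checked with a short Python sketch. The function and parameter names are hypothetical. The value 88 is taken from the formula above and is assumed to be the number of residues on which the server based its reported score.

```python
def rescale_gdt_ts(reported_score: float, n_scored: int, n_reference: int) -> float:
    """Rescale a server-reported GDT_TS to the full length of the reference.

    reported_score -- GDT_TS as reported in the Run 2 output
    n_scored       -- number of residues the reported score is based on
    n_reference    -- number of residues to normalize against
    """
    if n_reference <= 0:
        raise ValueError("n_reference must be positive")
    return reported_score * n_scored / n_reference


# Normalized to the full reference structure 7jx6_A (104 residues).
full_reference = round(rescale_gdt_ts(63.068, 88, 104), 2)
# Normalized to the CASP 14 ORF8 target (92 residues).
casp_target = round(rescale_gdt_ts(63.068, 88, 92), 2)
print(full_reference, casp_target)  # 53.37 60.33
```

Keeping the denominator as an explicit argument makes it clear which reference length a quoted GDT_TS value refers to.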
==See Also==
*[[AlphaFold/Index]], a list of pages in Proteopedia about AlphaFold.


==References==
<references />
