Interpreting ConSurf Results: Difference between revisions
Eric Martz (talk | contribs) |
Eric Martz (talk | contribs) |
||
(17 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
This page discusses how to decide whether a [http://consurf.tau.ac.il ConSurf] result is optimal for the questions you wish to ask about a protein. It assumes that you already have one or more completed ConSurf results. For background principles and instructions on how to get a ConSurf result, please see [[ConSurf/Index]]. | This page discusses how to decide whether a [http://consurf.tau.ac.il ConSurf] result is optimal for the questions you wish to ask about a protein. It assumes that you already have one or more completed ConSurf results. For background principles and instructions on how to get a ConSurf result, please see [[ConSurf/Index]]. | ||
This page does not go into detail about the changes in settings needed to optimize a ConSurf result. The options for increasing (or decreasing) the diversity and number of sequences in the underlying multiple sequence alignment (MSA) are evident in the job submission forms of ConSurf. You are encouraged to try various options, and the information below will help you to decide which options give the most satisfactory result for your purposes. | |||
==Diversity in the MSA== | ==Diversity in the MSA== | ||
Line 12: | Line 14: | ||
The ''average pairwise distance'' (APD) in a multiple sequence alignment (MSA) is a measure of the evolutionary diversity in the sequences included. The APD is "The average number of replacements between any two sequences in the alignment; A distance of 0.01 means that on average, the expected replacement for every 100 positions is 1." (quoted from the ConSurf Server). | The ''average pairwise distance'' (APD) in a multiple sequence alignment (MSA) is a measure of the evolutionary diversity in the sequences included. The APD is "The average number of replacements between any two sequences in the alignment; A distance of 0.01 means that on average, the expected replacement for every 100 positions is 1." (quoted from the ConSurf Server). | ||
Generally, an APD of | Generally, an APD of 0.25 to 0.5 is consistent with an MSA whose sequences are limited to proteins with one specific function. As the APD approaches or exceeds 1.0, it is more likely that proteins of multiple functions are included in the MSA. | ||
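To make the idea of an average pairwise distance concrete, here is a minimal Python sketch that averages the uncorrected fraction of differing positions (the "p-distance") over every pair of sequences in a toy alignment. This is an illustration only and is not ConSurf's own calculation: the uncorrected p-distance can never exceed 1, whereas APD values quoted on this page reach 1.62, so the server's figure is evidently an estimate of expected replacements rather than a simple count of observed differences. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations

# Characters treated as alignment gaps (an assumption for this sketch).
GAP_CHARS = {"-", "."}


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("the two sequences share no ungapped columns")
    return differing / compared


def average_pairwise_distance(aligned):
    """Mean p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment: the third sequence differs at 2 of 10 positions.
toy_msa = ["MKTAYIAKQR", "MKTAYIAKQR", "MKSAYLAKQR"]
print(round(average_pairwise_distance(toy_msa), 3))
```

In the toy alignment, two of the three pairs differ at 2 of 10 positions and one pair is identical, so the average is (0 + 0.2 + 0.2) / 3. For real work, rely on the APD reported by the ConSurf Server.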
===Example=== | ===Example=== | ||
Line 18: | Line 20: | ||
====APD 0.99==== | ====APD 0.99==== | ||
The APD for the 2VAA result in the Gallery is '''0.99'''. The MSA has 150 sequences, largely limited to sequences for major histocompatibility complex class I proteins. The labels of 101 sequences (67% of 150) contain "class I" or "class 1". There is only one class II protein sequence. Three sequences are labeled "zinc-alpha-2-glycoprotein", clearly a different function. There are 22 sequences labeled "uncharacterized protein" which nevertheless have high similarity to the query. 19 sequences are labeled "UPI000... related cluster". If the uncharacterized and "UPI000..." sequences are in fact class I sequences, then '''up to 142/150 (95%) of the sequences could be MHC-I'''. | The APD for the ConSurf Server 2VAA result with default settings in the Gallery is '''0.99'''. The MSA has 150 sequences, largely limited to sequences for major histocompatibility complex class I proteins. The labels of 101 sequences (67% of 150) contain "class I" or "class 1". There is only one class II protein sequence. Three sequences are labeled "zinc-alpha-2-glycoprotein", clearly a different function. There are 22 sequences labeled "uncharacterized protein" which nevertheless have high similarity to the query. 19 sequences are labeled "UPI000... related cluster". If the uncharacterized and "UPI000..." sequences are in fact class I sequences, then '''up to 142/150 (95%) of the sequences could be MHC-I'''. | ||
However, conservation of key functional residues was revealed only when custom ConSurf Server jobs achieved an APD of around 0.30; see Case #1 at [[ConSurfDB_vs._ConSurf#Examples]]. | |||
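The percentages in the 2VAA label tally above can be re-derived with a few lines of Python. The counts are taken from the text; the dictionary keys and variable names are hypothetical, and the handful of sequences not described in the text are simply left uncounted.

```python
# Counts quoted in the 2VAA example (MSA of 150 sequences).
total_sequences = 150
label_counts = {
    "class I / class 1": 101,
    "uncharacterized protein": 22,
    "UPI000... related cluster": 19,
    "zinc-alpha-2-glycoprotein": 3,
    "class II": 1,
}

# Sequences explicitly labeled as MHC class I.
labeled_class_i = label_counts["class I / class 1"]

# Upper bound: assume every uncharacterized and UPI000... sequence is also class I.
possible_class_i = (
    labeled_class_i
    + label_counts["uncharacterized protein"]
    + label_counts["UPI000... related cluster"]
)

pct_labeled = round(100 * labeled_class_i / total_sequences)
pct_possible = round(100 * possible_class_i / total_sequences)

print(f"labeled class I: {labeled_class_i}/{total_sequences} ({pct_labeled}%)")
print(f"possible class I: {possible_class_i}/{total_sequences} ({pct_possible}%)")
```

This reproduces the 101/150 (67%) and 142/150 (95%) figures given in the example.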
====APD 1.62==== | ====APD 1.62==== | ||
Line 28: | Line 32: | ||
<table style="background-color:#ffe0e0"><tr><td> | <table style="background-color:#ffe0e0"><tr><td> | ||
At the ConSurf Server results page, '''download the PDB file''' by opening ''High Resolution Figures and PDB Files'', and then clicking ''Download ConSurf PDB File for FirstGlance in Jmol''. Then [http://www.bioinformatics.org/firstglance/fgij/where.htm#u upload it to FirstGlance]. By downloading the PDB file, you will have it after the results '''disappear''' from the ConSurf Server. PDB files downloaded from the ConSurf Database (ConSurfDB) '''do not work''' in FirstGlance. | |||
</td></tr></table> | </td></tr></table> | ||
Line 60: | Line 65: | ||
[[Image:6vzx-collagen-250seq-APD0.74.png]] | [[Image:6vzx-collagen-250seq-APD0.74.png]] | ||
===Too | ===Too Many Residues With Insufficient Data=== | ||
Amino acids with insufficient data (uncertainty in conservation grade) are colored yellow. Here are two | Amino acids with insufficient data (uncertainty in conservation grade) are colored yellow. Here are two cases where yellow residues were a problem, with solutions. | ||
====More Sequences Needed==== | ====More Sequences Needed==== | ||
If a residue of interest has insufficient data, increasing the number of sequences in the MSA may give it a reliable conservation grade. This happened for sequence-identical chains C and F in [[1n73]]. With 150 sequences (default job settings), Lys401, participating in an isopeptide bond, had insufficient data. When 300 sequences were used in the MSA, Lys401 acquired a reliable conservation grade of 1. Its partner in the isopeptide bond, Gln397, dropped from conservation grade 8 to 7, although the APD did not increase. | If a residue of interest has insufficient data, increasing the number of sequences in the MSA may give it a reliable conservation grade. This happened for sequence-identical chains C and F in [[1n73]]. With 150 sequences (default job settings), Lys401, participating in an isopeptide bond, had insufficient data. When 300 sequences were used in the MSA, Lys401 acquired a reliable conservation grade of 1. Its partner in the isopeptide bond, Gln397, dropped from conservation grade 8 to 7, although the APD did not increase. | ||
< | |||
<blockquote> | |||
FirstGlance automatically reports six types of [[protein crosslinks]], including isopeptide bonds. Other examples of protein crosslinks colored by evolutionary conservation are [[FirstGlance/Evaluating_Protein_Crosslinks#Conservation_of_Crosslinking_Residues|a thioether crosslink in catalase]] and [[FirstGlance/Visualizing_Conservation#Conservation_of_Protein_Crosslinks|an isopeptide in poly-ubiquitin]]. | |||
</blockquote> | |||
[[Image:1n73-APD1.05.png]] | [[Image:1n73-APD1.05.png]] | ||
[[Image:1n73-isopeptide-conservation-yellow.png]] | [[Image:1n73-isopeptide-conservation-yellow.png]] | ||
Line 80: | Line 90: | ||
==See Also== | ==See Also== | ||
*[[ConSurf/Index]] | *[[ConSurf/Index]]: Links to explanations of the principles of evolutionary conservation, as well as practical guidance. | ||
*[[FirstGlance/Visualizing Conservation]]: Demonstrates the conveniences offered by FirstGlance for easily seeing conservation of salt bridges, cation-pi interactions, residues that bind ligand, substrate, or inhibitor, residues in covalent protein crosslinks, or any residues that you specify. |