Interpreting ConSurf Results: Difference between revisions
Eric Martz (talk | contribs) |
Eric Martz (talk | contribs) |
||
(2 intermediate revisions by the same user not shown) | |||
Line 14: | Line 14: | ||
The ''average pairwise distance'' (APD) in a multiple sequence alignment (MSA) is a measure of the evolutionary diversity in the sequences included. The APD is "The average number of replacements between any two sequences in the alignment; A distance of 0.01 means that on average, the expected replacement for every 100 positions is 1." (quoted from the ConSurf Server). | The ''average pairwise distance'' (APD) in a multiple sequence alignment (MSA) is a measure of the evolutionary diversity in the sequences included. The APD is "The average number of replacements between any two sequences in the alignment; A distance of 0.01 means that on average, the expected replacement for every 100 positions is 1." (quoted from the ConSurf Server). | ||
Generally, an APD of | Generally, an APD of 0.25 to 0.5 is consistent with an MSA whose sequences are limited to proteins with one specific function. As the APD approaches or exceeds 1.0, it is more likely that proteins of multiple functions are included in the MSA. | ||
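To make the APD definition above concrete, here is a minimal Python sketch that averages a distance over every pair of sequences in a toy alignment. It uses the uncorrected fraction of differing positions (the "p-distance") as a stand-in. This is an assumption for illustration only, not the ConSurf Server's calculation: a raw fraction of differing positions cannot exceed 1.0, whereas the APD values discussed here can (e.g. 1.62), so the server's figure presumably counts expected replacements under an evolutionary model. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap.

    Illustrative stand-in only; NOT the model-based distance ConSurf reports.
    """
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        raise ValueError("sequences share no ungapped columns")
    return sum(x != y for x, y in columns) / len(columns)


def average_pairwise_distance(msa):
    """Mean of the pairwise distances over all unordered pairs of aligned sequences."""
    distances = [p_distance(a, b) for a, b in combinations(msa, 2)]
    return sum(distances) / len(distances)


# Hypothetical toy alignment: three aligned sequences of ten residues each.
toy_msa = [
    "ACDEFGHIKL",
    "ACDEFGHIKV",  # 1 difference from the first sequence
    "ACDQFGHIRV",  # 3 differences from the first, 2 from the second
]

apd = average_pairwise_distance(toy_msa)
print(f"toy APD (p-distance): {apd:.2f}")
```

In this toy case the three pairwise distances are 0.1, 0.3 and 0.2, so the average is 0.2: on average, two of every ten aligned positions differ between any two sequences.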
===Example=== | ===Example=== | ||
Line 20: | Line 20: | ||
====APD 0.99==== | ====APD 0.99==== | ||
The APD for the 2VAA result in the Gallery is '''0.99'''. The MSA has 150 sequences, largely limited to sequences for major histocompatibility complex class I proteins. The labels of 101 sequences (67% of 150) contain "class I" or "class 1". There is only one class II protein sequence. Three sequences are labeled "zinc-alpha-2-glycoprotein", clearly a different function. There are 22 sequences labeled "uncharacterized protein" which nevertheless have high similarity to the query. 19 sequences are labeled "UPI000... related cluster". If the uncharacterized and "UPI000..." sequences are in fact class I sequences, then '''up to 142/150 (95%) of the sequences could be MHC-I'''. | The APD for the ConSurf Server 2VAA result with default settings in the Gallery is '''0.99'''. The MSA has 150 sequences, largely limited to sequences for major histocompatibility complex class I proteins. The labels of 101 sequences (67% of 150) contain "class I" or "class 1". There is only one class II protein sequence. Three sequences are labeled "zinc-alpha-2-glycoprotein", clearly a different function. There are 22 sequences labeled "uncharacterized protein" which nevertheless have high similarity to the query. 19 sequences are labeled "UPI000... related cluster". If the uncharacterized and "UPI000..." sequences are in fact class I sequences, then '''up to 142/150 (95%) of the sequences could be MHC-I'''. | ||
However, conservation of key functional residues was revealed only when custom ConSurf Server jobs achieved an APD of around 0.30; see Case #1 at [[ConSurfDB_vs._ConSurf#Examples]]. | |||
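The counts quoted for the 2VAA alignment (APD 0.99) can be checked with a short tally. The sketch below uses only the numbers stated in the text; the dictionary keys are shorthand labels chosen here, and the "not itemized" remainder is simply what is left after the listed categories are subtracted from 150.

```python
total_sequences = 150

# Label counts as quoted in the text for the 2VAA MSA.
label_counts = {
    "class I / class 1": 101,
    "class II": 1,
    "zinc-alpha-2-glycoprotein": 3,
    "uncharacterized protein": 22,
    "UPI000... related cluster": 19,
}

confirmed_mhc1 = label_counts["class I / class 1"]

# Upper bound: assume every uncharacterized and UPI000... sequence is class I.
possible_mhc1 = (
    confirmed_mhc1
    + label_counts["uncharacterized protein"]
    + label_counts["UPI000... related cluster"]
)

# Sequences whose labels are not itemized in the text.
not_itemized = total_sequences - sum(label_counts.values())

print(f"labeled class I: {confirmed_mhc1}/{total_sequences} "
      f"({confirmed_mhc1 / total_sequences:.0%})")
print(f"upper bound MHC-I: {possible_mhc1}/{total_sequences} "
      f"({possible_mhc1 / total_sequences:.0%})")
print(f"not itemized: {not_itemized}")
```

The tally reproduces the 67% and 95% figures and shows that four of the 150 sequences fall outside the categories listed.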
====APD 1.62==== | ====APD 1.62==== |