Conservation, Evolutionary

For a more basic explanation of this subject, please see [[Introduction to Evolutionary Conservation]].

Mutations occur spontaneously in each generation, randomly changing the amino acid sequences of proteins. Individuals carrying mutations that impair critical protein functions may be less able to survive and reproduce. Harmful mutations are therefore lost from the gene pool, because the individuals carrying them reproduce less effectively. Over time, only harmless (or, very rarely, beneficial) mutations are maintained in the gene pool. This is [[Evolution|evolution]] by natural selection. [[Introduction_to_Evolutionary_Conservation#Rett_Syndrome|Rett Syndrome is a stark illustration of these principles]].

When the sequences of a given protein from different [http://en.wikipedia.org/wiki/Taxa taxa] are compared using a multiple sequence alignment (MSA), differences between the sequences most often represent mutations that evolution allowed to persist because they were harmless. Where the sequences are identical, we say that the sequence was '''conserved'''. Such '''evolutionary conservation''' occurs because mutations of these amino acids were harmful to protein function and were lost over time. Conserved amino acids therefore tend to be those most critical to the function of the protein. Thus, looking for evolutionarily conserved patches of amino acids in a 3D protein structure is a good way to '''locate functional sites'''. See the [[Introduction_to_Evolutionary_Conservation#Finding_Conservation|case study of enolase]], an enzyme in the glycolytic pathway.

Proteopedia's evolutionary conservation colors are pre-calculated by [[ConSurfDB vs. ConSurf|ConSurf-DB]].
|-
| {{Template:ColorKey_ConSurf}}
| The nine conservation grade colors utilized by [[ConSurfDB vs. ConSurf|ConSurf-DB and the ConSurf Server]], plus yellow for amino acids with insufficient data, and gray for chains that ConSurf did not process. See [[Help:Color Keys]].
|-
| colspan="2" |<br><ul><li>'''''Insufficient Data''''' describes amino acids for which a meaningful conservation level could not be derived from the set of homologous sequences utilized. This occurs when the confidence interval for the calculated conservation level is too large. For more, see the [[ConSurfDB_vs._ConSurf#ConSurf-DB_Process|ConSurfDB Process]].
</li>
<br>
<li>'''''No Data''''' describes entire protein chains that ConSurf-DB did not or could not process. For details, see [[ConSurfDB_vs._ConSurf#ConSurf-DB_Process|ConSurfDB Process]].
For an example, show ''Evolutionary Conservation'' at [[1hgf]].
</li>

==Locating Conserved Patches==
Patches of highly conserved amino acid residues on the surface of a protein molecular structure are good candidates for [[Site | functional sites]]. Many articles in Proteopedia that are '''titled with a [[PDB code]]''' have an ''Evolutionary Conservation'' section below the molecular scene. (Results could not be obtained for a small percentage of entries; see [[ConSurfDB_vs._ConSurf#ConSurf-DB_Process|ConSurfDB Process]].) Clicking '''show''' in the blue ''Evolutionary Conservation'' bar automatically colors all chains in the molecule by evolutionary conservation as calculated by ConSurf-DB. A typical example is [[Introduction_to_Evolutionary_Conservation#Finding_Conservation|conservation of the catalytic pocket of the enzyme enolase]]. For more '''examples''', click on ''Random PDB entry'' in the ''random'' box at the upper left of every page in Proteopedia.

Briefly, ConSurf-DB gathers sequences similar to that of the protein in question, then constructs a multiple sequence alignment, and analyses it for sequence positions that are conserved (have lower than average differences between sequences) and that are variable (have higher than average differences between sequences). Each amino acid is assigned a conservation score and corresponding color in Proteopedia's interactive 3D molecular scene.
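The idea of scoring each alignment position and mapping scores onto nine color grades can be illustrated with a deliberately simplified Python sketch. It is not ConSurf-DB's method: ConSurf estimates phylogeny-aware evolutionary rates (with the Rate4Site algorithm), whereas this sketch swaps in plain Shannon entropy of each alignment column. The toy alignment, the function names, and the equal-width binning into grades are hypothetical choices made for illustration; only the convention that grade 9 means most conserved follows the color key above.

```python
import math
from collections import Counter

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; gap characters '-' are ignored."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in Counter(residues).values())

def conservation_grades(msa: list[str], n_grades: int = 9) -> list[int]:
    """Hypothetical 1-9 grading: 9 = most conserved (lowest entropy), 1 = most variable."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("aligned sequences must all have the same length")
    scores = [column_entropy("".join(seq[i] for seq in msa)) for i in range(length)]
    # Bins span only this alignment's own score range, so a grade is relative
    # to this MSA and is not comparable with grades from another alignment.
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_grades or 1.0
    grades = []
    for score in scores:
        bin_index = min(int((score - lo) / width), n_grades - 1)  # 0 = lowest entropy
        grades.append(n_grades - bin_index)
    return grades

toy_msa = ["MKVLA", "MKILG", "MRVIS", "MKVLT"]  # hypothetical alignment
print(conservation_grades(toy_msa))  # [9, 6, 6, 6, 1]
```

The invariant first column gets grade 9 and the column with four different residues gets grade 1. Because the bins are drawn from one alignment's own score range, the same grade from two different alignments need not mean the same absolute conservation.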

Charged residues are usually on the surfaces of folded proteins. If you see a highly conserved charged residue (''Arg, Asp, Glu, Lys'') on the surface, it often participates in a [[Salt bridges|salt bridge]]. Salt bridges help to stabilize protein folds, and hence the residues involved are often highly conserved. Example: Asp6 with Arg8 in [[1qdq]].
For other situations where conservation is expected, see [[Introduction_to_Evolutionary_Conservation#Expected_vs._Unexpected_Conservation|Expected vs. Unexpected Conservation]].
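Whether a conserved surface Asp or Glu could form a salt bridge can be checked geometrically. A commonly used criterion is that a side-chain carboxylate oxygen of Asp or Glu lies within about 4 Å of a side-chain nitrogen of Arg or Lys. The Python sketch below applies that cutoff to a hand-made list of atoms. The `Atom` record, the 4.0 Å default, and the single-chain simplification (chain IDs are ignored) are assumptions for illustration; real coordinates would come from the PDB file.

```python
import math
from dataclasses import dataclass
from itertools import product

# Side-chain atoms conventionally used for salt-bridge checks (PDB atom names).
ACIDIC = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}
BASIC = {"ARG": {"NE", "NH1", "NH2"}, "LYS": {"NZ"}}

@dataclass(frozen=True)
class Atom:
    res_name: str                     # three-letter residue name, e.g. "ASP"
    res_num: int                      # residue sequence number
    atom_name: str                    # PDB atom name, e.g. "OD1"
    xyz: tuple[float, float, float]   # coordinates in Angstroms

def salt_bridges(atoms: list[Atom], cutoff: float = 4.0) -> set[tuple[str, str]]:
    """Return (acidic, basic) residue labels having any O...N pair within cutoff."""
    acids = [a for a in atoms if a.atom_name in ACIDIC.get(a.res_name, set())]
    bases = [a for a in atoms if a.atom_name in BASIC.get(a.res_name, set())]
    pairs = set()
    for o, n in product(acids, bases):
        if math.dist(o.xyz, n.xyz) <= cutoff:
            pairs.add((f"{o.res_name.title()}{o.res_num}", f"{n.res_name.title()}{n.res_num}"))
    return pairs
```

For example, with an Asp OD1 atom at the origin and an Arg NH1 atom 3 Å away, `salt_bridges` reports the pair `("Asp6", "Arg8")` when those residue numbers are used; the coordinates in such a test are invented, not taken from [[1qdq]].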

Remember that you can hover the mouse over any residue in the ''Evolutionary Conservation'' scene in Proteopedia (in Jmol), and its identity will be displayed after a few seconds. This works best with spinning turned off.

This caveat applies only to molecules that contain chains with different sequences. The conservation colors shown in Proteopedia's ''Evolutionary Conservation'' scenes do not indicate the same levels of conservation for chains of different sequences. This is because [http://consurfdb.tau.ac.il ConSurf-DB] calculates conservation levels independently for each sequence-different chain, and the levels are relative to the multiple sequence alignment constructed for that chain.

For example, consider [[1bqh]] (a [https://www.youtube.com/watch?v=2ZakngfbHSo Major Histocompatibility Complex Class I] protein), which contains 5 chains with four distinct sequences. A visit to [http://consurfdb.tau.ac.il ConSurf-DB] reveals, as expected, that a different number of sequences was used for the multiple sequence alignment (MSA) and conservation calculations for each of these sequence-different chains, and that each MSA had a different [[ConSurfDB_vs._ConSurf#Average_Pairwise_Distance|average pairwise difference (APD)]], a measure of diversity within the MSA. Therefore, residues with, for example, conservation level 9 (maximal conservation) in each of the three sequence-different chains colored by ConSurf-DB have the highest levels of conservation within their own chain, but do not have exactly the same absolute levels of conservation.
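Why two MSAs can differ in diversity can be illustrated with a rough Python sketch. Here diversity is taken, as an assumption, to be the mean pairwise p-distance (the fraction of aligned positions at which two sequences differ, skipping positions where either has a gap). ConSurf-DB may compute its average pairwise distance differently, and the two alignments below are hypothetical stand-ins for two sequence-different chains.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions between two aligned sequences, skipping gaps."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("sequences share no ungapped positions")
    return sum(x != y for x, y in compared) / len(compared)

def average_pairwise_distance(msa: list[str]) -> float:
    """Mean p-distance over all pairs of sequences in the alignment."""
    pairs = list(combinations(msa, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

# Two hypothetical alignments, one per sequence-different chain.
chain_a_msa = ["MKVLA", "MKVLA", "MKILA"]
chain_b_msa = ["MKVLA", "QRILG", "MTVIS"]
print(round(average_pairwise_distance(chain_a_msa), 3))  # 0.133
print(round(average_pairwise_distance(chain_b_msa), 3))  # 0.8
```

Because each chain's conservation grades are computed against its own alignment, a grade-9 residue in a low-diversity alignment like the first is not directly comparable with a grade-9 residue in a high-diversity alignment like the second.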


<center>
<center>
===INTREPID===

In 2024, the INTREPID Server, formerly at the University of California, Berkeley, appears to be unavailable.
<!--
"[http://phylogenomics.berkeley.edu/INTREPID/index.html INTREPID] is an information-theoretic approach for functional site identification that exploits the information in large diverse multiple sequence alignments. INTREPID gathers homologs for a sequence using PSI-BLAST and estimates a phylogenetic tree. It then uses Jensen-Shannon divergence to measure the information for each position in the sequence at each subtree node encountered on a traversal of the phylogeny, tracing a path from the root to the leaf corresponding to the sequence of interest. Positions that are conserved across the entire family receive stronger scores than those that only become conserved within more closely related subgroups. This tree traversal produces a phylogenomic conservation score for each position in the MSA. INTREPID uses information from sequence only, and can thus be used when knowledge of structure is not available." (Quoted from the [http://phylogenomics.berkeley.edu/INTREPID/index.html INTREPID website].)

Evidence is provided that INTREPID outperforms ConSurf for predicting catalytic residues.

Unlike ConSurf, INTREPID does not identify the [[#Locating Variable Patches|most variable residues]] in addition to the [[#Locating Conserved Patches|most conserved]]. -->

===xProtCAS===

===siteFiNDER|3D===
In 2024, the siteFiNDER 3D Server, formerly at Yale University, appears to be unavailable. <!--

[http://sitefinder3d.mbb.yale.edu/ siteFiNDER|3D] performs ''conserved functional group'' (CFG) analysis. "CFG Analysis is a general method for predicting the location of functionally important sites within a target protein structure. Like other available structure/sequence analysis techniques, CFG Analysis exploits the evolutionary relationships present across groups of homologous proteins to identify regions that are likely to be of functional significance. However, this technique is particularly useful for situations where other methods fail, for instance when only a few or highly similar homologues can be identified." As its name implies, CFG analysis attempts to identify groups of conserved amino acids that together represent a functional site. In this respect, it goes beyond most other evolutionary conservation servers, which stop at assigning a conservation value to each amino acid. See the [http://consurfdb.tau.ac.il/comparison.php comparison of siteFiNDER|3D with ConSurf for cytochrome c].

This site provides links to several other software packages that predict functional sites, some of which are not further discussed in the present article. -->

===HotPatch===

In 2024, the HotPatch Server, formerly at UCLA, appears to be unavailable. <!--
[http://hotpatch.mbi.ucla.edu/ HotPatch] <ref>PMID: 17451744</ref> "finds unusual patches on the surface of proteins, and computes just how unusual they are (patch rareness), and how likely each patch is to be of functional importance (functional confidence (FC).) The statistical analysis is done by comparing your protein's surface against the surfaces of a large set of proteins whose functional sites are known." One advantage of HotPatch is that sequence homologs are not required. See the [http://consurfdb.tau.ac.il/comparison.php comparison of HotPatch with ConSurf for cytochrome c]. -->

===Evolutionary Trace Viewer===

[http://evolution.lichtargelab.org/ETviewer Evolutionary Trace Viewer] (ETV).<!--
 
See the [http://consurfdb.tau.ac.il/comparison.php comparison of ETV with ConSurf for cytochrome c].-->
<blockquote>
Comment by [[User:Eric Martz]], March, 2009: From the information provided on the ETV website, I found it quite difficult to understand what the ETV is doing, or how to use the viewer. An explanation in simple terms for non-specialists would be very useful.
</blockquote>

===EVcouplings / EVfold===
*[[ConSurfDB vs. ConSurf]]: How the servers work and how to get optimal results from ConSurf.
*[[Help:How to Insert a ConSurf Result Into a Proteopedia Green Link]]
*[[ConSurf/Index]] lists all ConSurf-related pages in Proteopedia.

==References==
<references />

Proteopedia Page Contributors and Editors

Eric Martz, Eran Hodis, Wayne Decatur