Conservation, Evolutionary: Difference between revisions

From Proteopedia
Eric Martz (talk | contribs)
Eric Martz (talk | contribs)
 
(46 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Mutations occur spontaneously in each generation, randomly changing the amino acid sequences of proteins. Individuals with mutations that impair critical functions of proteins may have resulting problems that make them less able to reproduce. Harmful mutations are lost from the gene pool because the individuals carrying them reproduce less effectively. Over time, only harmless (or very rare beneficial) mutations are maintained in the gene pool. This is [[Evolution|evolution]].
For a more basic explanation of this subject, please see [[Introduction to Evolutionary Conservation]].


When the sequences of a given protein are compared between [http://en.wikipedia.org/wiki/Taxa taxa], using multiple sequence alignment (MSA), differences between sequences most often represent mutations that were allowed (by evolution) to persist because they were harmless. Where the sequences are identical, we say that sequence was '''conserved'''. Such '''evolutionary conservation''' occurs because mutations of these amino acids were harmful to protein function, and were lost over time. Amino acids that are conserved are those most critical to the function of the protein. Thus, looking for evolutionarily conserved patches of amino acids in a 3D protein structure is a good way to '''locate functional sites'''.
Mutations occur spontaneously in each generation, randomly changing the amino acid sequences of proteins. Individuals with mutations that impair critical functions of proteins may have resulting problems that make them less able to reproduce. Harmful mutations are lost from the gene pool because the individuals carrying them reproduce less effectively. Over time, only harmless (or very rare beneficial) mutations are maintained in the gene pool. This is [[Evolution|evolution]]. [[Introduction_to_Evolutionary_Conservation#Rett_Syndrome|Rett Syndrome is a stark illustration of these principles]].
 
When the sequences of a given protein are compared between [http://en.wikipedia.org/wiki/Taxa taxa], using multiple sequence alignment (MSA), differences between sequences most often represent mutations that were allowed (by evolution) to persist because they were harmless. Where the sequences are identical, we say that sequence was '''conserved'''. Such '''evolutionary conservation''' occurs because mutations of these amino acids were harmful to protein function, and were lost over time. Amino acids that are conserved are those most critical to the function of the protein. Thus, looking for evolutionarily conserved patches of amino acids in a 3D protein structure is a good way to '''locate functional sites'''. See the [[Introduction_to_Evolutionary_Conservation#Finding_Conservation|case study of enolase]], an enzyme in the glycolytic pathway.
 
Proteopedia's evolutionary conservation colors are pre-calculated by [[ConSurfDB vs. ConSurf|ConSurf-DB]].


{| class="wikitable" width="600" align="right"
{| class="wikitable" width="600" align="right"
|-
|-
| {{Template:ColorKey_ConSurf}}
| {{Template:ColorKey_ConSurf}}
| The nine conservation grade colors utilized by ConSurf-DB and ConSurf, plus yellow for amino acids with insufficient data, and gray for chains that ConSurf could not process. See [[Help:Color Keys]].
| The nine conservation grade colors utilized by [[ConSurfDB vs. ConSurf|ConSurf-DB and the ConSurf Server]], plus yellow for amino acids with insufficient data, and gray for chains that ConSurf did not process. See [[Help:Color Keys]].
|-
|-
| colspan="2" |<br><ul><li>'''''Insufficient Data''''' describes amino acids for which a meaningful conservation level could not be derived from the set of homologous sequences utilized. This occurs when the confidence interval for the calculated conservation level is too large. For more, see the [[#ConSurf-DB Process|ConSurf-DB Process]].
| colspan="2" |<br><ul><li>'''''Insufficient Data''''' describes amino acids for which a meaningful conservation level could not be derived from the set of homologous sequences utilized. This occurs when the confidence interval for the calculated conservation level is too large. For more, see the [[ConSurfDB_vs._ConSurf#ConSurf-DB_Process|ConSurfDB Process]].
For an example, show ''Evolutionary Conservation'' at [[1hgf]].
For an example, show ''Evolutionary Conservation'' at [[1hgf]].
</li>
</li>
<br>
<br>
<li>'''''No Data''''' describes entire protein chains that could not be processed by ConSurf-DB. For details, see [[#ConSurf-DB Process|ConSurf-DB Process]].
<li>'''''No Data''''' describes entire protein chains that were not or could not be processed by ConSurf-DB. For details, see [[ConSurfDB_vs._ConSurf#ConSurf-DB_Process|ConSurfDB Process]].
For an example, show ''Evolutionary Conservation'' at [[1hgf]].
For an example, show ''Evolutionary Conservation'' at [[1hgf]].
</li>
</li>
Line 22: Line 26:


==Locating Conserved Patches==
==Locating Conserved Patches==
Patches of highly conserved amino acid residues on the surface of a protein molecular structure are good candidates for [[Site | functional sites]]. Every article in Proteopedia that is '''titled with a [[PDB code]]''' has an ''Evolutionary Conservation'' section below the molecular scene. Clicking '''show''' in the blue ''Evolutionary Conservation'' bar automatically colors all chains in the molecule by evolutionary conservation as calculated by ConSurf-DB.
Patches of highly conserved amino acid residues on the surface of a protein molecular structure are good candidates for [[Site | functional sites]]. Many articles in Proteopedia that are '''titled with a [[PDB code]]''' have an ''Evolutionary Conservation'' section below the molecular scene. (Results could not be obtained for a small percentage -- see [[ConSurfDB_vs._ConSurf#ConSurf-DB_Process|ConSurfDB Process]].) Clicking '''show''' in the blue ''Evolutionary Conservation'' bar automatically colors all chains in the molecule by evolutionary conservation as calculated by ConSurf-DB. A typical example is [[Introduction_to_Evolutionary_Conservation#Finding_Conservation|conservation of the catalytic pocket of the enzyme enolase]]. For more '''examples''', click on ''Random PDB entry'' in the ''random'' box at the upper left of every page in Proteopedia.


Briefly, ConSurf-DB gathers sequences similar to that of the protein in question, then constructs a multiple sequence alignment, and analyses it for sequence positions that are conserved (have lower than average differences between sequences) and that are variable (have higher than average differences between sequences). Each amino acid is assigned a conservation score and corresponding color in Proteopedia's interactive 3D molecular scene.
Briefly, ConSurf-DB gathers sequences similar to that of the protein in question, then constructs a multiple sequence alignment, and analyses it for sequence positions that are conserved (have lower than average differences between sequences) and that are variable (have higher than average differences between sequences). Each amino acid is assigned a conservation score and corresponding color in Proteopedia's interactive 3D molecular scene.


ConSurf-DB's analysis is done with sophisticated, published, peer-reviewed, state-of-the-art methods. A more detailed overview of the [[#The ConSurf-DB Mechanism|mechanism employed by ConSurf-DB is summarized below]]. Proteopedia's built-in display of ConSurf-DB results is a good place to start looking for conserved patches.
ConSurf-DB's analysis is done with sophisticated, published, peer-reviewed, state-of-the-art methods. A more detailed overview of the [[ConSurfDB_vs._ConSurf#ConSurf-DB_Process|process employed by ConSurf-DB]] is available. Proteopedia's built-in display of ConSurf-DB results is a good place to start looking for conserved patches.
 
However, as explained [[#ConSurf-DB Usually Hides Some Functional Sites|below]], ConSurf-DB usually does not show all the conserved patches present in proteins with the same function. Therefore, you may wish to extend your analysis of conservation by limiting the analysis to proteins of one function, using the ConSurf Server, as explained [[#Limiting Conservation Analysis to Proteins of a Single Function|below]]. The results of such an analysis can be displayed in a molecular scene in Proteopedia. See below for [[#Examples|Examples]] and [[#How to Insert a ConSurf Result Into a Proteopedia Green Link|Instructions]].


[[Topic pages]] in Proteopedia (manually-authored pages that typically discuss more than one [[PDB code]]) may include molecular scenes colored by evolutionary conservation. See below for [[#Examples|Examples]] and [[#How to Insert a ConSurf Result Into a Proteopedia Green Link|Instructions]].
However, [[#ConSurf-DB_Often_Obscures_Some_Functional_Sites|ConSurf-DB usually does not show all the conserved patches present in proteins with the same function]]. Therefore, you may wish to extend your analysis of conservation by using the ConSurf Server to [[ConSurfDB_vs._ConSurf#Limiting_ConSurf_Analysis_to_Proteins_of_a_Single_Function|limit the analysis to proteins of one function]]. The results of such an analysis can be displayed in a molecular scene in Proteopedia. See [[Help:How to Insert a ConSurf Result Into a Proteopedia Green Link]].


==Locating Variable Patches==
==Locating Variable Patches==
In some cases, patches of highly variable (rapidly mutating) residues are also functional sites. These can also be identified with Proteopedia's ''Evolutionary Conservation'' scenes. For example, mutations in influenza hemagglutinin help the virus to evade host defenses (see [[1hgf]]). Another example is the high allelic variability of the peptide-binding groove of [[Major Histocompatibility Complex Class I]]. That variability helps the grooves of the alleles within any individual to bind a wide range of peptides, hence enabling the T lymphocyte system to defend against a wide range of pathogens, including influenza virus. See the ConSurf-colored [[#Examples|example]] below.
In some cases, patches of highly variable (rapidly mutating) residues are also functional sites. These can also be identified preliminarily with Proteopedia's ''Evolutionary Conservation'' scenes from [[ConSurfDB vs. ConSurf|ConSurfDB]], and more definitively with conservation analysis [[ConSurfDB_vs._ConSurf#Limiting_ConSurf_Analysis_to_Proteins_of_a_Single_Function|limited to proteins of a single function]]. For example, mutations in influenza hemagglutinin help the virus to evade host defenses (see [[1hgf]]). Another example is the high allelic variability of the peptide-binding groove of [[Major Histocompatibility Complex Class I]]. That variability helps the grooves of the alleles within any individual to bind a wide range of peptides, hence enabling the T lymphocyte system to defend against a wide range of pathogens, including influenza virus.


==Conservation for Domain Folding==
==Conservation for Domain Folding==
Certain residues on the surfaces of protein molecules tend to be conserved in order to maintain proper folding, rather than because they are part of a site functioning to interact with substrate, ligand, or a protein partner. Secondary structure elements need to break at the protein molecular surface in order to turn back into the folded protein domain. Therefore, it is common to see isolated highly conserved residues that enable turns, or break helices, notably '''glycines or prolines''', on protein structure surfaces.
Certain residues on the surfaces of protein molecules tend to be conserved in order to maintain proper folding, rather than because they are part of a site functioning to interact with substrate, ligand, or a protein partner. Secondary structure elements need to break at the protein molecular surface in order to turn back into the folded protein domain. Therefore, it is common to see isolated highly conserved residues that enable turns, or break helices, notably '''glycines or prolines''', on protein structure surfaces.


Remember that you can touch any residue with the mouse in the ''Evolutionary Conservation'' scene in Proteopedia (in Jmol), and its identity will be displayed after a few seconds. This works best with spinning turned off.
Cysteines that form [[Introduction_to_molecular_visualization#Disulfide_Bonds|disulfide bridges]] are typically conserved, as are other amino acids that form rare [[protein crosslinks]].
 
Charged residues are usually on the surfaces of folded proteins. If you see a highly conserved charged residue (''Arg, Asp, Glu, Lys'') on the surface, it often participates in a [[Salt bridges|salt bridge]]. Salt bridges help to stabilize protein folds, and hence the residues involved are often highly conserved. Example: Asp6 with Arg8 in [[1qdq]].


Every structure in ''Proteopedia'' has a link to be displayed in [http://firstglance.jmol.org FirstGlance in Jmol]. There, you can use the ''Find'' dialog to enter the name of an amino acid, e.g. ''glycine'' or ''proline'', and the positions of all of the specified amino acids will be highlighted. You can then visualize their distribution in the 3D structure.
For other situations where conservation is expected, see [[Introduction_to_Evolutionary_Conservation#Expected_vs._Unexpected_Conservation|Expected vs. Unexpected Conservation]].


Remember that you can touch any residue with the mouse in the ''Evolutionary Conservation'' scene in Proteopedia (in Jmol), and its identity will be displayed after a few seconds. This works best with spinning turned off.


Every structure in ''Proteopedia'' has a link to be displayed in [http://firstglance.jmol.org FirstGlance in Jmol]. There, you can use the ''Find'' dialog to enter the name of an amino acid, e.g. ''glycine'' or ''proline'', and the positions of all of the specified amino acids will be highlighted. You can then visualize their distribution in the 3D structure. This strategy can also be utilized when viewing the protein colored by conservation, using the FirstGlance links in [[ConSurfDB_vs._ConSurf|either ConSurf server]].


==Caveats==
==Caveats==
===ConSurf-DB Often Obscures Some Functional Sites===
===ConSurf-DB Often Obscures Some Functional Sites===


Proteopedia's ''Evolutionary Conservation'' scenes use pre-calculated results from [[#The ConSurf-DB Mechanism|ConSurf-DB]]. ConSurf-DB is designed to include a wide range of sequences in its multiple-sequence alignments (MSA) and analyses. Often, the MSA will include a substantial number of sequences for proteins with '''different functions''' than the query protein. (See [[#Examining Functions of Proteins in ConSurf-DB's MSA|below]] for how to find out the functions of the proteins used in ConSurf-DB's MSA.) Consequently, amino acids that are colored as highly conserved by ConSurf-DB are truly highly conserved across a wide range of sequence-similar proteins. However, amino acids that are '''highly conserved in proteins with the same function''' as the query protein '''may not appear conserved''' in ConSurf-DB results. A good way to find these obscured functional sites is to do a conservation analysis that is limited to proteins of a single function.
Proteopedia's ''Evolutionary Conservation'' scenes use pre-calculated results from [[ConSurfDB vs. ConSurf|ConSurf-DB]]. ConSurf-DB is designed to include a wide range of sequences in its multiple-sequence alignments (MSA) and analyses. Often, the MSA will include a substantial number of sequences for proteins with '''different functions''' than the query protein. (See [[ConSurfDB_vs._ConSurf#Examining_Functions_of_Proteins_in_ConSurf-DB.27s_MSA|these instructions]] for how to find out the functions of the proteins used in ConSurf-DB's MSA.) Consequently, amino acids that are colored as highly conserved by ConSurf-DB are truly highly conserved across a wide range of sequence-similar proteins. However, amino acids that are '''highly conserved in proteins with the same function''' as the query protein '''may not appear conserved''' in ConSurf-DB results. A good way to find these obscured functional sites is to do a conservation analysis that is limited to proteins of a single function.
[[#Limiting Conservation Analysis to Proteins of a Single Function|See below for instructions.]]
See [[ConSurfDB_vs._ConSurf#Limiting_ConSurf_Analysis_to_Proteins_of_a_Single_Function|Limiting ConSurf Analysis to Proteins of a Single Function]].


===Use Caution When Comparing Conservation of Sequence-Different Chains===
===Use Caution When Comparing Conservation of Sequence-Different Chains===
This caveat applies only to molecules that contain chains with different sequences. The conservation colors shown in Proteopedia's ''Evolutionary Conservation'' scenes do not indicate the same levels of conservation for chains of different sequences. This is because [http://consurfdb.tau.ac.il ConSurf-DB] calculates conservation levels independently for each sequence-different chain, and the levels are relative to the multiple sequence alignment constructed for each sequence-different chain.
This caveat applies only to molecules that contain chains with different sequences. The conservation colors shown in Proteopedia's ''Evolutionary Conservation'' scenes do not indicate the same levels of conservation for chains of different sequences. This is because [http://consurfdb.tau.ac.il ConSurf-DB] calculates conservation levels independently for each sequence-different chain, and the levels are relative to the multiple sequence alignment constructed for each sequence-different chain.


For example, consider [[1bqh]], which contains 10 chains, representing two copies of a 5-chain molecule. Each molecule contains four sequence-different chains. A visit to [http://consurfdb.tau.ac.il ConSurf-DB] reveals, as expected, that a different number of sequences was utilized for the multiple sequence alignment (MSA) and conservation calculations for each of these sequence-different chains, and that each MSA had a different average pairwise difference (APD), a measure of diversity within the MSA. Therefore, residues with, for example, conservation level 9 (maximal conservation) in each of the three ConSurf-DB-colored sequence-different chains have the highest levels of conservation within their own chain, but do not have exactly the same absolute levels of conservation.
For example, consider [[1bqh]] (a [https://www.youtube.com/watch?v=2ZakngfbHSo Major Histocompatibility Class I] protein), which contains 5 chains with four distinct sequences. A visit to [http://consurfdb.tau.ac.il ConSurf-DB] reveals, as expected, that a different number of sequences was utilized for the multiple sequence alignment (MSA) and conservation calculations for each of these sequence-different chains, and that each MSA had a different [[ConSurfDB_vs._ConSurf#Average_Pairwise_Distance|average pairwise difference (APD)]], a measure of diversity within the MSA. Therefore, residues with, for example, conservation level 9 (maximal conservation) in each of the three ConSurf-DB-colored sequence-different chains have the highest levels of conservation within their own chain, but do not have exactly the same absolute levels of conservation.


<center>
<center>
Line 90: Line 97:
===Conservation Results Will Change With Time===
===Conservation Results Will Change With Time===


Slight variations in the conservation pattern will occur over time, as the number of sequences in the sequence databases used by ConSurf-DB increases. Each update of ConSurf-DB uses somewhat larger sequence databases, and consequently, the MSAs for each chain will be slightly different.
Slight variations in the conservation pattern will occur over time, as the number of sequences in the sequence databases used by ConSurf-DB increases. Each update of ConSurf-DB uses somewhat larger sequence databases, and consequently, the MSAs for each chain will be slightly different. Also, the methods employed by ConSurf are improved periodically. For example, the MSA algorithm originally defaulted to CLUSTAL-W, then to MUSCLE, and later to MAFFT.
 
For the same reasons, results from the ConSurf Server will also change slightly with time, even when the job parameters are the same. Only if you upload the same MSA will the results be identical for a given chain when the jobs are run months or years apart.
 
==Examining Functions of Proteins in ConSurf-DB's MSA==
 
As explained [[#ConSurf-DB Often Obscures Some Functional Sites|above]], ConSurf-DB typically includes proteins with more than one function in its conservation analysis. Before deciding whether to do a ConSurf Server job that [[#Limiting ConSurf Analysis to Proteins of a Single Function|limits the analysis to proteins of a single function]], you may want to see what proteins ConSurf-DB included in its analysis. Here is how to see the names (which hopefully reveal the functions) of the proteins included in ConSurf-DB's analysis of a protein chain. (The following steps are needed in May, 2009. A request to make this easier has been sent to the ConSurf-DB development team.)
 
# Go to [http://consurfdb.tau.ac.il consurfdb.tau.ac.il] (the DB, distinct from the ConSurf Server).
# Enter the [[PDB code]] (PDB ID) for the protein of interest.
# Click the button for ''complete results'' for the chain of interest.
# Under ''Alignment'', note the ''number of sequences used''.
# Under ''Output Files'' click on ''PSI-BLAST output''. Download the file seq.blast (OS X) or seq.blast.zip (Windows).
# '''Windows XP or Vista:'''
##Double click on seq.blast.zip to unzip it. Right click on seq.blast and Copy. Right click on your Desktop (or elsewhere of your choosing) and Paste. Now you have the unzipped file seq.blast.
## Open seq.blast in a program that can number lines. (Notepad and Wordpad cannot number lines.) Start MS Word or the free Open Office Writer program (available from [http://openoffice.org openoffice.org]). Use the ''File'' menu to ''Open'' seq.blast.
##Delete everything above the first sequence, so the first sequence will be line number 1. The first sequence follows the header ''Sequences producing significant alignments:''.
##Number the sequences by numbering the lines.
###MS Word: search for "add line numbers" to get instructions.
###Open Office Writer: Save the file as seq_blast.txt. (This enables line numbering.) Open the ''Tools'' menu, and select ''Line Numbering...''.
# '''Mac OS X:'''
##In the Finder, right-click (ctrl-click) on the file seq.blast, then ''Open With'' an application that can number lines of text. An excellent free one is ''TextWrangler'' from [http://barebones.com BareBones.Com].
##Delete everything above the first sequence, so the first sequence will be line number 1. The first sequence follows the header ''Sequences producing significant alignments:''.
##Number the sequences by numbering the lines.
###MS Word: Set the ''Open'' dialog to enable ''All Files''. Search for "add line numbers" to get instructions. You may need to select all and change the font (e.g. to Arial) to get the description of each sequence to fit on one line.
###TextWrangler (or BBEdit): Open the ''View'' menu, and under ''Text Display'' click ''Show Line Numbers''.
###iWork Pages appears to lack a line numbering capability.
# Now you have the sequences numbered. Find the number equal to the ''number of sequences used'' reported under ''Alignment'' by ConSurf-DB.
 
If the functions of the proteins for this sequence number (and lower numbers) differ from that of the protein of interest, then ConSurf-DB included proteins of multiple functions in its analysis. This tends to obscure patches of conservation that exist among proteins with the same function as the query protein of interest.
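
The manual numbering steps above can also be sketched in a few lines of Python. This is only an illustration, not part of ConSurf-DB: the function name <code>numbered_hits</code> and the sample text are hypothetical, and the sketch assumes that the hit list follows the first ''Sequences producing significant alignments:'' header with one sequence per line, as described in the steps above.

```python
def numbered_hits(blast_text):
    """Return (number, description line) pairs for the hit summary list."""
    header = "Sequences producing significant alignments:"
    lines = blast_text.splitlines()
    # Locate the first header; the one-line-per-sequence list follows it.
    start = next(i for i, line in enumerate(lines) if line.startswith(header))
    hits = []
    for line in lines[start + 1:]:
        if not line.strip():
            if hits:
                break      # a blank line after the list ends it
            continue       # skip blank lines between header and first hit
        if line.startswith(">"):
            break          # start of the detailed alignments
        hits.append(line.rstrip())
    return list(enumerate(hits, start=1))


# Made-up sample in the general shape of a BLAST hit list (not real data).
sample = """Sequences producing significant alignments:        (bits) Value

hit_A  example enolase 1                              500   1e-140
hit_B  example enolase 2                              450   1e-120
hit_C  example unrelated lyase                         60   2e-05

>hit_A example enolase 1
"""

hits = numbered_hits(sample)
number_used = 2  # the "number of sequences used" reported by ConSurf-DB
for number, description in hits[:number_used]:
    print(number, description)
```

Reading the descriptions up to ''number of sequences used'' then shows which proteins entered the analysis.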
 
==Limiting ConSurf Analysis to Proteins of a Single Function==
 
As explained [[#ConSurf-DB Often Obscures Some Functional Sites|above]], the ConSurf-DB ''Evolutionary Conservation'' scene available in Proteopedia often includes proteins with multiple functions. However, the best way to find all functional sites by conservation analysis is to limit the analysis to proteins with a single function. A procedure for doing this follows. In June, 2009, the ConSurf development team is working on a new version that, once released, will enable selection of arbitrary sequences from the PSI-BLAST list.
 
#Go to [http://consurf.tau.ac.il consurf.tau.ac.il], the ConSurf Server (distinct from ConSurf-DB).
#Specify your PDB ID, Chain Identifier, and email address.
#Under ''Advanced Options'', set ''Max. Number of Homologues'' to '''all'''.
#Submit the job.
#When the job is completed, under ''Running Messages'', note the number of unique sequences used in the calculation.
#Under ''Final Results'', ''Sequences'', click on ''Unique Sequences Used''.
#Looking down the list of sequences from the top, find where the function of the protein first differs from that of the query protein of interest. Note  the number of the last sequence with the same function as the query protein. We'll call this the '''max with same function''' number.
#Re-run your ConSurf job making only one change. Set the ''Max. Number of Homologues'' to the "max with same function" that you determined in the previous step.
 
The results of the final step above may enable you to identify more functional sites than did the ConSurf-DB result built into Proteopedia.
 
See [[#How to Insert a ConSurf Result Into a Proteopedia Green Link|below]] for instructions on how to make a green-link scene in Proteopedia that shows your single-function ConSurf result.
 
If your results have more than a few amino acids with insufficient data (<font color="#c0c000"><b>yellow color</b></font>), you need more sequences. Repeat the procedure above with one change in the ConSurf job submission form: under ''Advanced Options'', use the much larger '''Uniprot''' database instead of the default Swiss-Prot database.
 
==The ConSurf-DB Mechanism==
Because results from the ConSurf DataBase server, [http://consurfdb.tau.ac.il ConSurf-DB]<ref name="consurfdb">PMID: 18971256</ref> are displayed within Proteopedia as ''Evolutionary Conservation'', an overview of its methods is provided here. ConSurf-DB pre-calculates conservation levels for each amino acid in every protein chain in the [[Protein Data Bank]]. It went into service in 2008. It uses state-of-the-art methods, all published in peer-reviewed journals<ref name="consurfdb" />. Each protein chain is processed as follows.
 
===ConSurf-DB Process===
#A list of unique protein chains is extracted from the [[Protein Data Bank]]. Chains shorter than 30 amino acids are not processed because they do not contain enough information for reliable phylogenetic tree construction. Non-standard residues are converted to the closest standard amino acids. Chains with more than 15% non-standard residues are not processed. Chains that could not be processed are colored gray in Proteopedia -- see the color key at the top of this page.
#The amino acid sequence of each protein chain is submitted to PSI-BLAST<ref>PSI-BLAST (Position Specific Iteration-BLAST) is an extension of the Basic Local Alignment Search Tool (BLAST) that is more sensitive at finding distantly related sequences. See [http://en.wikipedia.org/wiki/PSI-BLAST PSI-BLAST at Wikipedia] and [http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html PSI-BLAST at NCBI].</ref> for collection of related sequences from UniprotKB/Swiss-Prot<ref>From [http://www.uniprot.org/help/uniprotkb UniProtKB help]: "UniProtKB/Swiss-Prot (reviewed) is a high quality manually annotated and non-redundant protein sequence database, which brings together experimental results, computed features and scientific conclusions."</ref>. Three iterations are performed using an expectation value<ref name="evalue">'''Expectation Value (E value):''' When searching a sequence database with a query sequence, e.g. using BLAST or PSI-BLAST, each found sequence can be characterized by an E value. It is the number of hits expected by chance with the sequence matching level observed, taking into account the size of the sequence database and length of the query sequence. Low values of E (much less than one) mean increasing significance of the match.</ref> cutoff of 10<sup>-3</sup>.
# The sequences gathered with PSI-BLAST are then filtered ([[#Filtering|see below]]) using a scheme that attempts a balance between limiting the sequences to close homologues, and including distant sequences that do not share structure or function.
#The filtered sequence set is multiply aligned with [http://www.drive5.com/muscle/ MUSCLE] (a multiple sequence alignment algorithm that out-performs CLUSTALW).
#A phylogenetic tree is constructed from the multiple sequence alignment (MSA) using the Rate4Site program developed by the ConSurf team.
#Rate4Site then calculates an evolutionary rate for each position in the MSA using a [http://en.wikipedia.org/wiki/Bayesian_inference Bayesian] approach shown by the ConSurf team to be superior<ref>PMID: 15201400</ref>. "The amino acid evolution is traced using the JTT<ref> PMID: 1633570</ref> substitution model. High evolutionary rate represents a variable position while low rate represents an evolutionarily conserved position."<ref name="consurfdb" />
#"The conservation scores are normalized so that the average over all residues is zero, and the standard deviation is one."<ref name="consurfdb" /> Thus, '''conservation scores are relative, not absolute''' and comparing them between different protein families might be misleading (see [[#Caveat|Caveat]] above).
#The normalized conservation scores are then divided into nine levels from 1 (highly variable) to 9 (highly conserved).
#Colors mapped to the nine conservation levels, from <font color="#0fC7CF"><b>turquoise (1)</b></font> to <font color="#A01F5F"><b>burgundy (9)</b></font> are applied to the 3D protein structure visualized in [[FirstGlance in Jmol]]. A coloring script for [[RasMol]] is also provided.
<center>{{Template:ColorKey_ConSurf}}</center>
#A confidence interval for the conservation level is calculated for each amino acid position in the MSA. When this indicates low reliability, the position is colored <font color="#c0c000"><b>yellow</b></font>, signifying that the data were insufficient to assign a meaningful conservation level.
<ol start="11">
<li>An ''Average Pairwise Distance'' (APD) is calculated to describe the diversity of sequences in the MSA ([[#Average Pairwise Distance|see below]]).
</li></ol>
 
The results of each stage of the above process may be viewed for each chain at [http://consurfdb.tau.ac.il ConSurf-DB]. In the initial run (February 2008), roughly 100 computer CPUs were utilized concurrently via a distributed computing system. Processing of the 30,918 unique protein chains in the [[PDB]] took about five days, or an average of roughly 30 minutes per chain.
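
The normalization and grading steps in the process above can be illustrated with a minimal Python sketch. The normalization (mean zero, standard deviation one) follows the quoted description. The grading into nine equal-width bins, the use of the population standard deviation, and the function names are assumptions made only for illustration; the actual ConSurf binning scheme is not specified here.

```python
from statistics import mean, pstdev


def normalize(rates):
    """Rescale evolutionary rates to mean 0 and standard deviation 1."""
    mu = mean(rates)
    sigma = pstdev(rates)  # assumption: population standard deviation
    return [(r - mu) / sigma for r in rates]


def grade(z_scores):
    """Map normalized rates to grades 9 (conserved) .. 1 (variable).

    Assumption: nine equal-width bins over the observed range.
    Requires at least two distinct scores.
    """
    lo, hi = min(z_scores), max(z_scores)
    width = (hi - lo) / 9
    grades = []
    for z in z_scores:
        bin_index = min(int((z - lo) / width), 8)  # 0 = slowest rate
        grades.append(9 - bin_index)               # slow rate = conserved
    return grades


rates = [0.1, 0.5, 1.0, 2.0, 4.0]  # made-up per-position rates
z = normalize(rates)
grades = grade(z)
print(grades)
```

Because the scores are rescaled within one alignment, a grade of 9 in one chain and a grade of 9 in another need not reflect the same absolute rate.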
 
===Filtering===
Filtering of the sequences gathered for each protein chain is crucial to making the ConSurfDB results maximally informative. Filtering consists of the following steps.
#Sequences with more than 95% sequence identity to the query sequence are discarded.
#Sequences shorter than 60% of the query sequence are discarded.
#Locally aligned sequence fragments that overlap by over 10% are discarded.
#Redundant sequences (>95% identical) are removed using CD-HIT<ref>PMID: 16731699</ref>.
#A maximum of 300 sequences meeting the above criteria is used (the 300 with the lowest expectation values<ref name="evalue" />, that is, most closely related to the query sequence).
#If the above process yields fewer than 50 sequences, the entire process is repeated using the Clean_UniProt database, which is about ten times larger than UniProtKB/Swiss-Prot. Clean_UniProt is a version of the UniProt database that attempts to exclude mutant or dubious sequences.
#If the above process yields fewer than 5 sequence homologs, no calculation is performed due to insufficient data. In February, 2008, this occurred for 1,348 chains out of 30,918 (4%).
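The thresholds in the list above can be sketched in a few lines of Python. This is only an illustration of the stated cutoffs, not ConSurf-DB's actual code: the <code>Hit</code> record, the function names, and the returned action strings are hypothetical, and the fragment-overlap step and the CD-HIT redundancy step are deliberately left out.

```python
from dataclasses import dataclass

# Thresholds taken from the filtering steps listed above.
MAX_IDENTITY_TO_QUERY = 0.95   # step 1: discard hits more than 95% identical to the query
MIN_LENGTH_FRACTION = 0.60     # step 2: discard hits shorter than 60% of the query
MAX_SEQUENCES = 300            # step 5: keep at most 300 hits with the lowest E values
MIN_BEFORE_RETRY = 50          # step 6: below this, repeat with the larger database
MIN_TO_CALCULATE = 5           # step 7: below this, no calculation is performed


@dataclass
class Hit:
    """One homolog returned by the sequence search (hypothetical record layout)."""
    name: str
    identity_to_query: float  # fraction between 0.0 and 1.0
    length: int               # number of residues
    evalue: float             # expectation value; lower means more closely related


def filter_hits(hits, query_length):
    """Apply steps 1, 2 and 5. Steps 3 (fragment overlap) and 4 (CD-HIT) are omitted."""
    kept = [
        h for h in hits
        if h.identity_to_query <= MAX_IDENTITY_TO_QUERY
        and h.length >= MIN_LENGTH_FRACTION * query_length
    ]
    kept.sort(key=lambda h: h.evalue)
    return kept[:MAX_SEQUENCES]


def next_action(n_kept, already_retried):
    """Decide what steps 6 and 7 call for, given how many hits survived filtering."""
    if n_kept < MIN_BEFORE_RETRY and not already_retried:
        return "retry_with_clean_uniprot"
    if n_kept < MIN_TO_CALCULATE:
        return "insufficient_data"
    return "calculate"
```

For example, <code>next_action(30, already_retried=False)</code> asks for a second search in the larger database, while <code>next_action(4, already_retried=True)</code> reports insufficient data.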
 
===Average Pairwise Distance===
An ''Average Pairwise Distance'' (APD) is calculated to describe the diversity of sequences in the MSA generated during the processing of each chain. A value of 0.01 means that on average, there is one amino acid replacement for every 100 positions. Optimally informative results are obtained when the APD is between roughly 0.5 and 1.5.
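To make the idea of an average pairwise distance concrete, here is a minimal Python sketch. It is a simplification: it uses the plain fraction of differing aligned positions (the "p-distance"), which can never exceed 1. Because the optimal range quoted above extends to 1.5, ConSurf-DB presumably uses a model-based estimate of replacements per position; that is an assumption, and the function names below are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned columns, ungapped in both sequences, where the residues differ."""
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("the two sequences share no ungapped columns")
    return sum(a != b for a, b in columns) / len(columns)


def average_pairwise_distance(msa):
    """Mean p-distance over all pairs of aligned sequences in the MSA."""
    distances = [p_distance(a, b) for a, b in combinations(msa, 2)]
    if not distances:
        raise ValueError("at least two sequences are required")
    return sum(distances) / len(distances)


# Three aligned 10-residue sequences (made-up data for illustration).
toy_msa = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHIRV"]
print(round(average_pairwise_distance(toy_msa), 3))
```

In the toy alignment the three pairs differ at 1, 2, and 1 of 10 positions, so the average is (0.1 + 0.2 + 0.1) / 3, about 0.133.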


==The ConSurf Server==


The [http://consurf.tau.ac.il ConSurf Server], first available in 2001<ref>PMID: 11243830</ref><ref>PMID: 12499312</ref><ref>PMID: 15980475</ref> with many subsequent enhancements, can calculate and display the conservation pattern for 3D structures '''completely automatically'''. Generally, one should use the ConSurf Server only when the pre-calculated result at the [[#The ConSurf-DB Mechanism|ConSurf-DB]] needs improvement (for example, see [[#Limiting ConSurf Analysis to Proteins of a Single Function|above]]), or when one has a multiple sequence alignment (MSA) of one's own to use. ConSurf-DB will nearly always give more informative results than the default settings of the ConSurf Server, due to the powerful sequence filtering that is built into ConSurf-DB. For an example, see the [http://consurfdb.tau.ac.il/comparison.php cytochrome c comparison at ConSurf-DB].
 
Like ConSurf-DB, the ConSurf Server uses state-of-the-art methods, all of which are published in peer-reviewed journal articles. Unlike ConSurf-DB's pre-calculated results, the ConSurf Server permits considerable customization. For example, the user may specify the number of sequences to use, choose the database from which sequences are obtained (Swiss-Prot or UniProt), set the Expectation cutoff<ref name="evalue" />, set the number of PSI-BLAST iterations, or submit their own multiple sequence alignment or phylogenetic tree. You can also upload your own PDB file, which enables you to process unpublished data, theoretical models, or "trimmed" chains, e.g. a domain of interest from a long chain.
 
In brief, the [http://consurf.tau.ac.il ConSurf Server] uses the following process by default:
# Obtains the protein sequence for the specified PDB code (or uploaded PDB file) and chain.
# Gathers closely related sequences from Swiss-Prot (or UniProt) with a PSI-BLAST search. E value cutoff<ref name="evalue" />, number of iterations, and number of sequences to use are configurable.
# Eliminates non-unique sequences, namely, those that are 99% or more identical with another sequence.
# Does a multiple sequence alignment with MUSCLE. (Or you can upload your own MSA.)
# Constructs a phylogenetic tree. (Or you can upload your own.)
# Calculates a conservation score for each amino acid. Classifies the conservation scores into nine levels, and maps them to standard conservation level colors (see color key at the top of this page). Marks residues for which the conservation score confidence interval is too large, hence the conservation score is unreliable ("insufficient data").
# Displays the protein, colored by conservation, in interactive 3D, using [[FirstGlance in Jmol]], [[Chimera]], [[PyMOL]], or [[Protein Explorer]].
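The classification step in the list above (scores into nine levels, with unreliable positions flagged) can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: equal-width bins, a score scale on which larger means more conserved, and a hypothetical <code>max_ci_span</code> threshold for deciding that a confidence interval is "too large". The ConSurf publications cited above describe the method actually used.

```python
from typing import Optional


def conservation_grade(score: float, min_score: float, max_score: float, levels: int = 9) -> int:
    """Map a score to a grade from 1 (most variable) to `levels` (most conserved).

    Assumes equal-width bins and that a larger score means stronger conservation.
    """
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    fraction = (score - min_score) / (max_score - min_score)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp scores outside the range
    return min(int(fraction * levels) + 1, levels)


def grade_or_insufficient(score: float, ci_low: float, ci_high: float,
                          min_score: float, max_score: float,
                          max_ci_span: int = 3) -> Optional[int]:
    """Return the grade, or None ("insufficient data") when the confidence
    interval covers more than `max_ci_span` grades (hypothetical threshold)."""
    low_grade = conservation_grade(ci_low, min_score, max_score)
    high_grade = conservation_grade(ci_high, min_score, max_score)
    if high_grade - low_grade > max_ci_span:
        return None
    return conservation_grade(score, min_score, max_score)
```

With scores scaled from 0 to 1, a position scoring 0.5 with a narrow interval (0.45 to 0.55) is assigned grade 5, while the same score with a wide interval (0.05 to 0.95) returns <code>None</code> and would be colored yellow.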
 
Unlike ConSurf-DB, the ConSurf Server does '''no filtering''' of the gathered sequences before constructing the MSA (except to eliminate 99% redundant sequences). If the number of sequences obtained is too small, it is up to the user to run another job with parameters adjusted to obtain more sequences. Because sequences with &lt;99% redundancy are not filtered out, it usually takes more than the default 50 sequences to obtain an optimally informative result.
 
==Examples==
<applet load='2vaa' size='400' frame='true' align='right' caption='Evolutionary conservation reported by ConSurf-DB for Major Histocompatibility Class I alpha chain in [[2vaa]].' scene='Conservation,_Evolutionary/2vaa/1' />
At right is the pattern of evolutionary conservation and variability reported by [http://consurfdb.tau.ac.il ConSurf-DB] for the alpha chain of [[Major Histocompatibility Complex Class I]] (chain A of [[2vaa]]).
 
{| class="wikitable"
|-
|{{Template:ColorKey_ConSurf_NoYellow_NoGray}}
|Because the scene at the right contains no amino acids marked ''insufficient data'', and no chains with ''no data'', the yellow and gray colors need not be included in the color key.
For all the available variations of the ConSurf color key, see [[Help:Color_Keys#ConSurf]].
|}
 
[[2vaa]] contains three chains. Here, ConSurf colors are applied only to the alpha chain (chain A), while the beta chain (chain B) and the peptide (chain P) are shown as gray backbone traces. Below are instructions for how to insert a ConSurf result into a Proteopedia scene.
 
Examples of conserved patches on other proteins, revealed by ConSurf, will be found in the articles on
*[[Lac repressor]]
*[[Avian Influenza Neuraminidase, Tamiflu and Relenza]]
*[[Mechanosensitive channels: opening and closing]]
 
==How to Insert a ConSurf Result Into a Proteopedia Green Link==
 
To create a green-linked scene with a molecule colored by evolutionary conservation, use the "evolutionary conservation" button in the "color" tab of the Scene Authoring Tools.
 
If for some reason you want to calculate the ConSurf coloring scheme on your own and want to insert that into a Proteopedia scene, here is how:
 
# Using either the [http://consurfdb.tau.ac.il ConSurf Database] or the [http://consurf.tau.ac.il ConSurf Server], obtain the desired result.
# At the ConSurf result page, use the link ''RasMol Coloring Script'' to display the script that either shows or hides insufficient data. '''Block and copy''' the entire script.
# We assume that you already have an article in Proteopedia, with a Jmol applet in place for displaying your ConSurf result. (If not, see the [[Proteopedia:Video_Guide|Video Guides]] and [[Help:Editing]].)
# Edit your Proteopedia page, and open the Scene Authoring Tools.
# Load the desired molecule into Jmol in the Scene Authoring Tools.
# Click on "Jmol" (at the lower right of Jmol) to open Jmol's menu, and there, click on "Console".
# In the small white Console window, '''paste''' your RasMol Coloring Script into the bottom box, and click Execute. On Mac OS X, you may be unable to paste into the Console. In that case, drag the blocked script and drop it into the bottom box of the Console, then click Execute.
# Make any other changes you wish to this scene, and then '''save the scene'''.
# Copy the wikitext for the green link that will display your scene, and close the Scene Authoring Tool.
# Paste the green link wikitext into your page, and save the page.
 
The color key that you see at the very top of this page can be inserted in any page using this wikitext:
 
<nowiki>{{Template:ColorKey_ConSurf}}</nowiki>
 
See also [[Help:Color_Keys]] for '''other variations on the color key'''. If something is not clear, please let us know at {{Template:Contact}}.


==Other Evolutionary Conservation Servers==
===INTREPID===


In 2024, the INTREPID Server, formerly at the University of California, Berkeley, appears to be unavailable.
<!--
&quot;[http://phylogenomics.berkeley.edu/INTREPID/index.html INTREPID] is an information-theoretic approach for functional site identification that exploits the information in large diverse multiple sequence alignments. INTREPID gathers homologs for a sequence using PSI-BLAST and estimates a phylogenetic tree. It then uses Jensen-Shannon divergence to measure the information for each position in the sequence at each subtree node encountered on a traversal of the phylogeny, tracing a path from the root to the leaf corresponding to the sequence of interest. Positions that are conserved across the entire family receive stronger scores than those that only become conserved within more closely related subgroups. This tree traversal produces a phylogenomic conservation score for each position in the MSA. INTREPID uses information from sequence only, and can thus be used when knowledge of structure is not available.&quot; (Quoted from the [http://phylogenomics.berkeley.edu/INTREPID/index.html INTREPID website].)


Evidence is provided that INTREPID out-performs ConSurf for predicting catalytic residues.


Unlike ConSurf, INTREPID does not identify the [[#Locating Variable Patches|most variable residues]] in addition to the [[#Locating Conserved Patches|most conserved]]. -->
 
===xProtCAS===
 
[http://slim.icr.ac.uk/projects/xprotcas xProtCAS] is a tool to identify conserved surfaces on AlphaFold2 structural models. The tool defines autonomous structural modules from the structural models and converts these modules to a graph encoding residue topology, accessibility, and conservation. xProtCAS is available as open-source Python software and as an interactive web server.
 
&quot;The xProtCAS web server represents a fast, simple, and intuitive tool to analyze protein surface conservation. The two comparable available web-based tools for conserved accessible surface discovery, PatchFinder, and FuncPatch web servers, were no longer functional at the time of publication. There are overlaps with the functionality of the ConSurf server. However, the definition of the most conserved accessible surface and integration with AlphaFold2 models of the xProtCAS server adds key functionality not available with the ConSurf server.&quot; (Quoted from the [https://www.mdpi.com/2218-273X/13/6/906 xProtCAS associated publication]<ref>Kotb, H. M. and Davey, N. E. ''xProtCAS: A Toolkit for Extracting Conserved Accessible Surfaces from Protein Structures'' Biomolecules 13:906 (2023). [https://doi.org/10.3390/biom13060906 DOI: 10.3390/biom13060906]</ref>.)


===siteFiNDER|3D===
In 2024, the siteFiNDER 3D Server, formerly at Yale University, appears to be unavailable. <!--


[http://sitefinder3d.mbb.yale.edu/ siteFiNDER|3D] performs ''conserved functional group'' (CFG) analysis. "CFG Analysis is a general method for predicting the location of functionally important sites within a target protein structure. Like other available structure/sequence analysis techniques, CFG Analysis exploits the evolutionary relationships present across groups of homologous proteins to identify regions that are likely to be of functional significance. However, this technique is particularly useful for situations where other methods fail, for instance when only a few or highly similar homologues can be identified." As its name implies, CFG analysis attempts to identify groups of conserved amino acids that together represent a functional site. In this respect, it goes beyond most other evolutionary conservation servers, which stop at assigning a conservation value to each amino acid. See the [http://consurfdb.tau.ac.il/comparison.php comparison of siteFiNDER|3D with ConSurf for cytochrome c].


This site provides links to several other software packages that predict functional sites, some of which are not further discussed in the present article. -->


===HotPatch===


In 2024, the HotPatch Server, formerly at UCLA, appears to be unavailable. <!--
[http://hotpatch.mbi.ucla.edu/ HotPatch] <ref>PMID: 17451744</ref> "finds unusual patches on the surface of proteins, and computes just how unusual they are (patch rareness), and how likely each patch is to be of functional importance (functional confidence (FC).) The statistical analysis is done by comparing your protein's surface against the surfaces of a large set of proteins whose functional sites are known." One advantage of HotPatch is that sequence homologs are not required. See the [http://consurfdb.tau.ac.il/comparison.php comparison of HotPatch with ConSurf for cytochrome c]. -->


===Evolutionary Trace Viewer===


[http://evolution.lichtargelab.org/ETviewer Evolutionary Trace Viewer] (ETV).<!--
 
See the [http://consurfdb.tau.ac.il/comparison.php comparison of ETV with ConSurf for cytochrome c].-->
<blockquote>
Comment by [[User:Eric Martz]], March, 2009: From the information provided on the ETV website, I found it quite difficult to understand what the ETV is doing, or how to use the viewer. An explanation in simple terms for non-specialists would be very useful.
</blockquote>


===EVcouplings / EVfold===
[https://evcouplings.org/ EVolutionary Couplings server] provides functional and structural information about proteins derived from the evolutionary sequence record using methods from statistical physics.
 
This site provides links to several other related servers and software packages.
 
==See Also==
*[[Introduction to Evolutionary Conservation]] gives examples with multiple sequence alignments.
*[[How to see conserved regions]] gives instructions for Proteopedia, FirstGlance, and green links.
*[[ConSurfDB vs. ConSurf]]: How the servers work and how to get optimal results from ConSurf.
*[[Help:How to Insert a ConSurf Result Into a Proteopedia Green Link]]
*[[ConSurf/Index]] lists all ConSurf-related pages in Proteopedia.
 
==References==
<references />

Latest revision as of 20:49, 1 August 2024

For a more basic explanation of this subject, please see Introduction to Evolutionary Conservation.

Mutations occur spontaneously in each generation, randomly changing the amino acid sequences of proteins. Individuals with mutations that impair critical functions of proteins may have resulting problems that make them less able to reproduce. Harmful mutations are lost from the gene pool because the individuals carrying them reproduce less effectively. Over time, only harmless (or very rare beneficial) mutations are maintained in the gene pool. This is evolution. Rett Syndrome is a stark illustration of these principles.

When the sequences of a given protein are compared between taxa, using multiple sequence alignment (MSA), differences between sequences most often represent mutations that were allowed (by evolution) to persist because they were harmless. Where the sequences are identical, we say that sequence was conserved. Such evolutionary conservation occurs because mutations of these amino acids were harmful to protein function, and were lost over time. Amino acids that are conserved are those most critical to the function of the protein. Thus, looking for evolutionarily conserved patches of amino acids in a 3D protein structure is a good way to locate functional sites. See the case study of enolase, an enzyme in the glycolytic pathway.

Proteopedia's evolutionary conservation colors are pre-calculated by ConSurf-DB.

The nine conservation grade colors utilized by ConSurf-DB and the ConSurf Server, plus yellow for amino acids with insufficient data, and gray for chains that ConSurf did not process. See Help:Color Keys.

  • Insufficient Data describes amino acids for which a meaningful conservation level could not be derived from the set of homologous sequences utilized. This occurs when the confidence interval for the calculated conservation level is too large. For more, see the ConSurfDB Process.

    For an example, show Evolutionary Conservation at 1hgf.


  • No Data describes entire protein chains that were not or could not be processed by ConSurf-DB. For details, see ConSurfDB Process. For an example, show Evolutionary Conservation at 1hgf.



Locating Conserved Patches

Patches of highly conserved amino acid residues on the surface of a protein molecular structure are good candidates for functional sites. Many articles in Proteopedia that are titled with a PDB code have an Evolutionary Conservation section below the molecular scene. (Results could not be obtained for a small percentage -- see ConSurfDB Process.) Clicking show in the blue Evolutionary Conservation bar automatically colors all chains in the molecule by evolutionary conservation as calculated by ConSurf-DB. A typical example is conservation of the catalytic pocket of the enzyme enolase. For more examples, click on Random PDB entry in the random box at the upper left of every page in Proteopedia.

Briefly, ConSurf-DB gathers sequences similar to that of the protein in question, then constructs a multiple sequence alignment, and analyses it for sequence positions that are conserved (have lower than average differences between sequences) and that are variable (have higher than average differences between sequences). Each amino acid is assigned a conservation score and corresponding color in Proteopedia's interactive 3D molecular scene.

ConSurf-DB's analysis is done with sophisticated, published, peer-reviewed, state of the art methods. A more detailed overview of the process employed by ConSurf-DB is available. Proteopedia's built-in display of ConSurf-DB results is a good place to start looking for conserved patches.

However, ConSurf-DB usually does not show all the conserved patches present in proteins with the same function. Therefore, you may wish to extend your analysis of conservation by using the ConSurf Server to limit the analysis to proteins of one function. The results of such an analysis can be displayed in a molecular scene in Proteopedia. See Help:How to Insert a ConSurf Result Into a Proteopedia Green Link.

Locating Variable Patches

In some cases, patches of highly variable (rapidly mutating) residues are also functional sites. These can also be identified preliminarily with Proteopedia's Evolutionary Conservation scenes from ConSurfDB, and more definitively with conservation analysis limited to proteins of a single function. For example, mutations in influenza hemagglutinin help the virus to evade host defenses (see 1hgf). Another example is the high allelic variability of the peptide-binding groove of Major Histocompatibility Complex Class I. That variability helps the grooves of the alleles within any individual to bind a wide range of peptides, hence enabling the T lymphocyte system to defend against a wide range of pathogens, including influenza virus.

Conservation for Domain Folding

Certain residues on the surfaces of protein molecules tend to be conserved in order to maintain proper folding, rather than because they are part of a site functioning to interact with substrate, ligand, or a protein partner. Secondary structure elements need to break at the protein molecular surface in order to turn back into the folded protein domain. Therefore, it is common to see isolated highly conserved residues that enable turns, or break helices, notably glycines or prolines, on protein structure surfaces.

Cysteines that form disulfide bridges are typically conserved, as are other amino acids that form rare protein crosslinks.

Charged residues are usually on the surfaces of folded proteins. If you see a highly conserved charged residue (Arg, Asp, Glu, Lys) on the surface, it often participates in a salt bridge. Salt bridges help to stabilize protein folds, and hence the residues involved are often highly conserved. Example: Asp6 with Arg8 in 1qdq.

For other situations where conservation is expected, see Expected vs. Unexpected Conservation.

Remember that you can touch any residue with the mouse in the Evolutionary Conservation scene in Proteopedia (in Jmol), and its identity will be displayed after a few seconds. This works best with spinning turned off.

Every structure in Proteopedia has a link to be displayed in FirstGlance in Jmol. There, you can use the Find dialog to enter the name of an amino acid, e.g. glycine or proline, and the positions of all of the specified amino acids will be highlighted. You can then visualize their distribution in the 3D structure. This strategy can also be utilized when viewing the protein colored by conservation, using the FirstGlance links in either ConSurf server.

Caveats

ConSurf-DB Often Obscures Some Functional Sites

Proteopedia's Evolutionary Conservation scenes use pre-calculated results from ConSurf-DB. ConSurf-DB is designed to include a wide range of sequences in its multiple-sequence alignments (MSA) and analyses. Often, the MSA will include a substantial number of sequences for proteins with different functions than the query protein. (See these instructions for how to find out the functions of the proteins used in ConSurf-DB's MSA.) Consequently, amino acids that are colored as highly conserved by ConSurf-DB are truly highly conserved across a wide range of sequence-similar proteins. However, amino acids that are highly conserved in proteins with the same function as the query protein may not appear conserved in ConSurf-DB results. A good way to find these obscured functional sites is to do a conservation analysis that is limited to proteins of a single function. See Limiting ConSurf Analysis to Proteins of a Single Function.

Use Caution When Comparing Conservation of Sequence-Different Chains

This caveat applies only to molecules that contain chains with different sequences. The conservation colors shown in Proteopedia's Evolutionary Conservation scenes do not indicate the same levels of conservation for chains of different sequences. This is because ConSurf-DB calculates conservation levels independently for each sequence-different chain, and the levels are relative to the multiple sequence alignment constructed for that chain.

For example, consider 1bqh (a Major Histocompatibility Class I protein), which contains five chains with four distinct sequences. A visit to ConSurf-DB reveals, as expected, that a different number of sequences was utilized for the multiple sequence alignment (MSA) and conservation calculations for each of these sequence-different chains, and that each MSA had a different average pairwise distance (APD), a measure of diversity within the MSA. Therefore, residues with, for example, conservation level 9 (maximal conservation) in each of the three ConSurf-DB-colored sequence-different chains have the highest levels of conservation within their own chain, but do not have exactly the same absolute levels of conservation.

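{| class="wikitable"
|+ 1bqh
! Chain !! Length !! Number of sequences in MSA !! APD
|-
| A || 274 || 144 || 1.72
|-
| B || 99 || 75 || 1.49
|-
| C || 8 || colspan="2" | Length below minimum for ConSurf
|-
| G || 129 || 201 || 1.35
|}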

In Proteopedia's Evolutionary Conservation scenes, all the chains in the molecule are colored in the same scene. This gives a potentially useful overview, but can be misleading unless one realizes that a given conservation color, in two sequence-different chains, does not mean exactly the same level of conservation. In contrast to Proteopedia's Evolutionary Conservation scenes, ConSurf-DB and ConSurf Server apply conservation level colors to only one chain sequence at a time, thereby avoiding this possible confusion.

Conservation Results Will Change With Time

Slight variations in the conservation pattern will occur over time, as the number of sequences in the sequence databases used by ConSurf-DB increases. Each update of ConSurf-DB uses somewhat larger sequence databases, and consequently, the MSAs for each chain will be slightly different. Also, the methods employed by ConSurf are improved periodically. For example, the MSA algorithm originally defaulted to CLUSTAL-W, then to MUSCLE, and later to MAFFT.

Consequently, results from the ConSurf Server will also change slightly with time, even when the job parameters are the same. Only if you upload the same MSA will the results be identical for a given chain when the jobs are run months or years apart.

You may find it useful to download ConSurf results (from either ConSurf server) in order to preserve a particular result for comparison with results obtained at later times.

Other Evolutionary Conservation Servers

INTREPID

In 2024, the INTREPID Server, formerly at the University of California, Berkeley, appears to be unavailable.

xProtCAS

xProtCAS is a tool to identify conserved surfaces on AlphaFold2 structural models. The tool defines autonomous structural modules from the structural models and converts these modules to a graph encoding residue topology, accessibility, and conservation. xProtCAS is available as open-source Python software and as an interactive web server.

"The xProtCAS web server represents a fast, simple, and intuitive tool to analyze protein surface conservation. The two comparable available web-based tools for conserved accessible surface discovery, PatchFinder, and FuncPatch web servers, were no longer functional at the time of publication. There are overlaps with the functionality of the ConSurf server. However, the definition of the most conserved accessible surface and integration with AlphaFold2 models of the xProtCAS server adds key functionality not available with the ConSurf server." (Quoted from the xProtCAS associated publication[1].)

siteFiNDER|3D

In 2024, the siteFiNDER 3D Server, formerly at Yale University, appears to be unavailable.

HotPatch

In 2024, the HotPatch Server, formerly at UCLA, appears to be unavailable.

Evolutionary Trace Viewer

Evolutionary Trace Viewer (ETV).

Comment by User:Eric Martz, March, 2009: From the information provided on the ETV website, I found it quite difficult to understand what the ETV is doing, or how to use the viewer. An explanation in simple terms for non-specialists would be very useful.

EVcouplings / EVfold

EVolutionary Couplings server provides functional and structural information about proteins derived from the evolutionary sequence record using methods from statistical physics.

This site provides links to several other related servers and software packages.

See Also

References

  1. Kotb, H. M. and Davey, N. E. xProtCAS: A Toolkit for Extracting Conserved Accessible Surfaces from Protein Structures Biomolecules 13:906 (2023). DOI: 10.3390/biom13060906

Proteopedia Page Contributors and Editors

Eric Martz, Eran Hodis, Wayne Decatur