Conservation, Evolutionary

Latest revision as of 20:49, 1 August 2024

For a more basic explanation of this subject, please see Introduction to Evolutionary Conservation.

Mutations arise spontaneously in every generation, and some of them change the amino acid sequences of proteins at random positions. Individuals carrying mutations that impair critical functions of proteins may suffer problems that make them less able to reproduce. Harmful mutations are therefore lost from the gene pool, because the individuals carrying them reproduce less effectively. Over time, mainly harmless mutations (and, very rarely, beneficial ones) are maintained in the gene pool. This process, natural selection, is a central mechanism of evolution. Rett Syndrome is a stark illustration of these principles.

When the sequences of a given protein from different taxa are compared using a multiple sequence alignment (MSA), differences between sequences most often represent mutations that evolution allowed to persist because they were harmless. Where the sequences are identical, we say that the sequence was conserved. Such evolutionary conservation occurs because mutations of these amino acids were harmful to protein function and were lost over time. Conserved amino acids therefore tend to be those most critical to the function of the protein. Thus, looking for evolutionarily conserved patches of amino acids in a 3D protein structure is a good way to locate functional sites. See the case study of enolase, an enzyme in the glycolytic pathway.
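The idea of a conserved (identical) position in an MSA can be made concrete with a short sketch. It assumes the alignment is given as a list of equal-length strings with '-' for gaps, and it treats any column containing a gap as not conserved; both are simplifying assumptions for illustration, and the toy sequences are hypothetical.

```python
def conserved_columns(msa):
    """Return 0-based indices of alignment columns in which every sequence
    has the same residue. Columns containing a gap ('-') are not counted."""
    if not msa:
        return []
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for i in range(length):
        column = {seq[i] for seq in msa}  # distinct symbols in column i
        if len(column) == 1 and "-" not in column:
            result.append(i)
    return result

# Hypothetical alignment of one short protein segment from four taxa.
toy_msa = [
    "MKGHDE",
    "MRGHEE",
    "MKGHDQ",
    "MSGHDE",
]
print(conserved_columns(toy_msa))  # [0, 2, 3]
```

Real conservation analyses, including ConSurf-DB, grade positions on a continuous scale rather than only flagging perfectly identical columns.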

Proteopedia's evolutionary conservation colors are pre-calculated by ConSurf-DB.

The nine conservation grade colors used by ConSurf-DB and the ConSurf Server, plus yellow for amino acids with insufficient data and gray for chains that ConSurf did not process. See Help:Color Keys.
Insufficient Data describes amino acids for which a meaningful conservation level could not be derived from the set of homologous sequences used. This occurs when the confidence interval for the calculated conservation level is too wide. For more, see ConSurfDB Process. For an example, show Evolutionary Conservation at 1hgf.

No Data describes entire protein chains that were not, or could not be, processed by ConSurf-DB. For details, see ConSurfDB Process. For an example, show Evolutionary Conservation at 1hgf.
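The "insufficient data" rule can be sketched as a check on how many conservation grades a residue's confidence interval covers. The function name and the threshold `flag_span=4` below are hypothetical choices for illustration; ConSurfDB Process describes the criterion ConSurf-DB actually applies.

```python
def grade_or_flag(grade, ci_low_grade, ci_high_grade, flag_span=4):
    """Return the conservation grade (1-9), or "insufficient data" when the
    confidence interval is too wide.

    ci_low_grade / ci_high_grade: grades at the two ends of the residue's
    confidence interval. flag_span is a hypothetical threshold: the residue
    is flagged when its interval covers flag_span grades or more.
    """
    for g in (grade, ci_low_grade, ci_high_grade):
        if not 1 <= g <= 9:
            raise ValueError("grades must be between 1 and 9")
    if not ci_low_grade <= grade <= ci_high_grade:
        raise ValueError("the grade must lie within its confidence interval")
    span = ci_high_grade - ci_low_grade + 1  # number of grades the interval covers
    if span >= flag_span:
        return "insufficient data"
    return grade

print(grade_or_flag(9, 8, 9))  # 9 (narrow interval, grade is reported)
print(grade_or_flag(5, 2, 7))  # insufficient data (interval covers 6 grades)
```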

Locating Conserved Patches

Patches of highly conserved amino acid residues on the surface of a protein molecular structure are good candidates for functional sites. Many articles in Proteopedia that are titled with a PDB code have an Evolutionary Conservation section below the molecular scene. (Results could not be obtained for a small percentage of structures; see ConSurfDB Process.) Clicking show in the blue Evolutionary Conservation bar automatically colors all chains in the molecule by evolutionary conservation as calculated by ConSurf-DB. A typical example is conservation of the catalytic pocket of the enzyme enolase. For more examples, click on Random PDB entry in the random box at the upper left of every page in Proteopedia.
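Outside Proteopedia, the idea of a "patch" can be approximated by grouping highly conserved residues that lie close together in 3D. The sketch below is a hypothetical illustration, not ConSurf's method: it assumes one representative coordinate per residue (e.g. the alpha carbon), an arbitrary 8 Å linking cutoff, and a minimum grade of 8, and it does not check whether residues are on the surface. The residue labels and coordinates are made up.

```python
import math
from collections import deque

def conserved_patches(residues, min_grade=8, cutoff=8.0):
    """Group highly conserved residues into spatially contiguous patches.

    residues: list of (label, grade, (x, y, z)) tuples, where (x, y, z) is a
    representative atom position in angstroms. Two selected residues are
    linked when they lie within `cutoff` of each other; a patch is a
    connected group of linked residues, found by breadth-first search.
    """
    selected = [(label, xyz) for label, grade, xyz in residues if grade >= min_grade]
    unvisited = set(range(len(selected)))
    patches = []
    while unvisited:
        start = unvisited.pop()
        queue = deque([start])
        members = [start]
        while queue:
            i = queue.popleft()
            # All not-yet-assigned residues close to residue i join this patch.
            near = [j for j in unvisited
                    if math.dist(selected[i][1], selected[j][1]) <= cutoff]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        patches.append(sorted(selected[k][0] for k in members))
    return sorted(patches, key=len, reverse=True)  # largest patches first

# Hypothetical residues: (label, conservation grade, alpha-carbon coordinates).
residues = [
    ("His12", 9, (0.0, 0.0, 0.0)),
    ("Asp45", 9, (4.0, 0.0, 0.0)),
    ("Glu96", 8, (4.0, 5.0, 0.0)),
    ("Gly150", 9, (30.0, 0.0, 0.0)),
    ("Lys7", 2, (1.0, 1.0, 0.0)),
]
print(conserved_patches(residues))
# [['Asp45', 'Glu96', 'His12'], ['Gly150']]
```

An isolated conserved residue such as the hypothetical Gly150 forms a patch of size one; as discussed under Conservation for Domain Folding, such residues often reflect folding rather than function.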

Briefly, ConSurf-DB gathers sequences similar to that of the protein in question, then constructs a multiple sequence alignment, and analyses it for sequence positions that are conserved (have lower than average differences between sequences) and that are variable (have higher than average differences between sequences). Each amino acid is assigned a conservation score and corresponding color in Proteopedia's interactive 3D molecular scene.
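The step from an MSA to per-position grades can be illustrated with a deliberately simpler scoring method. ConSurf computes evolutionary rates with a phylogeny-based method (Rate4Site), which this sketch does not reproduce; as a stand-in it uses Shannon entropy per column, and then divides the observed entropy range into nine equal-width bins (grade 9 = most conserved). The binning scheme and the toy alignment are assumptions for illustration only.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column; gaps
    ('-') are ignored. 0.0 means every sequence has the same residue."""
    residues = [r for r in column if r != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    n = len(residues)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())

def conservation_grades(msa, n_grades=9):
    """Map each column's entropy onto grades 1..n_grades (n_grades = most
    conserved), using equal-width bins over the entropies seen in this MSA."""
    entropies = [column_entropy(col) for col in zip(*msa)]
    lo, hi = min(entropies), max(entropies)
    if hi == lo:
        # No column is more variable than another: middle grade (arbitrary convention).
        return [(n_grades + 1) // 2] * len(entropies)
    grades = []
    for h in entropies:
        fraction = (h - lo) / (hi - lo)      # 0 = least variable, 1 = most variable
        bin_index = min(int(fraction * n_grades), n_grades - 1)
        grades.append(n_grades - bin_index)  # invert so 9 = conserved, 1 = variable
    return grades

toy_msa = ["MKGHDE", "MRGHEE", "MKGHDQ", "MSGHDE"]  # hypothetical alignment
print(conservation_grades(toy_msa))  # [9, 1, 9, 9, 5, 5]
```

Because the bins are set by the range observed in one MSA, the grades are relative to that alignment, which is also why grades from different chains are not directly comparable.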

ConSurf-DB's analysis uses sophisticated, published, peer-reviewed, state-of-the-art methods. A more detailed overview of the process employed by ConSurf-DB is available. Proteopedia's built-in display of ConSurf-DB results is a good place to start looking for conserved patches.

However, ConSurf-DB usually does not show all the conserved patches present in proteins with the same function. Therefore, you may wish to extend your analysis of conservation by using the ConSurf Server to limit the analysis to proteins of one function. The results of such an analysis can be displayed in a molecular scene in Proteopedia. See Help:How to Insert a ConSurf Result Into a Proteopedia Green Link.

Locating Variable Patches

In some cases, patches of highly variable (rapidly evolving) residues are also functional sites. These can also be identified preliminarily with Proteopedia's Evolutionary Conservation scenes from ConSurf-DB, and more definitively with a conservation analysis limited to proteins of a single function. For example, mutations in influenza hemagglutinin help the virus to evade host defenses (see 1hgf). Another example is the high allelic variability of the peptide-binding groove of Major Histocompatibility Complex Class I. That variability allows the grooves of the different alleles within any individual to bind a wide range of peptides, and hence enables the T lymphocyte system to defend against a wide range of pathogens, including influenza virus.

Conservation for Domain Folding

Certain residues on the surfaces of protein molecules tend to be conserved in order to maintain proper folding, rather than because they are part of a site that interacts with a substrate, ligand, or protein partner. Secondary structure elements must end at the molecular surface so that the polypeptide chain can turn back into the folded domain. Therefore, it is common to see isolated, highly conserved residues on protein surfaces that enable turns or break helices, notably glycines and prolines.
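Given per-residue grades for a chain, listing the highly conserved glycines and prolines is a simple filter. This sketch assumes the grades are already available as a list aligned with the chain's one-letter sequence; the function name, the threshold of 8, and the example data are hypothetical, and positions are 1-based sequence indices rather than PDB residue numbers.

```python
def conserved_turn_formers(sequence, grades, min_grade=8, residue_types="GP"):
    """List glycines and prolines whose conservation grade is at least min_grade.

    sequence: one-letter amino acid sequence of a chain.
    grades: one conservation grade (1-9) per residue, in the same order.
    Returns labels like 'G3', using 1-based positions in `sequence`.
    """
    if len(sequence) != len(grades):
        raise ValueError("need exactly one grade per residue")
    return [f"{aa}{pos}"
            for pos, (aa, grade) in enumerate(zip(sequence, grades), start=1)
            if aa in residue_types and grade >= min_grade]

print(conserved_turn_formers("MKGPLGE", [5, 3, 9, 8, 4, 6, 9]))  # ['G3', 'P4']
```

The filter only shortlists candidates; whether each one actually sits at a surface turn or helix end still has to be checked in the 3D structure.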

Cysteines that form disulfide bridges are typically conserved, as are other amino acids that form rare protein crosslinks.

Charged residues are usually on the surfaces of folded proteins. If you see a highly conserved charged residue (Arg, Asp, Glu, Lys) on the surface, it often participates in a salt bridge. Salt bridges help to stabilize protein folds, and hence the residues involved are often highly conserved. Example: Asp6 with Arg8 in 1qdq.
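Whether a conserved Asp/Glu and Arg/Lys pair could form a salt bridge is often judged geometrically: a side-chain carboxylate oxygen within about 4.0 Å of a side-chain nitrogen is a commonly used criterion. The sketch below applies that rule to atoms supplied as simple tuples (standard PDB atom names OD1/OD2, OE1/OE2, NE/NH1/NH2, NZ). Reading a PDB file is not shown, histidine is ignored for simplicity, and the coordinates are hypothetical, not taken from 1qdq.

```python
import math

# Side-chain atoms that carry the charge, by residue type (PDB atom names).
ACIDIC = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}
BASIC = {"ARG": ("NE", "NH1", "NH2"), "LYS": ("NZ",)}

def salt_bridges(atoms, cutoff=4.0):
    """Return sorted (acid, base) residue pairs with any charged oxygen-nitrogen
    distance <= cutoff (angstroms).

    atoms: list of (residue_name, residue_number, atom_name, (x, y, z)).
    """
    acids = [(f"{res}{num}", xyz) for res, num, name, xyz in atoms
             if name in ACIDIC.get(res, ())]
    bases = [(f"{res}{num}", xyz) for res, num, name, xyz in atoms
             if name in BASIC.get(res, ())]
    pairs = set()
    for acid_label, a_xyz in acids:
        for base_label, b_xyz in bases:
            if math.dist(a_xyz, b_xyz) <= cutoff:
                pairs.add((acid_label, base_label))
    return sorted(pairs)

# Hypothetical coordinates (angstroms) for illustration only.
atoms = [
    ("ASP", 6, "CG", (0.0, 1.0, 0.0)),    # not a charged atom; ignored
    ("ASP", 6, "OD1", (0.0, 0.0, 0.0)),
    ("ASP", 6, "OD2", (1.1, 0.0, 0.0)),
    ("ARG", 8, "NH1", (3.9, 0.5, 0.0)),
    ("LYS", 20, "NZ", (12.0, 0.0, 0.0)),  # too far from any carboxylate oxygen
]
print(salt_bridges(atoms))  # [('ASP6', 'ARG8')]
```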

For other situations where conservation is expected, see Expected vs. Unexpected Conservation.

Remember that in the Evolutionary Conservation scene in Proteopedia (in Jmol), you can hover the mouse over any residue, and its identity will be displayed after a few seconds. This works best with spinning turned off.

Every structure in Proteopedia has a link for display in FirstGlance in Jmol. There, you can use the Find dialog to enter the name of an amino acid, e.g. glycine or proline, and all positions of that amino acid will be highlighted, so that you can see their distribution in the 3D structure. The same strategy works when viewing the protein colored by conservation, via the FirstGlance links provided by both ConSurf-DB and the ConSurf Server.

Caveats

ConSurf-DB Often Obscures Some Functional Sites

Proteopedia's Evolutionary Conservation scenes use pre-calculated results from ConSurf-DB. ConSurf-DB is designed to include a wide range of sequences in its multiple sequence alignments (MSAs) and analyses. Often, the MSA will include a substantial number of sequences from proteins whose functions differ from that of the query protein. (See these instructions for how to find out the functions of the proteins used in ConSurf-DB's MSA.) Consequently, amino acids colored as highly conserved by ConSurf-DB are truly highly conserved across a wide range of sequence-similar proteins. However, amino acids that are highly conserved among proteins with the same function as the query protein may not appear conserved in ConSurf-DB results. A good way to find these obscured functional sites is to do a conservation analysis limited to proteins of a single function. See Limiting ConSurf Analysis to Proteins of a Single Function.
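Why a broad MSA can hide function-specific conservation can be shown with a toy example. The sketch assumes each aligned homolog carries a function annotation; the annotations and sequences are hypothetical. It reports columns that are identical among the sequences sharing the query's function but not across all homologs, which are exactly the positions a broad analysis would miss.

```python
def conserved_columns(msa):
    """0-based indices of columns where all sequences share one residue (gaps excluded)."""
    return [i for i, column in enumerate(zip(*msa))
            if len(set(column)) == 1 and column[0] != "-"]

def conserved_only_within(homologs, function):
    """Columns conserved among sequences annotated with `function`
    but not conserved across all homologs."""
    all_seqs = [seq for _, seq in homologs]
    subset = [seq for annotation, seq in homologs if annotation == function]
    overall = set(conserved_columns(all_seqs))
    return [i for i in conserved_columns(subset) if i not in overall]

# Hypothetical aligned homologs with hypothetical function annotations.
homologs = [
    ("function X", "GSHKAE"),
    ("function X", "GSHRAD"),
    ("function X", "GSHKAE"),
    ("function Y", "GTNKAE"),
    ("function Y", "GSQRAE"),
]
print(conserved_columns([seq for _, seq in homologs]))  # [0, 4]
print(conserved_only_within(homologs, "function X"))    # [1, 2]
```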

Use Caution When Comparing Conservation of Sequence-Different Chains

This caveat applies only to molecules that contain chains with different sequences. The conservation colors shown in Proteopedia's Evolutionary Conservation scenes do not indicate the same levels of conservation for chains of different sequences. This is because ConSurf-DB calculates conservation levels independently for each sequence-different chain, and the levels are relative to the multiple sequence alignment constructed for that chain.
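A small sketch shows why the same grade can mean different absolute conservation in two chains. It assumes per-residue evolutionary rates (lower = more conserved, as with ConSurf's rate-based scores) and a hypothetical grading scheme that divides each chain's own range of rates into nine equal-width bins; the rate values are made up. In this example the most conserved residue of each chain receives grade 9 even though their underlying rates differ fourfold.

```python
def relative_grades(scores, n_grades=9):
    """Assign grades relative to one chain's own range of scores.

    scores: per-residue evolutionary rates for ONE chain; lower = more conserved.
    Returns grades 1..n_grades, with n_grades = most conserved in this chain.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(n_grades + 1) // 2] * len(scores)  # no spread: middle grade (arbitrary)
    grades = []
    for s in scores:
        bin_index = min(int((s - lo) / (hi - lo) * n_grades), n_grades - 1)
        grades.append(n_grades - bin_index)
    return grades

# Hypothetical raw rates for two sequence-different chains of one structure.
chain_a = [0.1, 0.5, 1.0, 2.0]
chain_b = [0.4, 0.8, 1.5, 3.0]
print(relative_grades(chain_a))  # [9, 8, 5, 1]
print(relative_grades(chain_b))  # [9, 8, 6, 1]
```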

For example, consider 1bqh (a Major Histocompatibility Complex Class I protein), which contains 5 chains with four distinct sequences. As expected, ConSurf-DB shows that a different number of sequences was used in the multiple sequence alignment (MSA) and conservation calculations for each of these sequence-different chains, and that each MSA had a different average pairwise difference (APD), a measure of diversity within the MSA. Therefore, residues with, for example, conservation level 9 (maximal conservation) in each of the three ConSurf-DB-colored sequence-different chains have the highest levels of conservation within their own chain, but do not have exactly the same absolute levels of conservation.

ConSurf-DB results for the chains of 1bqh:

Chain	Length (residues)	Sequences in MSA	APD
A	274	144	1.72
B	99	75	1.49
C	8	n/a (length below the ConSurf minimum)	n/a
G	129	201	1.35

In Proteopedia's Evolutionary Conservation scenes, all the chains in the molecule are colored in the same scene. This gives a potentially useful overview, but can be misleading unless one realizes that a given conservation color, in two sequence-different chains, does not mean exactly the same level of conservation. In contrast to Proteopedia's Evolutionary Conservation scenes, ConSurf-DB and ConSurf Server apply conservation level colors to only one chain sequence at a time, thereby avoiding this possible confusion.
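The APD values in the 1bqh table summarize how diverse each chain's MSA is. ConSurf-DB's APD values exceed 1, so they cannot be simple fractions of mismatched positions; they presumably come from an evolutionary distance model, which this sketch does not reproduce. As a simpler illustration of the same idea, it computes the average p-distance (fraction of differing residues) over all pairs of aligned sequences, ignoring columns where either sequence has a gap. The toy alignment is hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing residues between two aligned sequences,
    counting only columns where neither sequence has a gap ('-')."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in compared) / len(compared)

def average_pairwise_p_distance(msa):
    """Mean p-distance over all pairs of sequences in the alignment."""
    if len(msa) < 2:
        raise ValueError("need at least two sequences")
    distances = [p_distance(a, b) for a, b in combinations(msa, 2)]
    return sum(distances) / len(distances)

print(average_pairwise_p_distance(["ACDE", "ACDF", "GCDF"]))  # about 0.333
```

A more diverse MSA gives a larger value, so a conservation grade derived from it is judged against a different background than a grade from a less diverse MSA.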

Conservation Results Will Change With Time

Slight variations in the conservation pattern will occur over time as the number of sequences in the databases used by ConSurf-DB increases. Each update of ConSurf-DB uses somewhat larger sequence databases, so the MSAs for each chain will differ slightly. The methods employed by ConSurf are also improved periodically. For example, the default MSA algorithm was originally CLUSTAL-W, then MUSCLE, and later MAFFT.

Consequently, results from the ConSurf Server will also change slightly with time, even when the job parameters are the same. Only if you upload the same MSA will the results be identical for a given chain when the jobs are run months or years apart.

You may find it useful to download ConSurf results (from ConSurf-DB or the ConSurf Server) to preserve a particular result for comparison with results obtained later.
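Once two saved results have been read into a program, comparing them residue by residue is straightforward. This sketch assumes each result has already been parsed into a dictionary mapping a residue label to its grade (or None for insufficient data); parsing ConSurf's download files is not shown, and the labels, grades, and variable names are hypothetical.

```python
def changed_grades(old, new):
    """Compare two saved conservation results for the same chain.

    old, new: dicts mapping a residue label (e.g. "HIS12") to a grade 1-9,
    or to None for insufficient data. Returns {label: (old_grade, new_grade)}
    for residues present in both results whose grade differs.
    """
    return {label: (old[label], new[label])
            for label in sorted(old.keys() & new.keys())
            if old[label] != new[label]}

earlier_result = {"HIS12": 9, "ASP45": 8, "LYS7": 2}   # hypothetical
later_result = {"HIS12": 9, "ASP45": 9, "LYS7": None}  # hypothetical
print(changed_grades(earlier_result, later_result))
# {'ASP45': (8, 9), 'LYS7': (2, None)}
```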

Other Evolutionary Conservation Servers

INTREPID

In 2024, the INTREPID Server, formerly at the University of California, Berkeley, appears to be unavailable.

xProtCAS

xProtCAS is a tool to identify conserved surfaces on AlphaFold2 structural models. The tool defines autonomous structural modules from the structural models and converts these modules to a graph encoding residue topology, accessibility, and conservation. xProtCAS is available as open-source Python software and as an interactive web server.

"The xProtCAS web server represents a fast, simple, and intuitive tool to analyze protein surface conservation. The two comparable available web-based tools for conserved accessible surface discovery, PatchFinder, and FuncPatch web servers, were no longer functional at the time of publication. There are overlaps with the functionality of the ConSurf server. However, the definition of the most conserved accessible surface and integration with AlphaFold2 models of the xProtCAS server adds key functionality not available with the ConSurf server." (Quoted from the xProtCAS associated publication^[1].)

siteFiNDER|3D

In 2024, the siteFiNDER 3D Server, formerly at Yale University, appears to be unavailable.

HotPatch

In 2024, the HotPatch Server, formerly at UCLA, appears to be unavailable.

Evolutionary Trace Viewer

Evolutionary Trace Viewer (ETV).

Comment by User:Eric Martz, March, 2009: From the information provided on the ETV website, I found it quite difficult to understand what the ETV is doing, or how to use the viewer. An explanation in simple terms for non-specialists would be very useful.

EVcouplings / EVfold

EVolutionary Couplings server provides functional and structural information about proteins derived from the evolutionary sequence record using methods from statistical physics.

This site provides links to several other related servers and software packages.

References

[1] Kotb, H. M. and Davey, N. E. xProtCAS: A Toolkit for Extracting Conserved Accessible Surfaces from Protein Structures. Biomolecules 13(6):906 (2023). DOI: 10.3390/biom13060906

Proteopedia Page Contributors and Editors

Eric Martz, Eran Hodis, Wayne Decatur
