Practical Guide to Homology Modeling: Difference between revisions

Eric Martz (talk | contribs)
Eric Martz (talk | contribs)
No edit summary
 
(87 intermediate revisions by 3 users not shown)
Line 1: Line 1:
<table style="background:#ffff80;"><tr><td>Many assertions in this article are lacking literature citations. Help improving documentation in this article will be appreciated. Wikipedia's article on [http://en.wikipedia.org/wiki/Homology_modeling Homology modeling] is well documented, although more technical and less of a practical guide than the present article.</td></tr></table>
<table style="background:#ffff80;"><tr><td>
<big>Homology modeling has become largely obsolete since the 2020 success of structure prediction by [[AlphaFold]] and other AI prediction systems. Rather than starting here, we suggest starting at [[How To Find A Structure]].
</big>
<br><br>
Many assertions in this article lack literature citations. Help in improving the documentation of this article will be appreciated. Wikipedia's article on [http://en.wikipedia.org/wiki/Homology_modeling Homology modeling] is well documented, although more technical and less of a practical guide than the present article.</td></tr></table>


== Terminology ==
== Terminology ==
Line 5: Line 9:
*'''Query sequence''': The amino acid sequence for which a 3D model is wanted. More commonly called the ''target'' sequence, but talking about target vs. template gets confusing.
*'''Query sequence''': The amino acid sequence for which a 3D model is wanted. More commonly called the ''target'' sequence, but talking about target vs. template gets confusing.
*'''Template''': An empirically determined 3D protein structure with significant sequence similarity to the query.
*'''Template''': An empirically determined 3D protein structure with significant sequence similarity to the query.
*'''&quot;Structure&quot;''' will be used in this article to mean three-dimensional structure.
*'''&quot;Structure&quot;''' will be used in this article to mean three-dimensional protein molecular structure.


== What Is A Homology Model? ==
== What Is A Homology Model? ==
Line 20: Line 24:
== Do you need a homology model? ==
== Do you need a homology model? ==


You don’t need a homology model if the amino acid sequence of interest (the query sequence) already has an empirically determined 3D structure. Structures determined empirically, by X-ray crystallography or (much less often) by solution NMR, will almost always be more accurate than a homology model.
You don’t need a homology model if the amino acid sequence of interest (the query sequence) already has an empirically determined 3D structure. Structures determined empirically, by X-ray crystallography or (much less often) by solution NMR or cryo-EM, will almost always be more accurate than a homology model.
 
If [[AlphaFold]] has predicted a model for your amino acid sequence of interest, it will often be more accurate than a homology model, and in most cases, a homology model won't be possible due to lack of a suitable template.
 
=== Has AlphaFold predicted a model? ===
Empirical models are the most reliable, but if none are available, [[AlphaFold]] has an impressive track record of correctly predicting structures from sequence. Check the [http://alphafold.ebi.ac.uk AlphaFold Database] for a model of your protein of interest. You can also submit a sequence and get a prediction: [[How to predict structures with AlphaFold]]. Another model prediction service with a good track record is [http://robetta.bakerlab.org RoseTTAFold]. Submit your sequence there, making sure to check ''RoseTTAFold'' as the method. With any of these methods, download the predicted [[PDB file]] and then upload it to [http://firstglance.jmol.org FirstGlance in Jmol] for exploration and analysis. FirstGlance automatically colors predicted models by reliability.


=== Is there an empirical model? ===
=== Is there an empirical model? ===


All published, empirically-determined, atomic-resolution, macromolecular 3D structures are available in the [[[Protein Data Bank]] (PDB, pdb.org).
[[Empirical models|Empirically-determined]] models are usually the most reliable. All published, empirically-determined, atomic-resolution, macromolecular 3D structures are available in the [[World Wide Protein Data Bank]].


Each model in the PDB has a unique 4-character identification code ([[PDB ID]]) that begins with a numeral, and has letters or numerals for the last 3 characters. Examples are 1d66, 4mdh, 9ins.
Each model in the PDB has a unique 4-character identification code ([[PDB ID]]) that begins with a numeral, and has letters or numerals for the last 3 characters. Examples are 1d66, 4mdh, 9ins.
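
The ID format described above can be checked mechanically. The following is a minimal Python sketch, not part of any PDB tool; the function name <code>looks_like_pdb_id</code> is made up for illustration, and the pattern encodes only the rule stated above (one numeral followed by three letters or numerals). A string that passes is well-formed, but may still not exist in the PDB.

```python
import re

# Rule from the text: one numeral, then three characters that are
# each a letter or a numeral.
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text: str) -> bool:
    """Return True if text has the shape of a 4-character PDB ID.

    This checks the format only; it does not look anything up.
    """
    return PDB_ID_PATTERN.fullmatch(text.strip()) is not None


# The three examples from the text, plus two malformed strings.
for candidate in ["1d66", "4mdh", "9ins", "d166", "1d6"]:
    print(candidate, looks_like_pdb_id(candidate))
```

Such a check is handy for catching typing mistakes before pasting a code into a search box or a viewer.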


Here are two methods for finding out if your query amino acid sequence, or parts of it, have empirically-determined 3D structures in the PDB.
Here are two methods for finding out if your query amino acid sequence, or parts of it, have [[Empirical models|empirically-determined 3D structures]] in the PDB.


==== Simple search for empirical models (via PIR) ====
==== Simple search for empirical models (via PIR) ====


At UniProt.Org, find your protein and click on Structure.
At [http://uniprot.org UniProt.Org], find your protein and click on '''Structure''' (blue button at the left).


*If there is a column labeled “Entry” with 4-character PDB IDs, these are empirical structures for your protein. Pay attention to the “Positions” column, which gives the sequence number range covered by each model.
*If there is a section ''3D Structure Databases'' with a column labeled '''PDB entry''' containing 4-character [[PDB code|PDB IDs]], these are empirical structures for your protein. Pay attention to the “Positions” column, which gives the sequence number range covered by each model.
*If there is no “Entry” column, then there are no sequence-identical empirical structures for your protein. Then try the Advanced search method below.
** To explore one of these models, write down its 4-character [[PDB code]]. Then see [[#How To Explore 3D Models]] below.
*Some proteins have no Structure section (e.g. K4QDG1_SACBA). Then try the Advanced search method below.
*If there is no “PDB entry” column, then there are no sequence-identical empirical structures for your protein. Then try the Advanced search method below.
*Some proteins have no Structure section (e.g. [http://www.uniprot.org/uniprot/K4QDG1 K4QDG1_SACBA]). Then try the Advanced search method below.


If empirical structures exist, see sections below for guidance on how to explore them. If they are satisfactory, then you don't need a homology model.
If empirical structures exist, see [[#How To Explore 3D Models]] below. If they are satisfactory, then you don't need a homology model.


==== Advanced search for empirical models (RCSB PDB) ====
==== Advanced search for empirical models (RCSB PDB) ====
Line 50: Line 60:
#Copy the FASTA format sequence for your protein, for example, from [http://uniprot.org UniProt.Org].
#Copy the FASTA format sequence for your protein, for example, from [http://uniprot.org UniProt.Org].
#Note the '''length''' of your sequence.
#Note the '''length''' of your sequence.
#At [http://pdb.org pdb.org], go to Advanced Search.
#At [http://rcsb.org rcsb.org], go to Advanced Search.
#Click on “Choose a query type” and select Sequence under “Sequence Features”.
#Select '''Sequence''' under 'Advanced Search Query Builder'.
#Paste your query sequence into the large box, and click the “Submit Query” button at the lower right of the search interface box.
#Paste your query sequence into the box.
#The best hits will be listed first, starting below “Showing 1-25 of XXX Results”.  Notice that each hit starts with a large, bold PDB ID. In the “Alignment” section of the first hit, click on “Display for All Results”. Also in the “Compound” section, click “Display for All Results”.
#Push the [[Image:Rcsb-search-button.png]] button to run the search.
#Scroll down to see the list of hits.
#At the top of the list, change <font color='red'>Display Results as</font> to <font color='red'>'''Polymer Entities'''</font>. Then push [[Image:Rcsb-search-button.png]] again. <font color='red'>''This is crucial''</font> because it displays the identity percentages and alignments for the hits. It should be the default!
#The best hits will be listed first.  Notice that each hit starts with a large, bold PDB ID.
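
The step that asks you to note the length of your sequence can be done by counting residues in the FASTA text. Here is a minimal Python sketch; the function name <code>fasta_sequence_length</code> and the short example record are invented for illustration, and only the first record is counted if several are pasted together.

```python
def fasta_sequence_length(fasta_text: str) -> int:
    """Count the residues in the first record of FASTA-formatted text."""
    residues = 0
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seen_header:
                break  # a second header starts another record: stop
            seen_header = True
            continue  # header lines carry no residues
        residues += len(line)
    return residues


# A short illustrative record, not a full protein.
example = ">example_record illustrative only\nMKLLSSIEQA\nCDICRLKK\n"
print(fasta_sequence_length(example))
```

Keep the number at hand: it is the denominator when you judge how much of the query a hit covers.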


For each hit, notice the “Identities” above the sequence alignment box. The denominator tells you the length of the sequence alignment. The percentage tells you the sequence identity of the alignment.
For each hit, notice the '''Sequence Identity %''' above the sequence alignment box.


For example, “Identities: 355/1045 (34%)” means that 1,045 residues of your query sequence align to the hit with 34% sequence identity (355 identical residues in the alignment). Knowing that my query had length 1,170 residues, I can see that this potential template for a homology model would enable me to model 1,045/1,170 = 89% of my query sequence. Quite often the alignment would span a much smaller portion of the full-length sequence.
Also notice the '''Region''' range, which tells you how many of your query residues align with the hit. Compare this to the full length of your query sequence.
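
The comparison between the aligned region and the full query length is simple arithmetic. The Python sketch below reuses the numbers from the earlier worked example in this article (355 identical residues over 1,045 aligned residues, for a query of 1,170 residues). The helper names and the region boundaries 1 to 1045 are assumptions made for illustration; the search report itself may lay these values out differently.

```python
def region_length(start: int, end: int) -> int:
    """Number of residues in an inclusive range such as 1-1045."""
    return end - start + 1


def percent(part: int, whole: int) -> float:
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


query_length = 1170                # full length of the query sequence
aligned = region_length(1, 1045)   # hypothetical Region range
identical = 355                    # identical residues in the alignment

identity = percent(identical, aligned)     # sequence identity of the alignment
coverage = percent(aligned, query_length)  # share of the query that is aligned

print(f"identity {identity:.0f}%, coverage {coverage:.0f}%")
```

With these numbers the hit would template about 89% of the query at 34% identity. Quite often the aligned region spans a much smaller portion of the full-length sequence.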


<font color='red'>'''BEWARE!'''</font> The sequence identity percentage may be '''underestimated''' at pdb.org. This happens when pdb.org deems segments of the query sequence to be of low complexity. Such segments are marked with X’s in the sequence alignment, and excluded from the calculation of sequence identity. For example, for Saccharomyces gal4 (UniProt P04386), for the top hit (3coq), pdb.org reports “Identities: 71/89 (80%)”, while in fact the sequence identity is 100%. Note this in the sequence alignment at pdb.org:
If you click the ''Download'' button in the list of hits, you will get the CIF file. If you need [[PDB file format]], click on the PDB ID code and open the ''Download'' menu on that single entry page to get all format options.
<!--For example, “Identities: 355/1045 (34%)” means that 1,045 residues of your query sequence align to the hit with 34% sequence identity (355 identical residues in the alignment). Knowing that my query had length 1,170 residues, I can see that this potential template for a homology model would enable me to model 1,045/1,170 = 89% of my query sequence. Quite often the alignment would span a much smaller portion of the full-length sequence.


<font color='red'>'''BEWARE!''' If you forgot to set <i>Mask Low Complexity</i> to NO:</font> The sequence identity percentage may be '''underestimated''' at rcsb.org. This happens when rcsb.org deems segments of the query sequence to be of low complexity. Such segments are marked with X’s in the sequence alignment, and excluded from the calculation of sequence identity. For example, for Saccharomyces gal4 (UniProt P04386), for the top hit (3coq), rcsb.org reports “Identities: 71/89 (80%)”, while in fact the sequence identity is 100%. Note this in the sequence alignment at rcsb.org:
[[Image:Seq-algn-lo-complexity.png|center]]
[[Image:Seq-algn-lo-complexity.png|center]]


The 18 residues marked X were not included in the identity calculation. In contrast, when the same sequence search is performed at [http://www.ebi.ac.uk/pdbe PDB-Europe], 100% sequence identity is reported. However, other aspects of the report at PDB-Europe are less satisfactory (e.g. the length of the alignment is not stated; the sequences are not numbered) and hence we recommend using pdb.org despite its misleading sequence identity percentages. But you may certainly want to run the sequence search at [http://www.ebi.ac.uk/pdbe PDB-Europe] to compare the reported identity percentages.
The 18 residues marked X were not included in the identity calculation. In contrast, when the same sequence search is performed at [http://www.ebi.ac.uk/pdbe PDB-Europe], 100% sequence identity is reported. However, other aspects of the report at PDB-Europe are less satisfactory (e.g. the length of the alignment is not stated; the sequences are not numbered) and hence we recommend using rcsb.org despite its misleading sequence identity percentages.-->


== Are parts (or all) of the query protein intrinsically disordered? ==
== Are parts (or all) of the query protein intrinsically disordered? ==


Attempts to determine structure for intrinsically disordered protein will be futile. Therefore, before considering homology modeling or crystallization experiments, it is important to predict whether portions of the query protein are likely to be intrinsically disordered.
Attempts to determine structure for [[Intrinsically Disordered Protein|intrinsically disordered]] protein will be futile. Therefore, before considering homology modeling or crystallization experiments, it is important to predict whether portions of the query protein are likely to be intrinsically disordered.


Although fold is required for the function of most proteins, some proteins are [[Intrinsically Disordered Protein|intrinsically disordered]] (natively unstructured) and do not fold, at least by themselves. Often, intrinsically disordered protein transitions to an ordered state when it binds to a folded partner protein. However some proteins remain disordered while performing their functions.
Although fold is required for the function of most proteins, some proteins are [[Intrinsically Disordered Protein|intrinsically disordered]] (natively unstructured) and do not fold, at least by themselves. Often, intrinsically disordered protein transitions to an ordered state when it binds to a folded partner protein. However some proteins remain disordered while performing their functions.


By some estimates, 10% of proteins are intrinsically disordered for their full lengths, and about 40% of eukaryotic proteins have at least one loop 50 residues or longer that is intrinsically disordered. These disordered loops are typically missing from X-ray crystallographic structures because the disorder blurs that portion of the electron density map.
By some estimates, 10% of proteins are intrinsically disordered for their full lengths, and about 40% of eukaryotic proteins have at least one loop 50 residues or longer that is intrinsically disordered<ref>PMID:12368089</ref>. These disordered loops are typically missing from X-ray crystallographic structures because the disorder blurs that portion of the electron density map.


Examples:
Examples:
Line 80: Line 95:
=== Prediction of intrinsic disorder ===
=== Prediction of intrinsic disorder ===


==== MobiDB via UniProt ====
==== MobiDB ====


At UniProt.Org, find your protein, then click on “Structure”. At the bottom of this section is usually a link to MobiDB’s report for the query protein. There, in the section ''Detailed Disorder Annotations'' are graphics showing experimental evidence for disorder (if available) and, under the heading ''Predictors'', results from several servers designed to predict intrinsic disorder.
MobiDB is a meta-server: it summarizes disorder predictions from various other servers that use different methods.


The ''Examples'' above are linked to MobiDB.
*At UniProt.Org, find your protein, then copy its UniProt accession code, something like P04386.
*Go to [https://mobidb.bio.unipd.it/ MobiDB].
*Enter your UniProt accession code, such as P04386. Do NOT include for example (GAL4_YEAST) or it will say "not found".
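
If you have copied text such as "P04386 (GAL4_YEAST)", the accession code can be separated from the entry name before pasting it into MobiDB. The Python sketch below is an illustration only. The pattern is the accession format as described in UniProt's help pages, reproduced here from memory, so treat it as an assumption and check it against UniProt if it matters. The function name <code>extract_accession</code> is made up.

```python
import re

# Assumed UniProt accession format: either [OPQ] digit, 3 alphanumerics,
# digit; or another letter, digit, then one or two groups of
# letter + 2 alphanumerics + digit.
ACCESSION = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def extract_accession(text: str):
    """Return the first accession-like token in text, or None."""
    match = ACCESSION.search(text.upper())
    return match.group(0) if match else None


print(extract_accession("P04386 (GAL4_YEAST)"))
print(extract_accession("GAL4_YEAST"))
```

An entry name alone, such as GAL4_YEAST, yields no accession, which matches the "not found" behavior described above.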
 
In 2017, MobiDB changed its output format, and it is rather confusing. There is no color key and the results are poorly explained, if at all. If you know of a better meta-server, please mention it in the discussion page. You may find [https://www.umass.edu/molvis/workshop/mobidb.htm these instructions helpful]<ref>The [https://www.umass.edu/molvis/workshop/mobidb.htm MobiDB instructions] were designed to supplement Section 18 of [https://www.umass.edu/molvis/workshop/challeng/2018/pps-um18.htm this assignment].</ref>.


==== FoldIndex ====
==== FoldIndex ====


The [http://bip.weizmann.ac.il/fldbin/findex FoldIndex server] is a useful adjunct to the MobiDB report, since it is not included in that report.
The [https://fold.proteopedia.org FoldIndex server] is a useful adjunct to the MobiDB report, since it is not included in that report.


==Is your query protein in the structural genomics pipeline?==
==Is your query protein in the structural genomics pipeline?==
Line 102: Line 121:
'''Full-length templates are unlikely to be found''' for larger proteins (>~200 residues). 89% of structures in the [[Protein Data Bank]]  were determined by [[X-ray crystallography]]. Most crystallographic structures represent fragments of full-length proteins, because fragments generally give higher crystallization success<ref>The overall success rate for solving the 3D structure of a given protein sequence is about 5%. Failures commonly occur because the expressed protein is not sufficiently soluble (about half of expressed sequences), because soluble proteins fail to crystallize, or because crystals are not well ordered.</ref>. 10% of structures in the Protein Data Bank were determined by solution [[NMR]], but these tend to be small proteins or single domains. The median molecular mass of structures determined by NMR is 10 KD<ref name="mmm">Median molecular masses in the PDB were determined in December, 2014.</ref> (about 90 amino acids<ref>The average mass of an amino acid is 111.4 Daltons, weighted according to the frequencies of occurrences in proteins.</ref>). NMR is generally not able to determine atomic resolution structures for proteins >30 KD.
'''Full-length templates are unlikely to be found''' for larger proteins (>~200 residues). 89% of structures in the [[Protein Data Bank]]  were determined by [[X-ray crystallography]]. Most crystallographic structures represent fragments of full-length proteins, because fragments generally give higher crystallization success<ref>The overall success rate for solving the 3D structure of a given protein sequence is about 5%. Failures commonly occur because the expressed protein is not sufficiently soluble (about half of expressed sequences), because soluble proteins fail to crystallize, or because crystals are not well ordered.</ref>. 10% of structures in the Protein Data Bank were determined by solution [[NMR]], but these tend to be small proteins or single domains. The median molecular mass of structures determined by NMR is 10 KD<ref name="mmm">Median molecular masses in the PDB were determined in December, 2014.</ref> (about 90 amino acids<ref>The average mass of an amino acid is 111.4 Daltons, weighted according to the frequencies of occurrences in proteins.</ref>). NMR is generally not able to determine atomic resolution structures for proteins >30 KD.
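
The conversion between molecular mass and approximate chain length used above is a one-line calculation. This Python sketch uses the average residue mass of 111.4 Daltons given in the footnote; the function name is invented, and the result is only a rough estimate because real proteins differ in amino acid composition.

```python
AVERAGE_RESIDUE_MASS_DA = 111.4  # weighted average mass per amino acid


def approximate_residue_count(mass_kda: float) -> int:
    """Estimate the number of residues from a mass in kilodaltons."""
    return round(mass_kda * 1000 / AVERAGE_RESIDUE_MASS_DA)


# The two masses mentioned for NMR structures in the text.
for mass in (10, 30):
    print(mass, "kDa is about", approximate_residue_count(mass), "residues")
```

By this estimate, the 30 KD practical limit for NMR corresponds to roughly 270 residues.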


In contrast, the median molecular mass of [[asymmetric units]]  determined by X-ray crystallography is 50 KD<ref name="mmm" />, and a few are very large, such as virus capsids (e.g. 4qyk, ~2 million Daltons; 4v99, 10 million Daltons) or ribosomes (e.g. 4w2i, 4.5 million Daltons).
In contrast, the median molecular mass of [[Asymmetric unit|asymmetric units]]  determined by X-ray crystallography is 50 KD<ref name="mmm" />, and a few are very large, such as virus capsids (e.g. 4qyk, ~2 million Daltons; 4v99, 10 million Daltons) or ribosomes (e.g. 4w2i, 4.5 million Daltons).


===Errors and uncertainties in the sequence alignment produce errors in the homology model===
===Errors and uncertainties in the sequence alignment produce errors in the homology model===


The quality of a homology model depends upon the quality of the alignment between the query and template sequences. When the sequence identity '''falls below 35%''', the chances increase for errors in the alignment. Errors in the sequence alignment result in errors in positioning the query residues on the template fold; that is, errors in the 3D model.
The quality of a homology model depends upon the quality of the alignment between the query and template sequences. When the sequence identity '''falls below about 35%''', the chances increase for errors in the alignment. Errors in the sequence alignment result in errors in positioning the query residues on the template fold; that is, errors in the 3D model.


'''Gaps''' in the sequence alignment cause errors in the model. Gaps are opened in a sequence alignment in order to optimize the alignment. Such gaps may be regarded as insertions or deletions, but since it is usually unclear which, these are commonly called by the noncommittal term ''indels''. The presence of large numbers of gapped residues in a sequence alignment guarantees that there will be errors in the homology model: missing residues, or residues in incorrect positions.
'''Gaps''' in the sequence alignment cause errors in the model. Gaps are opened in a sequence alignment in order to optimize the alignment. Such gaps may be regarded as insertions or deletions, but since it is usually unclear which, these are commonly called by the noncommittal term ''indels''. The presence of large numbers of gapped residues in a sequence alignment guarantees that there will be errors in the homology model: missing residues, or residues in incorrect positions.


:A '''gap in the template sequence''' means that the aligned portion of the query is untemplated. Different homology modeling servers handle this differently. Swiss-Model includes the untemplated query residues, putting them in a loop (which may extend some distance awat from the remainder of the domain when the loop is long).
:A '''gap in the template sequence''' means that the corresponding portion of the query is untemplated. Different [[homology modeling servers]] handle this differently. Swiss-Model includes the untemplated query residues, putting them in a loop (which may extend some distance away from the remainder of the domain when the loop is long).


:A '''gap in the query sequence''' means that the two residues flanking the gap must be peptide-bonded in the 3D model, yet the aligned template residues may not be close to each other.
:A '''gap in the query sequence''' means that the two residues flanking the gap will usually be peptide-bonded in the 3D model, yet the aligned template residues may not be close to each other.
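
To get a feel for how identity and gaps show up in a pairwise alignment, the two aligned sequences can be summarized with a few lines of Python. This is a sketch under stated assumptions, not the method used by any modeling server: gaps are written as "-", identity is computed over columns where both sequences have a residue (servers may use a different denominator), and the function names and the toy alignment are invented.

```python
from itertools import groupby


def gap_runs(aligned: str) -> list:
    """Lengths of each consecutive run of '-' in one aligned sequence."""
    return [
        len(list(run))
        for is_gap, run in groupby(aligned, key=lambda c: c == "-")
        if is_gap
    ]


def alignment_summary(query: str, template: str) -> dict:
    """Summarize identity and gaps for two equal-length aligned strings."""
    if len(query) != len(template):
        raise ValueError("aligned sequences must have equal length")
    # Columns where both sequences have a residue (no gap on either side).
    paired = [(q, t) for q, t in zip(query, template) if q != "-" and t != "-"]
    identical = sum(1 for q, t in paired if q == t)
    return {
        "aligned_residues": len(paired),
        "identity_percent": 100.0 * identical / len(paired) if paired else 0.0,
        # Gaps in the template: these query residues are untemplated.
        "untemplated_query_runs": gap_runs(template),
        # Gaps in the query: flanking residues must be joined in the model.
        "query_gap_runs": gap_runs(query),
    }


# Toy alignment, invented for illustration.
summary = alignment_summary("MKT-AYIAKQR", "MKSWAY--KQR")
print(summary)
```

In the toy alignment, two query residues fall opposite a gap in the template (untemplated), and one template residue falls opposite a gap in the query. Long or numerous runs of either kind are a warning sign for the resulting model.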


Templates determined by crystallography often have '''missing residues'''. [[FirstGlance in Jmol]] reports missing residues and marks their locations clearly. Missing residues have no coordinates in the crystallographic model due to disorder of those residues in the crystal. Thus, even though the sequences may align, some residues are absent in the 3D template, and it is unclear where to position those residues. Some [[homology modeling servers]] omit such residues entirely, producing an incomplete homology model.
Templates determined by crystallography often have '''missing residues'''. [[FirstGlance in Jmol]] reports missing residues and marks their locations clearly. Missing residues have no coordinates in the crystallographic model due to disorder of those residues in the crystal. Thus, even though the sequences may align, some residues are frequently absent in the 3D template, and it is unclear where to position those residues. Some [[homology modeling servers]] omit such residues entirely, producing an incomplete homology model.


===Sidechain rotamer positions will be incorrect===
===Sidechain rotamer positions will be incorrect===


Even when the sequence alignment and template result in a correct backbone fold for the homology model, the sidechain rotamer positions will be incorrect. Despite knowing where each alpha carbon atom is located, theory does not correctly predict how the sidechains will fit together. At best, the sidechain rotamer positions will avoid steric clashes and electrostatic repulsions of like charges, and may optimize some salt bridges and hydrogen bonds. However, when a high quality empirical model becomes available, the details of sidechain packing in the homology model will be shown to be incorrect.
Even when the sequence alignment and template result in a correct backbone fold for the homology model, the sidechain rotamer positions (orientations relative to the alpha carbon position) will be incorrect. Despite knowing where each alpha carbon atom is located, theory does not correctly predict how the sidechains will fit together. At best, the sidechain rotamer positions will avoid steric clashes and electrostatic repulsions of like charges, and may optimize some salt bridges and hydrogen bonds. However, when a high quality empirical model becomes available, the details of sidechain packing in the homology model will be shown to be incorrect.


==Strengths of Homology Models==
==Strengths of Homology Models==


Given the limitations explained above, you might well wonder whether homology models have any uses. Provided that the sequence alignment is reliable (35% identity or more), the backbone fold is likely to be correct. This provides a great deal of information despite the inaccuracies in sidechain positions.
Given the limitations explained above, you might well wonder whether homology models have any uses. Provided that the sequence alignment is reliable (about 35% identity or more), and if the sequence alignment lacks numerous or large gaps (indels), the backbone fold is likely to be correct. This provides a great deal of information despite the inaccuracies in sidechain positions.


*The model indicates which residues are on the '''surface''' and which are '''buried'''.  
*The model suggests which residues are on the '''surface''' and which are '''buried'''.  


*If mutagenesis studies have shown phenotypic changes, it will be useful to see where the crucial residues lie in the homology model.
*If mutagenesis studies have shown phenotypic changes, it will be useful to see where the crucial residues lie in the homology model.
Line 136: Line 155:
==How to obtain homology models==
==How to obtain homology models==


===Pre-Calculated Models===
===Pre-calculated Models===


At UniProt.Org, find your protein and click on ''Structure''.
At UniProt.Org, find your protein and click on ''Structure''.
Line 142: Line 161:
====Protein Model Portal====
====Protein Model Portal====


Under the subheading ''3D Structure Databases'', click on the linked UniProt ID at ProteinModelPortal. Here you will find bar graphics showing the coverage by pre-calculated homology models. Touching the blue bars reports the sequence range for each model.
The ProteinModelPortal has been shut down. The webpage merely remains to serve as a relay to established resources pre-calculating protein structure models.  
 
Below is a table listing sequence ranges and percentages of sequence identity. Clicking on '''<nowiki>[Show]</nowiki>''' gives you a report with a link to download the homology model.
 
Notice the section at the bottom of the page ''Remodel this protein''. This is a good option if you don't find a satisfactory model.


====SMR: Swiss Model Repository====
====SMR: Swiss Model Repository====


This give you similar coverage graphics, but limited to models generated by Swiss Model. Clicking on any one blue graphic bar shows details below, including links to download the model.
SMR will display arc bar graphics depicting, side by side, the structural coverage of pre-calculated homology models and experimental structures for a given UniProt entry. Clicking the bars and then hovering reports model details, e.g. the sequence range for each model. Links to download the models are offered in a separate paragraph.


====ModBase====
====ModBase====
Line 168: Line 183:
*At [http://uniprot.org UniProt.Org], find your sequence, and copy it in FASTA format.
*At [http://uniprot.org UniProt.Org], find your sequence, and copy it in FASTA format.


*Go to [http://swissmodel.expasy.org SwissModel.expasy.org].
*Go to [http://swissmodel.expasy.org SwissModel.expasy.org]<ref name="promod3">PMID: 33507980</ref>.


*It is a good idea to create an account and log in. This makes it easy to find your models later, although they are not kept on the server for more than a week.
*It is a good idea to create an account and log in. This makes it easy to find your models later, although they are not kept on the server for more than a week.
Line 190: Line 205:
''FirstGlance in Jmol'' is perhaps easiest to use, has a great deal of help for interpreting what you see, and is nevertheless quite powerful. (See [http://bioinformatics.org/firstglance/fgij/whatis.htm What Is FirstGlance in Jmol?] and [[FirstGlance in Jmol]].)
''FirstGlance in Jmol'' is perhaps easiest to use, has a great deal of help for interpreting what you see, and is nevertheless quite powerful. (See [http://bioinformatics.org/firstglance/fgij/whatis.htm What Is FirstGlance in Jmol?] and [[FirstGlance in Jmol]].)


*Download your homology model(s).
*Empirical model (PDB code)
* Go to [http://firstglance.jmol.org FirstGlance.Jmol.Org].
** Write down the PDB code of interest.
* Click on ''Upload your own PDB file'' and designate your homology model. Click ''View in FirstGlance''. Your molecule should appear momentarily.
** Go to [http://firstglance.jmol.org FirstGlance.Jmol.Org].
** Enter your PDB code in the slot.
 
 
* Homology Model
** Download your homology model(s).
** Go to [http://firstglance.jmol.org FirstGlance.Jmol.Org].
** Click on ''Upload your own PDB file'' and designate your homology model. Click ''View in FirstGlance''. Your molecule should appear momentarily.


====Hydrophobic/Polar====
====Hydrophobic/Polar====
Line 199: Line 221:


<center>
<center>
<span style="font-size:150%;">{{Template:ColorKey_Hydrophobic}},  {{Template:ColorKey_Polar}}</span>
{| class="wikitable"
<table style="border-style:solid;"><tr><td>
|-
! colspan="3" | <span style="font-size:150%;">{{Template:ColorKey_Hydrophobic}},  {{Template:ColorKey_Polar}}</span>
|-
| [[Image:Hydrophilic-surface.png]]
| [[Image:Hydrophobic-surface.png]]
| [[Image:Transmembrane-surface.png]]
|-
| style="width:218px;" | Hydrophilic surface of a homology model.
| style="width:238px;" | Hydrophobic catalytic face of lipase ([[1lpm]]).
| style="width:230px;" | Transmembrane protein ([[3waj]]) Transmembrane hydrophobic zone is indicated by the <font color='red'>'''red'''</font> bracket.
|}
 
</center>


</td><td>
====Hydrophobic Core====
 
Soluble proteins should have a well-defined hydrophobic core. To see this in ''FirstGlance'', under the ''Views'' tab, click ''Hydrophobic/Polar'', and then turn on the ''Slab'' button. If the protein has multiple domains, each domain should have a hydrophobic core. If there is no hydrophobic core in a soluble protein model, the model most likely has very substantial errors.
<center>
{| class="wikitable"
|-
! colspan="1" | <span style="font-size:150%;">{{Template:ColorKey_Hydrophobic}},  {{Template:ColorKey_Polar}}</span>
|-
| [[Image:Hydrophobic-cores.png]]
|-
| Hydrophobic cores in domains (circled in <font color="red">'''red'''</font>; [[4cpa]]).
|}


</td></tr><tr><td>
Hydrophilic surface of a protein (1lpm).
</td><td>
Hydrophobic catalytic face of lipase (1lpm).
</td></tr></table>
</center>
</center>
====Charge Distribution====
===Evolutionary Conservation===
Patches of highly conserved amino acids in a homology model can be very informative, as such patches often indicate functional sites.
*Go to the ConSurf Server: [http://consurf.tau.ac.il ConSurf.tau.ac.il].
*Click ''Amino Acids''.
*Click YES there is a known protein structure.
*Enter your PDB code, or click ''Choose File'' to upload a homology model. Click ''Next''.
*Select the chain of interest. For a homology model, there will usually be only one chain, "A".
*Select NO you have not prepared a Multiple Sequence Alignment (MSA) that you wish to upload. The server will generate the MSA for you.
*Leave parameters at their defaults.
*Check '''manually''' for "Select homologs ...".
*Enter a job title and your email address, then click the ''Submit'' button. The first step, gathering similar sequences, typically takes less than 5 minutes.
*When the sequences are gathered, you will see <font color="x00b000">'''SELECT SEQUENCES'''</font>.
*Continue as explained here: [[ConSurfDB_vs._ConSurf#Limiting_ConSurf_Analysis_to_Proteins_of_a_Single_Function]].


==See Also==
==See Also==

Proteopedia Page Contributors and Editors

Eric Martz, Juergen Haas, Jaime Prilusky