Practical Guide to Homology Modeling: Difference between revisions

Proteopedia Page Contributors and Editors

Eric Martz, Juergen Haas, Jaime Prilusky

@@ Line 1: / Line 1: @@
-<table style="background:#ffff80;"><tr><td>Many assertions in this article are lacking literature citations. Help improving documentation in this article will be appreciated. Wikipedia's article on [http://en.wikipedia.org/wiki/Homology_modeling Homology modeling] is well documented, although more technical and less of a practical guide than the present article.</td></tr></table>
+<table style="background:#ffff80;"><tr><td>
+<big>Homology modeling has become largely obsolete since the 2020 success of structure prediction by [[AlphaFold]] and other AI prediction systems. Rather than starting here, we suggest starting at [[How To Find A Structure]].
+</big>
+<br><br>
+Many assertions in this article lack literature citations. Help with improving the documentation in this article would be appreciated. Wikipedia's article on [http://en.wikipedia.org/wiki/Homology_modeling Homology modeling] is well documented, although more technical and less of a practical guide than the present article.</td></tr></table>
 == Terminology ==
@@ Line 21: / Line 25: @@
 You don’t need a homology model if the amino acid sequence of interest (the query sequence) already has an empirically determined 3D structure. Structures determined empirically, by X-ray crystallography or (much less often) by solution NMR or cryo-EM, will almost always be more accurate than a homology model.
+If [[AlphaFold]] has predicted a model for your amino acid sequence of interest, it will often be more accurate than a homology model, and in most cases, a homology model won't be possible due to lack of a suitable template.
+=== Has AlphaFold predicted a model? ===
+Empirical models are the most reliable, but if none are available, [[AlphaFold]] has an impressive track record of correctly predicting structures from sequence. Check the [http://alphafold.ebi.ac.uk AlphaFold Database] for a model of your protein of interest. You can also submit a sequence and get a prediction: [[How to predict structures with AlphaFold]]. Another model prediction service with a good track record is [http://robetta.bakerlab.org RoseTTAFold]. Submit your sequence there, making sure to check ''RoseTTAFold'' as the method. With any of these methods, download the predicted [[PDB file]] and then upload it to [http://firstglance.jmol.org FirstGlance in Jmol] for exploration and analysis. FirstGlance automatically colors predicted models by reliability.
 === Is there an empirical model? ===
-All published, empirically-determined, atomic-resolution, macromolecular 3D structures are available in the [[Protein Data Bank]] (PDB, pdb.org).
+[[Empirical models|Empirically-determined]] models are usually the most reliable. All published, empirically-determined, atomic-resolution, macromolecular 3D structures are available in the [[World Wide Protein Data Bank]].
 Each model in the PDB has a unique 4-character identification code ([[PDB ID]]) that begins with a numeral, and has letters or numerals for the last 3 characters. Examples are 1d66, 4mdh, 9ins.
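The ID format described above can be checked mechanically. The following is a minimal sketch in Python; the function name <code>looks_like_pdb_id</code> is hypothetical and not part of any PDB software. It tests only the shape of the code, not whether an entry with that ID actually exists.

```python
import re

# Shape of a 4-character PDB ID as described in the text:
# one numeral, then three letters or numerals (case is ignored).
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text: str) -> bool:
    """Return True if text has the shape of a 4-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    for candidate in ["1d66", "4mdh", "9ins", "gal4", "1d666"]:
        print(candidate, looks_like_pdb_id(candidate))
```

The first three candidates are the examples given above; the last two fail because one starts with a letter and the other has five characters.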
-Here are two methods for finding out if your query amino acid sequence, or parts of it, have empirically-determined 3D structures in the PDB.
+Here are two methods for finding out if your query amino acid sequence, or parts of it, have [[Empirical models|empirically-determined 3D structures]] in the PDB.
 ==== Simple search for empirical models (via PIR) ====
@@ Line 52: / Line 61: @@
 #Note the '''length''' of your sequence.
 #At [http://rcsb.org rcsb.org], go to Advanced Search.
-#Click on “Choose a query type” and select Sequence under “Sequence Features”.
+#Select '''Sequence''' under 'Advanced Search Query Builder'.
-#Paste your query sequence into the large box.
+#Paste your query sequence into the box.
-#<font color="red">Set <i>Mask low complexity</i> to No.</font>
+#Push the [[Image:Rcsb-search-button.png]] button to run the search.
-#Click the “Submit Query” button at the lower right of the search interface box.
+#Scroll down to see the list of hits.
-#The best hits will be listed first, starting below “Showing 1-25 of NNN”.  Notice that each hit starts with a large, bold PDB ID.
+#At the top of the list, change <font color='red'>Display Results as</font> to <font color='red'>'''Polymer Entities'''</font>. Then push [[Image:Rcsb-search-button.png]] again. <font color='red'>''This is crucial''</font> because it displays the identity percentages and alignments for the hits. It should be the default!
+#The best hits will be listed first.  Notice that each hit starts with a large, bold PDB ID.
-For each hit, notice the “Identities” above the sequence alignment box. The denominator tells you the length of the sequence alignment. The percentage tells you the sequence identity of the alignment.
+For each hit, notice the '''Sequence Identity %''' above the sequence alignment box.
-For example, “Identities: 355/1045 (34%)” means that 1,045 residues of your query sequence align to the hit with 34% sequence identity (355 identical residues in the alignment). Knowing that my query had length 1,170 residues, I can see that this potential template for a homology model would enable me to model 1,045/1,170 = 89% of my query sequence. Quite often the alignment would span a much smaller portion of the full-length sequence.
+Also notice the '''Region''' range, which tells you how many of your query residues align with the hit. Compare this to the full length of your query sequence.
-<font color='red'>'''BEWARE!''' If you forgot to set <i>Mask Low Complexity</i> to NO:</font> The sequence identity percentage may be '''underestimated''' at pdb.org. This happens when pdb.org deems segments of the query sequence to be of low complexity. Such segments are marked with X’s in the sequence alignment, and excluded from the calculation of sequence identity. For example, for Saccharomyces gal4 (UniProt P04386), for the top hit (3coq), pdb.org reports “Identities: 71/89 (80%)”, while in fact the sequence identity is 100%. Note this in the sequence alignment at pdb.org:
+If you click the ''Download'' button in the list of hits, you will get the CIF file. If you need [[PDB file format]], click on the PDB ID code and open the ''Download'' menu on that single entry page to get all format options.
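Comparing the '''Region''' range with the full query length is simple arithmetic. Here is a minimal sketch in Python, assuming the range is 1-based and inclusive (an assumption about how the search results number residues). The helper names <code>fasta_length</code> and <code>query_coverage</code> and the example numbers are hypothetical.

```python
def fasta_length(fasta_text: str) -> int:
    """Count residues in a single-sequence FASTA record, skipping the header line."""
    lines = fasta_text.strip().splitlines()
    sequence_lines = [ln.strip() for ln in lines if not ln.startswith(">")]
    return sum(len(ln) for ln in sequence_lines)


def query_coverage(region_start: int, region_end: int, query_length: int) -> float:
    """Fraction of the query covered by an inclusive, 1-based Region range."""
    if not 1 <= region_start <= region_end <= query_length:
        raise ValueError("Region must lie within the query sequence")
    return (region_end - region_start + 1) / query_length


if __name__ == "__main__":
    # Hypothetical hit: residues 1-1045 of a 1170-residue query align.
    print(f"{query_coverage(1, 1045, 1170):.0%}")  # prints 89%
```

A hit with high sequence identity but low coverage can only serve as a template for the aligned part of the query.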
+<!--For example, “Identities: 355/1045 (34%)” means that 1,045 residues of your query sequence align to the hit with 34% sequence identity (355 identical residues in the alignment). Knowing that my query had length 1,170 residues, I can see that this potential template for a homology model would enable me to model 1,045/1,170 = 89% of my query sequence. Quite often the alignment would span a much smaller portion of the full-length sequence.
+<font color='red'>'''BEWARE!''' If you forgot to set <i>Mask Low Complexity</i> to NO:</font> The sequence identity percentage may be '''underestimated''' at rcsb.org. This happens when rcsb.org deems segments of the query sequence to be of low complexity. Such segments are marked with X’s in the sequence alignment, and excluded from the calculation of sequence identity. For example, for Saccharomyces gal4 (UniProt P04386), for the top hit (3coq), rcsb.org reports “Identities: 71/89 (80%)”, while in fact the sequence identity is 100%. Note this in the sequence alignment at rcsb.org:
 [[Image:Seq-algn-lo-complexity.png|center]]
-The 18 residues marked X were not included in the identity calculation. In contrast, when the same sequence search is performed at [http://www.ebi.ac.uk/pdbe PDB-Europe], 100% sequence identity is reported. However, other aspects of the report at PDB-Europe are less satisfactory (e.g. the length of the alignment is not stated; the sequences are not numbered) and hence we recommend using pdb.org despite its misleading sequence identity percentages.
+The 18 residues marked X were not included in the identity calculation. In contrast, when the same sequence search is performed at [http://www.ebi.ac.uk/pdbe PDB-Europe], 100% sequence identity is reported. However, other aspects of the report at PDB-Europe are less satisfactory (e.g. the length of the alignment is not stated; the sequences are not numbered) and hence we recommend using rcsb.org despite its misleading sequence identity percentages.-->
 == Are parts (or all) of the query protein intrinsically disordered? ==
@@ Line 83: / Line 95: @@
 === Prediction of intrinsic disorder ===
-==== MobiDB via UniProt ====
+==== MobiDB ====
-At UniProt.Org, find your protein, then click on “Structure”. At the bottom of this section is usually a link to MobiDB’s report for the query protein. There, in the section ''Detailed Disorder Annotations'' are graphics showing experimental evidence for disorder (if available) and, under the heading ''Predictors'', results from several servers designed to predict intrinsic disorder.
+MobiDB is a meta-server: it summarizes disorder predictions from various other servers that use different methods.
-The ''Examples'' above have links to MobiDB.
+*At UniProt.Org, find your protein, then copy its UniProt accession code, something like P04386.
+*Go to [https://mobidb.bio.unipd.it/ MobiDB].
+*Enter your UniProt accession code, such as P04386. Do NOT include for example (GAL4_YEAST) or it will say "not found".
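If you copy the accession together with the entry name, the bare accession code can be extracted before pasting it into MobiDB. Here is a minimal sketch in Python; the function name <code>extract_accession</code> is hypothetical, and the pattern follows the accession number format published in the UniProt help pages.

```python
import re
from typing import Optional

# UniProt accession format (6 or 10 characters), per the UniProt help pages.
# Word boundaries keep the pattern from matching inside longer tokens.
ACCESSION = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def extract_accession(text: str) -> Optional[str]:
    """Return the first UniProt accession found in text, or None if there is none."""
    match = ACCESSION.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(extract_accession("P04386 (GAL4_YEAST)"))  # prints P04386
```

The entry name alone, such as GAL4_YEAST, does not match the pattern, so the function returns <code>None</code> for it.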
+In 2017, MobiDB changed its output format, and it is rather confusing. There is no color key and the results are poorly explained, if at all. If you know of a better meta-server, please mention it in the discussion page. You may find [https://www.umass.edu/molvis/workshop/mobidb.htm these instructions helpful]<ref>The [https://www.umass.edu/molvis/workshop/mobidb.htm MobiDB instructions] were designed to supplement Section 18 of [https://www.umass.edu/molvis/workshop/challeng/2018/pps-um18.htm this assignment].</ref>.
 ==== FoldIndex ====
-The [http://bip.weizmann.ac.il/fldbin/findex FoldIndex server] is a useful adjunct to the MobiDB report, since it is not included in that report.
+The [https://fold.proteopedia.org FoldIndex server] is a useful adjunct to the MobiDB report, since it is not included in that report.
 ==Is your query protein in the structural genomics pipeline?==
@@ Line 139: / Line 155: @@
 ==How to obtain homology models==
-===Pre-Calculated Models===
+===Pre-calculated Models===
 At UniProt.Org, find your protein and click on ''Structure''.
@@ Line 145: / Line 161: @@
 ====Protein Model Portal====
-Under the subheading ''3D Structure Databases'', click on the linked UniProt ID at ProteinModelPortal. Here you will find bar graphics showing the coverage by pre-calculated homology models. Touching the blue bars reports the sequence range for each model.
+The ProteinModelPortal has been shut down. Its webpage remains only as a relay to established resources that pre-calculate protein structure models.
-Below is a table listing sequence ranges and percentages of sequence identity. Clicking on '''<nowiki>[Show]</nowiki>''' gives you a report with a link to download the homology model.
-* SWISSMODEL: use the '''<nowiki>[ download ]</nowiki>''' link.
-* <font color='red'>Important for MODBASE: Click on MODBASE</font> (not <nowiki>[ download ]</nowiki> which will give you a file not readable by FirstGlance). At the ModBase page, open the menu under <font color='green'>Perform action on this model</font> and select ''Coordinate File''. This will download a PDB file readable by FirstGlance.
-Notice the section at the bottom of the page ''Remodel this protein''. This is a good option if you don't find a satisfactory model.
 ====SMR: Swiss Model Repository====
-This give you similar coverage graphics, but limited to models generated by Swiss Model. Clicking on any one blue graphic bar shows details below, including links to download the model.
+SMR displays arc bar graphics depicting, side by side, the structural coverage of pre-calculated homology models and experimental structures for a given UniProt entry. Clicking the bars and then hovering reports model details, e.g. the sequence range for each model. Links to download the models are offered in a separate paragraph.
 ====ModBase====
@@ Line 173: / Line 183: @@
 *At [http://uniprot.org UniProt.Org], find your sequence, and copy it in FASTA format.
-*Go to [http://swissmodel.expasy.org SwissModel.expasy.org].
+*Go to [http://swissmodel.expasy.org SwissModel.expasy.org]<ref name="promod3">PMID: 33507980</ref>.
 *It is a good idea to create an account, and login. This makes it easy to find your models later, although they are not kept on the server more than a week.