Practical Guide to Homology Modeling: Difference between revisions

Eric Martz (talk | contribs)
Eric Martz (talk | contribs)
No edit summary
Line 24: Line 24:
=== Is there an empirical model? ===


All published, empirically-determined, atomic-resolution, macromolecular 3D structures are available in the [[Protein Data Bank]] (PDB, pdb.org).
All published, empirically-determined, atomic-resolution, macromolecular 3D structures are available in the [[Protein Data Bank]] (PDB, rcsb.org).


Each model in the PDB has a unique 4-character identification code ([[PDB ID]]) that begins with a numeral, and has letters or numerals for the last 3 characters. Examples are 1d66, 4mdh, 9ins.
Line 62: Line 62:
For example, “Identities: 355/1045 (34%)” means that 1,045 residues of your query sequence align to the hit with 34% sequence identity (355 identical residues in the alignment). Knowing that my query is 1,170 residues long, I can see that this potential template for a homology model would enable me to model 1,045/1,170 = 89% of my query sequence. Quite often, the alignment spans a much smaller portion of the full-length sequence.
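
The arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch: the numbers are the ones quoted above, while the variable names and the helper <code>percent</code> are made up here and are not part of any search tool's output.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole

# Values quoted in the text above.
identical = 355        # identical residues in the alignment
aligned = 1045         # residues of the query aligned to the hit
query_length = 1170    # full length of the query sequence

identity_pct = percent(identical, aligned)      # sequence identity within the alignment
coverage_pct = percent(aligned, query_length)   # fraction of the query that can be modeled

print(f"Sequence identity: {identity_pct:.0f}%")
print(f"Query coverage:    {coverage_pct:.0f}%")
```

Reporting coverage alongside identity is useful because a hit with high identity may still align to only a small part of the query.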


<font color='red'>'''BEWARE!''' If you forgot to set <i>Mask Low Complexity</i> to NO:</font> The sequence identity percentage may be '''underestimated''' at pdb.org. This happens when pdb.org deems segments of the query sequence to be of low complexity. Such segments are marked with X’s in the sequence alignment, and excluded from the calculation of sequence identity. For example, for Saccharomyces gal4 (UniProt P04386), for the top hit (3coq), pdb.org reports “Identities: 71/89 (80%)”, while in fact the sequence identity is 100%. Note this in the sequence alignment at pdb.org:
<font color='red'>'''BEWARE!''' If you forgot to set <i>Mask Low Complexity</i> to NO:</font> The sequence identity percentage may be '''underestimated''' at rcsb.org. This happens when rcsb.org deems segments of the query sequence to be of low complexity. Such segments are marked with X’s in the sequence alignment, and excluded from the calculation of sequence identity. For example, for Saccharomyces gal4 (UniProt P04386), for the top hit (3coq), rcsb.org reports “Identities: 71/89 (80%)”, while in fact the sequence identity is 100%. Note this in the sequence alignment at rcsb.org:


[[Image:Seq-algn-lo-complexity.png|center]]


The 18 residues marked X were not included in the identity calculation. In contrast, when the same sequence search is performed at [http://www.ebi.ac.uk/pdbe PDB-Europe], 100% sequence identity is reported. However, other aspects of the report at PDB-Europe are less satisfactory (e.g. the length of the alignment is not stated; the sequences are not numbered) and hence we recommend using pdb.org despite its misleading sequence identity percentages.
The 18 residues marked X were not included in the identity calculation. In contrast, when the same sequence search is performed at [http://www.ebi.ac.uk/pdbe PDB-Europe], 100% sequence identity is reported. However, other aspects of the report at PDB-Europe are less satisfactory (e.g. the length of the alignment is not stated; the sequences are not numbered) and hence we recommend using rcsb.org despite its misleading sequence identity percentages.
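
The effect of low-complexity masking on the reported percentage can be sketched as follows. The short aligned sequences are invented purely for illustration (they are not taken from 3coq), and the function name <code>count_identities</code> is hypothetical. The gal4 numbers are the ones quoted above, and the correction assumes that all 18 masked residues are in fact identical, which is consistent with the stated true identity of 100%.

```python
def count_identities(query: str, hit: str) -> int:
    """Count aligned positions where query and hit carry the same residue."""
    if len(query) != len(hit):
        raise ValueError("aligned sequences must have equal length")
    # A masked position ('X') in the query never matches a real residue in the hit.
    return sum(1 for q, h in zip(query, hit) if q == h and q != "-")

# Toy alignment, invented for illustration only.
hit = "MKLLSSIEQA"
query_unmasked = "MKLLSSIEQA"
query_masked = "MKLLXXIEQA"   # "SS" replaced by X's, as low-complexity masking would do

toy_masked_pct = 100 * count_identities(query_masked, hit) / len(hit)
toy_unmasked_pct = 100 * count_identities(query_unmasked, hit) / len(hit)

# Numbers quoted in the text for gal4 (UniProt P04386) against 3coq.
reported_identical = 71
aligned = 89
masked = 18   # residues marked X; assumed here to be identical in reality

reported_pct = 100 * reported_identical / aligned
corrected_pct = 100 * (reported_identical + masked) / aligned

print(f"Toy example:  masked {toy_masked_pct:.0f}%, unmasked {toy_unmasked_pct:.0f}%")
print(f"gal4 vs 3coq: reported {reported_pct:.0f}%, corrected {corrected_pct:.0f}%")
```

In both cases the masked positions stay in the denominator but cannot count as identities, which is why the reported percentage drops.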


== Are parts (or all) of the query protein intrinsically disordered? ==

Proteopedia Page Contributors and Editors

Eric Martz, Juergen Haas, Jaime Prilusky