Unusual sequence numbering: Difference between revisions

Eric Martz (talk | contribs)
(9 intermediate revisions by the same user not shown)
Line 1: Line 1:
The numbering of protein and nucleic acid sequences is arbitrary in structure files from the [[PDB|World Wide Protein Data Bank]] (PDB). That is, authors are free to number sequences as they wish.
The numbering of protein and nucleic acid sequences is arbitrary in structure files from the [[PDB|World Wide Protein Data Bank]] (PDB). That is, authors are free to number sequences as they wish. If you need to change the numbering in a published [[PDB file]], please see [[Renumbering PDB files]].


'''Straightforward numbering''' assigns 1 to the amino-terminal amino acid (or 5' nucleotide), and counts up sequentially and monotonically to the carboxy-terminal amino acid (or 3' nucleotide). An example is [http://firstglance.jmol.org/fg.htm?mol=1pgb 1pgb] ([[1pgb]]). The crystallized protein is numbered 1-56, despite it being a fragment of a [http://www.uniprot.org/uniprot/P06654#sequences 448-residue full length sequence] that begins (after adding an N-terminal Met) at full-length sequence number 228.
Line 6: Line 6:


==Numbering Does Not Start With One==
===Arbitrary Numbering===
[[1bsz]] contains three sequence-identical chains numbered 1-168, 501-668, and 1001-1168.
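Arbitrary per-chain numbering of this kind can be inspected by reading the residue number from the fixed columns of the ATOM records. The following is a minimal sketch, not part of any Proteopedia tool: it assumes the legacy fixed-column PDB format (chain ID in column 22, residue number in columns 23-26), and the sample records, the helper <code>atom_line</code>, and the chain IDs A, B, and C are hypothetical stand-ins rather than data taken from 1bsz.

```python
def atom_line(serial, chain, resseq, icode=" "):
    """Build a synthetic fixed-column ATOM record (CA atom of ALA, dummy coordinates)."""
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}{icode}"
            f"   {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def chain_ranges(pdb_lines):
    """Return {chain_id: (lowest, highest)} residue numbers seen in ATOM records."""
    ranges = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]            # column 22: chain identifier
        resseq = int(line[22:26])   # columns 23-26: residue sequence number
        lo, hi = ranges.get(chain, (resseq, resseq))
        ranges[chain] = (min(lo, resseq), max(hi, resseq))
    return ranges


# Hypothetical records mimicking three chains numbered 1-168, 501-668, 1001-1168.
sample = [
    atom_line(1, "A", 1), atom_line(2, "A", 168),
    atom_line(3, "B", 501), atom_line(4, "B", 668),
    atom_line(5, "C", 1001), atom_line(6, "C", 1168),
]

ranges = chain_ranges(sample)
# Offset that would map each chain's first number onto 1.
offsets = {chain: lo - 1 for chain, (lo, hi) in ranges.items()}
print(ranges)
print(offsets)
```

Subtracting the offset from each residue number would put the three chains on a common 1-168 scale, which can make sequence-identical chains easier to compare.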
===N-Terminal Residues Missing Coordinates===
The most common reason that the first residue with coordinates is not numbered 1 is probably that the N-terminal (or 5'-terminal) residues are missing coordinates due to crystallographic disorder (fuzzy electron density map). An example is [http://firstglance.jmol.org/fg.htm?mol=1d66 1d66] ([[1d66]]). The first 7 residues of chain A are missing, so the first residue with coordinates is numbered 8. Residues 1-7 were present in the crystallized protein, but could not be resolved in the electron density map.
Line 41: Line 44:
===Missing Residues===
[[Image:Sequence-missing-loop-2ace.png|frame|Excerpt from PDB file 2ace showing gap in sequence numbering due to a missing loop.]]
It is not uncommon for a surface loop of the crystallized protein to be disordered. Often such loops are [[Intrinsically Disordered Protein|intrinsically disordered]]. The disorder blurs the electron density map for that loop, and the loop residues are not given coordinates in the model: they are missing in the model. However, they were not missing in the crystallized protein. This causes a gap in the sequence numbers in the PDB file. An example is [http://firstglance.jmol.org/fg.htm?mol=2ace 2ace] ([[2ace]]). Residues 485-489 are missing in the 3D crystallographic model due to disorder in the crystal. Also missing are 3 N-terminal, and 2 C-terminal residues.  FirstGlance in Jmol tabulates missing residues, and marks regions of the 3D model where residues are missing with "empty baskets".
It is not uncommon for a surface loop of the crystallized protein to be disordered. Often such loops are [[Intrinsically Disordered Protein|intrinsically disordered]]. The disorder blurs the electron density map for that loop, and the loop residues are not given coordinates in the model: they are [[Missing residues and incomplete sidechains|missing in the model]]. However, they were not missing in the crystallized protein. This causes a gap in the sequence numbers in the PDB file. An example is [http://firstglance.jmol.org/fg.htm?mol=2ace 2ace] ([[2ace]]). Residues 485-489 are missing in the 3D crystallographic model due to disorder in the crystal. Also missing are 3 N-terminal and 2 C-terminal residues. FirstGlance in Jmol tabulates missing residues, and marks regions of the 3D model where residues are missing with "empty baskets".


<table width=550><tr><td>[[Image:2ace-empty-basket.png|center]]</td><td>&quot;Empty Basket&quot;: Closeup of the region of [[2ace]] where residues 485-489 are missing. In [[FirstGlance in Jmol]], empty baskets alert the user to missing residues. (&quot;S-&quot; labels residues with missing sidechain atoms.)
<br><br>
See also [[Missing residues and incomplete sidechains]].</td></tr></table>
{{clear}}
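A gap like the one described above can be found by scanning the residue numbers of one chain and reporting every place where consecutive numbers differ by more than one. This is a minimal sketch under stated assumptions: it reads the legacy fixed-column PDB format, the function name <code>find_gaps</code> and the helper <code>atom_line</code> are hypothetical, and the sample records are synthetic lines that only imitate the numbering around the missing loop in 2ace.

```python
def atom_line(serial, chain, resseq, icode=" "):
    """Build a synthetic fixed-column ATOM record (CA atom of ALA, dummy coordinates)."""
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}{icode}"
            f"   {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def find_gaps(pdb_lines, chain_id):
    """Return (last_before, first_after) pairs where numbering jumps by more than 1."""
    numbers = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[21] == chain_id:
            resseq = int(line[22:26])   # columns 23-26: residue sequence number
            # Record each residue once, although it spans many ATOM lines.
            if not numbers or numbers[-1] != resseq:
                numbers.append(resseq)
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


# Synthetic chain A in which residues 485-489 have no coordinates.
sample = [atom_line(i, "A", n) for i, n in enumerate([483, 484, 490, 491], start=1)]

gaps = find_gaps(sample, "A")
print(gaps)
```

Because numbering is arbitrary, a jump in the numbers is only a hint that residues may be missing. In PDB-format files, REMARK 465 records list the residues that the authors report as missing, so a detected gap is best checked against them.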


Line 52: Line 58:
</center>
</center>


Another example is chain R in [http://firstglance.jmol.org/fg.htm?mol=3sn6 3sn6] ([[3sn6]]). It is numbered 1002-1164 continuing 30-365. However the model lacks bonds between 1164 and 30 because amino acids 1161-1164 are missing due to crystallographic disorder.
Other examples:
*[http://firstglance.jmol.org/fg.htm?mol=1nsa 1nsa] ([[1nsa]]) is numbered 7A-95A ("A" being an insertion code) continuing 4-308. There is also 188A inserted between 188 and 189.
*Chain R in [http://firstglance.jmol.org/fg.htm?mol=3sn6 3sn6] ([[3sn6]]) is numbered 1002-1164, continuing with 30-365. However, the model lacks bonds between 1164 and 30 because amino acids 1161-1164 are missing due to crystallographic disorder.
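Numbering such as that in the examples above is easier to handle if each residue is identified by its number together with its insertion code, kept in file order rather than sorted. The sketch below assumes the legacy fixed-column PDB format (residue number in columns 23-26, insertion code in column 27). The names <code>residue_ids</code>, <code>label</code>, and <code>descents</code> are hypothetical, and the sample records are synthetic lines that only imitate the pattern described for 1nsa.

```python
def atom_line(serial, chain, resseq, icode=" "):
    """Build a synthetic fixed-column ATOM record (CA atom of ALA, dummy coordinates)."""
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}{icode}"
            f"   {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def residue_ids(pdb_lines, chain_id):
    """Return (number, insertion_code) tuples for one chain, in file order."""
    ids = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[21] == chain_id:
            rid = (int(line[22:26]), line[26].strip())  # column 27: insertion code
            if not ids or ids[-1] != rid:
                ids.append(rid)
    return ids


def label(rid):
    """Format a residue id the way it is usually written, e.g. 188A."""
    return f"{rid[0]}{rid[1]}"


def descents(ids):
    """Return labelled pairs where the numbering fails to increase in file order."""
    return [(label(a), label(b)) for a, b in zip(ids, ids[1:]) if b <= a]


# Synthetic chain: ...94A, 95A, then 4, 5, ... 188, 188A, 189.
spec = [(94, "A"), (95, "A"), (4, " "), (5, " "), (188, " "), (188, "A"), (189, " ")]
sample = [atom_line(i, "A", n, c) for i, (n, c) in enumerate(spec, start=1)]

ids = residue_ids(sample, "A")
labels = [label(r) for r in ids]
breaks = descents(ids)
print(labels)
print(breaks)
```

Comparing the tuples treats 188A as following 188, so an inserted residue is not reported as a break, while the step from 95A back to 4 is. Sorting residues by number would scramble a chain numbered this way, which is why the sketch keeps file order.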


== Notes ==
<references/>
==See Also==
*[[Renumbering PDB files]]
*[[Missing residues and incomplete sidechains]]

Proteopedia Page Contributors and Editors

Eric Martz