Unusual sequence numbering
Eric Martz
The numbering of protein and nucleic acid sequences is arbitrary in structure files from the [[PDB|World Wide Protein Data Bank]] (PDB). That is, authors are free to number sequences as they wish. If you need to change the numbering in a published [[PDB file]], please see [[Renumbering PDB files]].
'''Straightforward numbering''' assigns 1 to the amino-terminal amino acid (or 5' nucleotide) and counts up sequentially and monotonically to the carboxy-terminal amino acid (or 3' nucleotide). An example is [http://firstglance.jmol.org/fg.htm?mol=1pgb 1pgb] ([[1pgb]]). The crystallized protein is numbered 1-56, even though it is a fragment of a [http://www.uniprot.org/uniprot/P06654#sequences 448-residue full-length sequence]; the fragment begins (after adding an N-terminal Met) at full-length sequence number 228.
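The definition above can be written as a simple check. This is a minimal Python sketch, not part of any PDB software (the function name `is_straightforward` is hypothetical); it takes one chain's residue numbers in N-to-C order and ignores insertion codes.

```python
def is_straightforward(numbers):
    """True if a chain's residue numbers run 1, 2, 3, ... with no gaps,
    jumps, or repeats.

    numbers: residue sequence numbers of one chain, in N-to-C
    (or 5'-to-3') order, e.g. as read from ATOM records.
    """
    numbers = list(numbers)
    return numbers == list(range(1, len(numbers) + 1))


# 1pgb is numbered 1-56.
print(is_straightforward(range(1, 57)))     # True

# Hypothetical: the same 56 residues numbered from 228 instead.
print(is_straightforward(range(228, 284)))  # False
```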
==Numbering Does Not Start With One==
===Arbitrary Numbering===
[[1bsz]] contains three sequence-identical chains numbered 1-168, 501-668, and 1001-1168.
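With arbitrary numbering like this, equivalent residues in the chains of 1bsz differ by a constant offset. The minimal sketch below maps a residue number from one chain to another. It assumes that each chain begins at the same position of the shared sequence (each range spans 168 numbers), and the chain IDs A, B and C are hypothetical; both should be checked against the actual file.

```python
def chain_offsets(first_numbers):
    """Offset to subtract from each chain's residue numbers so that the
    chain's first residue becomes 1.

    first_numbers: {chain_id: residue number of the chain's first residue}
    """
    return {chain: first - 1 for chain, first in first_numbers.items()}


def equivalent_residue(number, from_chain, to_chain, offsets):
    """Residue number in to_chain equivalent to `number` in from_chain.

    Assumes the chains are sequence-identical and aligned, i.e. each
    chain's first residue is the same position of the shared sequence.
    """
    return number - offsets[from_chain] + offsets[to_chain]


# 1bsz chains are numbered 1-168, 501-668 and 1001-1168.
# Chain IDs "A", "B" and "C" are hypothetical.
offsets = chain_offsets({"A": 1, "B": 501, "C": 1001})
print(offsets)                                     # {'A': 0, 'B': 500, 'C': 1000}
print(equivalent_residue(600, "B", "C", offsets))  # 1100
```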
===N-Terminal Residues Missing Coordinates===
Probably the most common reason that the first residue with coordinates is not numbered 1 is that the N-terminal (or 5'-terminal) residues lack coordinates because of crystallographic disorder (a fuzzy electron density map). An example is [http://firstglance.jmol.org/fg.htm?mol=1d66 1d66] ([[1d66]]). The first 7 residues of chain A are missing, so the first residue with coordinates is numbered 8. Residues 1-7 were present in the crystallized protein but could not be resolved in the electron density map.
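When the number given to the first crystallized residue is known, the count of N-terminal residues without coordinates follows by subtraction. A minimal sketch (function name hypothetical), assuming the crystallized sequence is numbered from 1 unless told otherwise; PDB-format files list residues lacking coordinates in REMARK 465, which is where that assumption can be confirmed.

```python
def count_missing_n_terminal(observed, first_in_sequence=1):
    """Number of N-terminal residues that have no coordinates.

    observed: residue numbers that have coordinates, in N-to-C order.
    first_in_sequence: number assigned to the first residue of the
        crystallized sequence. Defaults to 1, which is an assumption;
        confirm it in the file (e.g. REMARK 465, missing residues).
    """
    if not observed:
        raise ValueError("chain has no residues with coordinates")
    return observed[0] - first_in_sequence


# 1d66 chain A: the first residue with coordinates is 8.
# Only the start of the chain is shown; later numbers are placeholders.
chain_a = [8, 9, 10, 11, 12]
print(count_missing_n_terminal(chain_a))  # 7
```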
===Missing Residues===
[[Image:Sequence-missing-loop-2ace.png|frame|Excerpt from PDB file 2ace showing a gap in sequence numbering due to a missing loop.]]
It is not uncommon for a surface loop of the crystallized protein to be disordered. Often such loops are [[Intrinsically Disordered Protein|intrinsically disordered]]. The disorder blurs the electron density map for that loop, so the loop residues are not given coordinates: they are [[Missing residues and incomplete sidechains|missing in the model]], even though they were present in the crystallized protein. This causes a gap in the sequence numbers in the PDB file. An example is [http://firstglance.jmol.org/fg.htm?mol=2ace 2ace] ([[2ace]]). Residues 485-489 are missing in the 3D crystallographic model due to disorder in the crystal. Also missing are 3 N-terminal and 2 C-terminal residues. FirstGlance in Jmol tabulates missing residues and marks regions of the 3D model where residues are missing with "empty baskets".
<table width=550><tr><td>[[Image:2ace-empty-basket.png|center]]</td><td>"Empty Basket": Closeup of the region of [[2ace]] where residues 485-489 are missing. In [[FirstGlance in Jmol]], empty baskets alert the user to missing residues. ("S-" labels residues with missing sidechain atoms.)
<br><br>
See also [[Missing residues and incomplete sidechains]].</td></tr></table>
{{clear}}
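A gap like the one in 2ace can be spotted by comparing consecutive residue numbers. The minimal sketch below (function name hypothetical) reports such gaps. Two caveats: residues missing at the chain termini, such as the 3 N-terminal and 2 C-terminal residues of 2ace, leave no internal gap, and under arbitrary or non-monotonic numbering a jump does not necessarily mean that residues are missing. Insertion codes are ignored.

```python
def find_gaps(numbers):
    """Return (before, after) pairs of consecutive residue numbers in a
    chain that differ by more than 1, i.e. candidate missing residues.

    numbers: residue numbers with coordinates, in N-to-C order.
    """
    return [(prev, curr) for prev, curr in zip(numbers, numbers[1:])
            if curr - prev > 1]


# Hypothetical excerpt of 2ace around the disordered loop (485-489).
observed = [482, 483, 484, 490, 491, 492]
for before, after in find_gaps(observed):
    print(f"residues {before + 1}-{after - 1} have no coordinates")
# prints: residues 485-489 have no coordinates
```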
==Not Monotonic==
[[Image:Sequence-not-monotonic-4zwj.png|frame|Excerpt from PDB file 4zwj showing non-monotonic sequence numbering in chain A.]]
Rarely, sequence numbers do not increase monotonically from the N to the C terminus. An example<ref>Thanks to Rachel Kramer Green of [[RCSB]] for this example.</ref> is [http://firstglance.jmol.org/fg.htm?mol=4zwj 4zwj] ([[4zwj]]). In this chimeric protein, chain A is numbered 1002-1161, then 1-326, then 2012-2361. That is, the numbering jumps abruptly between consecutive amino acids: from 1161 to 1, and from 326 to 2012. At right is an excerpt from the ATOM records of the [[PDB file]] for 4zwj chain A. Below is a snapshot of the non-monotonic numbering.
<center>
<table width=350><tr><td>[[Image:Not-monotonic-3sn6.png]]</td></tr><tr><td>Eight amino acids from 4zwj displayed with sequence numbers in FirstGlance in Jmol.<ref name="how2">Display 4zwj in FirstGlance in Jmol. Click ''Find'' and enter ''chain=A and (1-3,1160-1161,281-283)''. Click ''Isolate'' and check ''Atoms with Halos''. Zoom in. In the left center after "Halos around:" click ''Change'', and then ''Clear Halos''. Check ''Sequence numbers'' (near the bottom of the upper left panel).</ref> Tyr 1161 is peptide-bonded N-terminal to Met 1. Cys 2 is disulfide-bonded to Cys 282.</td></tr></table>
</center>
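Non-monotonic numbering defeats a simple gap check. This minimal sketch (function name hypothetical; insertion codes ignored) classifies each break between consecutive residue numbers as a decrease or a jump, using the 4zwj chain A ranges given above and assuming that every residue in those ranges has coordinates.

```python
def numbering_breaks(numbers):
    """Classify places where consecutive residue numbers in one chain do
    not simply increase by 1.

    Returns (previous, current, kind) tuples, where kind is
    "decrease" (numbering is not monotonic) or "jump" (increase > 1).
    Equal consecutive numbers are skipped: they can arise from
    insertion codes, which this sketch ignores.
    """
    breaks = []
    for prev, curr in zip(numbers, numbers[1:]):
        if curr < prev:
            breaks.append((prev, curr, "decrease"))
        elif curr - prev > 1:
            breaks.append((prev, curr, "jump"))
    return breaks


# 4zwj chain A as described above: 1002-1161, then 1-326, then 2012-2361.
# Assumes every residue in these ranges has coordinates.
chain_a = (list(range(1002, 1162)) + list(range(1, 327))
           + list(range(2012, 2362)))
print(numbering_breaks(chain_a))
# [(1161, 1, 'decrease'), (326, 2012, 'jump')]
```

Here the 326-to-2012 jump joins consecutive amino acids of the chimera, so a jump alone should not be read as a run of missing residues.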
Other examples:
*[http://firstglance.jmol.org/fg.htm?mol=1nsa 1nsa] ([[1nsa]]) is numbered 7A-95A ("A" being an insertion code), then 4-308. There is also a residue 188A inserted between 188 and 189.
*Chain R in [http://firstglance.jmol.org/fg.htm?mol=3sn6 3sn6] ([[3sn6]]) is numbered 1002-1164, then 30-365. However, the model has no bond joining these two segments, because amino acids 1161-1164 are missing due to crystallographic disorder.
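Insertion codes such as the "A" in 1nsa's 7A-95A occupy their own column in PDB-format ATOM records: column 27, right after the residue number (columns 23-26), with the chain ID in column 22. The minimal Python sketch below reads one chain's residues in file order. The helper `atom_line` and the excerpt are hypothetical, built only to mimic 1nsa-style numbering, and chain ID A is assumed.

```python
def residues_in_file_order(pdb_lines, chain_id):
    """(resSeq, iCode) pairs for one chain, in the order the residues
    appear in the ATOM records of a PDB-format file.

    Fixed columns (1-based): chain ID 22, residue number 23-26,
    insertion code 27. HETATM records are ignored in this sketch.
    """
    residues = []
    for line in pdb_lines:
        if not line.startswith("ATOM  ") or line[21] != chain_id:
            continue
        key = (int(line[22:26]), line[26].strip())
        if not residues or residues[-1] != key:  # a new residue starts
            residues.append(key)
    return residues


def atom_line(chain_id, res_seq, i_code=" "):
    """Hypothetical minimal CA record, used only to build example input."""
    return (f"ATOM  {1:5d}  CA  ALA {chain_id}{res_seq:4d}{i_code}   "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}")


# Abbreviated, hypothetical excerpt modeled on 1nsa: ... 94A, 95A, then
# 4, 5, and later 188, 188A, 189.
excerpt = [atom_line("A", n, c) for n, c in
           [(94, "A"), (95, "A"), (4, " "), (5, " "),
            (188, " "), (188, "A"), (189, " ")]]
residues = residues_in_file_order(excerpt, "A")
print([f"{n}{c}" for n, c in residues])
# ['94A', '95A', '4', '5', '188', '188A', '189']
print([f"{n}{c}" for n, c in sorted(residues)])
# ['4', '5', '94A', '95A', '188', '188A', '189']
```

Sorting the pairs moves 94A and 95A after residue 5, so with numbering like this the chain order should be taken from the order of the records in the file rather than from sorted residue numbers.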
== Notes ==
<references/>
==See Also==
*[[Renumbering PDB files]]
*[[Missing residues and incomplete sidechains]]