PDB identification code: Difference between revisions
Eric Martz
Revision as of 03:49, 25 March 2013
Every molecular model (atomic coordinate file) in the Protein Data Bank (PDB) has a unique accession or identification code. These codes are always 4 characters long. The first character is a numeral, and each of the last three characters can be either a numeral or a letter. In the past, the first character was always a numeral in the range 1-9. Although there appear to be no entries beginning with "0", its exclusion may have been relaxed.
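The format rule above can be sketched as a small check. This is only an illustration of the rule as stated on this page, not an official PDB validator; the function name `is_pdb_code` and the `allow_leading_zero` switch are hypothetical, and letters are assumed to be plain ASCII.

```python
import re

# Assumption: letters are ASCII A-Z / a-z, as in all examples on this page.
# Strict form: first character 1-9, then three numerals or letters.
_STRICT = re.compile(r"[1-9][0-9A-Za-z]{3}")
# Relaxed form: "0" is also accepted as the first character.
_RELAXED = re.compile(r"[0-9][0-9A-Za-z]{3}")


def is_pdb_code(text, allow_leading_zero=False):
    """Return True if text has the shape of a 4-character PDB code."""
    pattern = _RELAXED if allow_leading_zero else _STRICT
    # fullmatch rejects strings that are longer or shorter than 4 characters.
    return pattern.fullmatch(text) is not None
```

For example, `is_pdb_code("1mbn")` is true, while `is_pdb_code("abcd")` is false because the first character is not a numeral. A string that passes this check is only well formed; it is not necessarily an existing PDB entry.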
Lower vs. Upper Case
PDB codes are often written in upper case. However, lower case helps to avoid confusing zero (0) with the letter "O". For example, 1o1o is clearer than 1O1O, and 2ou0 is clearer than 2OU0. (Also, links to upper case codes in Proteopedia don't work! For example, 1O1O.) Depending on the font, the number 1 can also be confused with capital "I" or lower case "L". So 1imo is clearer than 1IMO, but 1X9L is clearer than 1x9l.
PDB codes in Proteopedia
Every released entry in the PDB has an automatically generated page in Proteopedia. To find it, simply enter the PDB code in the search slot found at the left of this (and every) page in Proteopedia. Proteopedia is updated once each week, shortly after the weekly release of new entries at the PDB. To link to a page titled with a PDB code, put double square brackets around the code in the wikitext box. For example, typing [[1vot]] when editing a Proteopedia article generates the link 1vot.
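When many such links have to be written, the wikitext can be produced mechanically. The following is a minimal sketch; the helper name `proteopedia_link` is hypothetical and not part of Proteopedia. It lower-cases the code because, as noted on this page, links to upper case codes in Proteopedia don't work.

```python
def proteopedia_link(pdb_code):
    """Return wikitext linking to the Proteopedia page for a PDB code."""
    # Lower case is used because upper case links are reported not to work.
    code = pdb_code.strip().lower()
    if len(code) != 4:
        raise ValueError("a PDB code must be exactly 4 characters long")
    # Double square brackets are the wikitext syntax for an internal link.
    return "[[" + code + "]]"
```

For example, `proteopedia_link("1VOT")` returns the wikitext `[[1vot]]`.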
Examples of PDB Codes
- 1mbn - a 1973 model of myoglobin, the first protein structure solved.
- 1tna - a 1975 model of yeast phenylalanine transfer RNA, the first RNA structure solved.
- 1bna - the first full turn of a B-form DNA double helix solved by crystallography. Solved in 1980, this confirmed, 27 years later, the 1953 theoretical model of Watson & Crick. In the intervening years, methods were developed for macromolecular crystallography, and for producing short segments of DNA of defined sequences.
- 2hhd - human hemoglobin, deoxy.
- 9ins - insulin.
Newer PDB Codes are Random
For many years, depositors of models could request an available PDB code that represented an acronym for the molecule. All the above examples are such cases. With the increase in the number of new entries each week, the PDB no longer permits this option. In recent years, all PDB codes have been assigned by the PDB from the pool of available codes, without reference to the name of the molecule.
Limited Number of PDB Codes
There are over 400,000 possible 4-character PDB identification codes (419,904, or 466,560 if "0" is allowed as the first character). Thus, the ~78,000 entries in early 2012 have used up less than 19% of the available codes. Someday a scheme that can accommodate more entries will be needed. That will require revising the macromolecular visualization and modeling programs that obtain data online, all of which, of necessity, currently require 4-character PDB codes.
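The counts above follow from simple arithmetic: 9 (or 10) choices for the first character, and 36 choices (10 numerals plus 26 letters, ignoring case) for each of the other three. A short sketch to reproduce the figures, using the ~78,000 entries quoted above; the names `count_codes` and `ENTRIES_EARLY_2012` are only illustrative.

```python
ALPHANUMERIC = 36            # 10 numerals + 26 letters, case ignored
ENTRIES_EARLY_2012 = 78_000  # approximate figure quoted in the text


def count_codes(first_char_choices):
    """Number of 4-character codes for a given number of first-character choices."""
    return first_char_choices * ALPHANUMERIC ** 3


strict_total = count_codes(9)    # first character 1-9
relaxed_total = count_codes(10)  # first character 0-9

print(strict_total, relaxed_total)                          # 419904 466560
print(f"{ENTRIES_EARLY_2012 / strict_total:.1%} used")      # 18.6% used
print(f"{ENTRIES_EARLY_2012 / relaxed_total:.1%} used")     # 16.7% used
```

Either way, the fraction used stays below 19%, consistent with the estimate above.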