Protein Data Bank: Difference between revisions
Eric Martz (talk | contribs) →New Releases Cycle: polishing |
Eric Martz (talk | contribs) |
||
(28 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
The [http://www.wwpdb.org World Wide Protein Data Bank] (wwPDB) is the internationally recognized sole repository of all published, empirically-determined macromolecular three-dimensional (3D) structure data. Founded in 1971 by Drs. Edgar Meyer and Walter Hamilton [http://www.bnl.gov Brookhaven National Laboratory], management of the Protein Data Bank was headed by Tom Koetzle until 1994 and then by [http://www.weizmann.ac.il/~joel Joel L. Sussman] till 1999, when it was transferred to members of the [http://home.rcsb.org/ Research Collaboratory for Structural Bioinformatics (RCSB)]. Rutgers University | The [http://www.wwpdb.org World Wide Protein Data Bank] (wwPDB)<ref>PMID: 14634627</ref><ref>PMID: 17142228</ref> is the internationally recognized sole repository<ref>PMID: 30357364</ref> of all published, empirically-determined atomic resolution macromolecular three-dimensional (3D) structure data. Founded in 1971 by Drs. Edgar Meyer and Walter Hamilton at [http://www.bnl.gov Brookhaven National Laboratory]<ref>PMID:875032</ref><ref>PMID:10089483</ref>, management of the Protein Data Bank was headed by Tom Koetzle until 1994 and then by [http://www.weizmann.ac.il/~joel Joel L. Sussman] till 1999, when it was transferred to members of the [http://home.rcsb.org/ Research Collaboratory for Structural Bioinformatics (RCSB)]. RCSB is managed at Rutgers University and the San Diego Supercomputer Center. It was directed by [http://en.wikipedia.org/wiki/Helen_M._Berman Helen M. Berman] until July 2014, when Stephen K. Burley took over the directorship<ref>[http://www.rcsb.org/pdb/general_information/news_publications/newsletters/2014q4/home.html#one2 Leadership Transition], RCSB Newsletter, Fall 2014.</ref>. As of 2008, the PDB had three official branches: the Research Collaboratory for Structural Bioinformatics (RCSB, USA), the European Bioinformatics Institute (PDBe, UK), and the Protein Data Bank Japan (PDBj, Osaka). | ||
==New Releases Cycle== | ==New Releases Cycle== | ||
The wwPDB releases new entries once per week. These can be seen by clicking on the most recent release date, shown at the upper right of the main page at [http://pdb.org PDB.Org]. In 2007, 7,280 new entries were released (an average of 140/week). | The wwPDB releases new entries once per week. These can be seen by clicking on the most recent release date, shown at the upper right of the main page at [http://pdb.org PDB.Org]. In 2007, 7,280 new entries were released (an average of 140/week). In 2011, 8,101 new entries were released (average 155/week).<ref>In May 2012, the following numbers were reported by advanced search on release dates at RCSB. 2011: 8,101. 2010: 7,907. 2009: 7,388. 2008: 6,964. 2007: 7,199.</ref> | ||
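The weekly averages quoted above can be reproduced with a short calculation. This is only a sketch: the function name <code>weekly_average</code> is hypothetical, and it assumes a 52-week year and rounding down to a whole number of entries, which is what matches the figures given in the text.

```python
def weekly_average(entries_per_year: int, weeks_per_year: int = 52) -> int:
    """Return the whole-number average of entries released per week.

    Assumption: 52 weekly releases per year, rounded down (floor division).
    """
    return entries_per_year // weeks_per_year


# Yearly totals taken from the text above.
print(weekly_average(7280))  # 2007
print(weekly_average(8101))  # 2011
```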
While the traditional entry consisted of an [[Atomic coordinate file | atomic coordinate file]] molecular model, more recently, the '''experimental data''' (structure factors in the case of crystallography) have been deposited along with the model. After February 1, 2008, deposition of experimental data is required along with all new entries. | While the traditional entry consisted of an [[Atomic coordinate file | atomic coordinate file]] molecular model, more recently, the '''experimental data''' (structure factors in the case of crystallography) have been deposited along with the model. After February 1, 2008, deposition of experimental data is required along with all new entries. | ||
Line 9: | Line 9: | ||
==PDB Statistics== | ==PDB Statistics== | ||
At [http://pdb.org pdb.org], at the upper right corner of the main page, click on ''PDB Statistics'' for a wealth of interesting information, including proteins solved by multiple experimental methods, sequence redundancy in the PDB, the distribution of resolutions, the 100 journals that have published the most new macromolecular structures, and graphs of the growth of the database (under ''Content Growth''). | At [http://pdb.org pdb.org], at the upper right corner of the main page, click on ''PDB Statistics'' for a wealth of interesting information, including proteins solved by multiple experimental methods, sequence redundancy in the PDB, the distribution of [[Resolution|resolutions]], the 100 journals that have published the most new macromolecular structures, and graphs of the growth of the database (under ''Content Growth''). | ||
Some interesting statistics (maxima, minima, means) for the contents of the PDB are summarized at [[Believe It or Not]]. | Some interesting statistics (maxima, minima, means) for the contents of the PDB are summarized at [[Believe It or Not]]. | ||
==Remediation== | |||
Periodically, the PDB remediates its archived data files. Remediation improves consistency and nomenclature and corrects some errors. Remediation involves changes in the [[Atomic_coordinate_file#PDB_Data_Format|PDB data format]]. Remediations occurred in August, 2007 and March, 2009. Details can be found at the [http://www.wwpdb.org/docs.html World Wide PDB]. | |||
Here are some examples of changes that occurred in remediations affecting the PDB format. | |||
* '''DNA:''' Prior to August, 2007, both DNA and RNA nucleotides were named A, C, G, T, and U. After August, 2007, DNA nucleotides were changed to DA, DC, DG, DT and DU, while RNA nucleotides continued to use the older one-letter names. (An example of a model that contains both DNA and RNA is [[104d]].) This change required changes in software packages such as [[Jmol]], and left unmaintained packages such as [[Protein Explorer]] unable to deal properly with the remediated nucleic acids. | |||
* '''Non-standard residues''': Some PDB files represented a non-standard residue as a standard residue (ATOM records) plus an adduct (HETATM records). Some of these were changed to a single, uniform name for the non-standard residue, so that all atoms in the same residue have the same residue name (and all are HETATM records). For example, phosphoserine in [[1apm]] was SER plus PHO, and phosphothreonine was THR plus PHO. These were remediated to SEP and TPO. In another example, methylated ribonucleotides in [[310d]] had been named e.g. +C1 plus CH3. These were remediated to OMC and so forth. | |||
* '''Order of atoms:''' In the March, 2009 remediation, the order of chains and atoms changed in some PDB files in a non-systematic manner. This broke some scenes that had been saved in Proteopedia, and required redesign of some portions of Proteopedia (see [[Getting_Unremediated_PDB_Files#Proteopedia avoids remediation-related problems|Proteopedia avoids remediation-related problems]]). | |||
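The DNA renaming described above can be sketched as a simple lookup. This is an illustration only, not the procedure used by the wwPDB: the names <code>LEGACY_DNA_TO_REMEDIATED</code> and <code>remediate_dna_residue_name</code> are hypothetical. Because the pre-2007 one-letter names were shared by DNA and RNA, the sketch assumes the caller already knows whether a residue belongs to a DNA chain.

```python
# Hypothetical mapping based on the names listed in the text:
# after August 2007, DNA nucleotides A, C, G, T, U became DA, DC, DG, DT, DU.
LEGACY_DNA_TO_REMEDIATED = {
    "A": "DA",
    "C": "DC",
    "G": "DG",
    "T": "DT",
    "U": "DU",
}


def remediate_dna_residue_name(res_name: str, is_dna: bool) -> str:
    """Return the remediated residue name for a nucleotide.

    Assumption: the caller supplies is_dna, since the legacy one-letter
    names alone do not distinguish DNA from RNA.
    """
    name = res_name.strip()  # residue names may be padded with spaces
    if is_dna:
        # Names not in the table (including already remediated ones)
        # are returned unchanged.
        return LEGACY_DNA_TO_REMEDIATED.get(name, name)
    # RNA nucleotides keep the older one-letter names.
    return name


print(remediate_dna_residue_name("  A", is_dna=True))
print(remediate_dna_residue_name("  A", is_dna=False))
```

Software written before the remediation typically recognized only the one-letter names, which is why it could no longer identify DNA residues correctly in remediated files.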
Obsolete (unremediated) versions of the data files were saved by the PDB before each remediation, and may be obtained: see [[Getting Unremediated PDB Files]]. | |||
==Sequence Numbering Anomalies== | |||
Entries in the PDB often contain anomalies in sequence numbering (see [[Homology_modeling_servers#Sequence_Numbering_Anomalies]]). | |||
==Improving Published Models== | |||
There are several free automated servers that can improve most published models. See [[Improving published models]] and [[Quality assessment for molecular models]]. | |||
==More About The Protein Data Bank== | ==More About The Protein Data Bank== | ||
Line 17: | Line 35: | ||
===See Also in Proteopedia=== | ===See Also in Proteopedia=== | ||
*[[AlphaFold#AlphaFold_Database_of_Predictions|Database of structures predicted by AlphaFold2]]. | |||
*[[ModelArchive]] for [[Empirical models|non-empirical]] models. | |||
*[[About Macromolecular Structure]], a list of pages in Proteopedia | *[[About Macromolecular Structure]], a list of pages in Proteopedia | ||
*[[Atomic coordinate file]] | *[[Atomic coordinate file]] | ||
Line 23: | Line 43: | ||
*[[PDB identification code]] | *[[PDB identification code]] | ||
*[[Highest impact structures]] of all time | *[[Highest impact structures]] of all time | ||
*[[Improving published models]] | |||
*[[Quality assessment for molecular models]] | *[[Quality assessment for molecular models]] | ||
Line 30: | Line 51: | ||
*[http://www.pdb.org RCSB PDB] | *[http://www.pdb.org RCSB PDB] | ||
*[http://en.wikipedia.org/wiki/Protein_data_bank Protein Data Bank in Wikipedia] | *[http://en.wikipedia.org/wiki/Protein_data_bank Protein Data Bank in Wikipedia] | ||
* | *"Synergies between the Protein Data Bank and the community", 2021<ref name="synergies">PMID: 33963295</ref>. | ||
*Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res.35:D301-3. (2007) PMID:[http://www.ncbi.nlm.nih.gov/pubmed/17142228 17142228]. | *Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res.35:D301-3. (2007) PMID:[http://www.ncbi.nlm.nih.gov/pubmed/17142228 17142228]. | ||
*Berman HM <i>et al.</i>, The Protein Data Bank, Acta Crystallogr D Biol Crystallogr.58:899-907 (2002). PMID:[http://www.ncbi.nlm.nih.gov/pubmed/12037327 12037327] | *Berman HM <i>et al.</i>, The Protein Data Bank, Acta Crystallogr D Biol Crystallogr.58:899-907 (2002). PMID:[http://www.ncbi.nlm.nih.gov/pubmed/12037327 12037327] | ||
Line 36: | Line 57: | ||
*Sussman JL, Lin D, Jiang J, Manning NO, Prilusky J, Ritter O, Abola EE (1998). "Protein data bank (PDB): a database of 3D structural information of biological macromolecules". ''Acta Cryst'' '''D54''':1078-1084. PMID 10089483. | *Sussman JL, Lin D, Jiang J, Manning NO, Prilusky J, Ritter O, Abola EE (1998). "Protein data bank (PDB): a database of 3D structural information of biological macromolecules". ''Acta Cryst'' '''D54''':1078-1084. PMID 10089483. | ||
*[http://www.umass.edu/microbio/rasmol/1st_xtls.htm Earliest Solutions for Macromolecular Crystal Structures] | *[http://www.umass.edu/microbio/rasmol/1st_xtls.htm Earliest Solutions for Macromolecular Crystal Structures] | ||
*[https://www.rcsb.org/news/feature/66acd3c8eb1f4889a9e4432b 'PDB Archive Serves Structures Determined by Integrative and Hybrid Methods (IHM) (August 2024)'] | |||
*See also [[Proteopedia:Policy#Theoretical_Models | Theoretical Models]]. | *See also [[Proteopedia:Policy#Theoretical_Models | Theoretical Models]]. | ||
==References and Notes== | |||
<references /> |