Protein Data Bank

Revision as of 04:52, 26 December 2009 by Eric Martz (Remediation: polishing)

The World Wide Protein Data Bank (wwPDB) is the internationally recognized sole repository of all published, empirically determined macromolecular three-dimensional (3D) structure data. The Protein Data Bank was founded in 1971 by Drs. Edgar Meyer and Walter Hamilton at Brookhaven National Laboratory. Its management was headed by Tom Koetzle until 1994 and then by Joel L. Sussman until 1999, when it was transferred to members of the Research Collaboratory for Structural Bioinformatics (RCSB). Rutgers University is the lead site, currently under the direction of Helen M. Berman. As of 2008, the wwPDB has three official branches: the Research Collaboratory for Structural Bioinformatics (USA), the European Bioinformatics Institute (UK), and the Protein Data Bank Japan.

New Releases Cycle

The wwPDB releases new entries once per week. These can be seen by clicking on the most recent release date, shown at the upper right of the main page at pdb.org. In 2007, 7,280 new entries were released (an average of 140 per week).

Traditionally, an entry consisted of an atomic coordinate file (the molecular model). More recently, the experimental data (structure factors in the case of crystallography) have been deposited along with the model. Since February 1, 2008, deposition of experimental data has been required with all new entries.

Many derivative databases copy, derive information from, or add value to the atomic coordinate files available from the wwPDB. Often, these automatically update their databases weekly, shortly after the new releases become available at the PDB. Proteopedia is one example.

PDB Statistics

At pdb.org, at the upper right corner of the main page, click on PDB Statistics for a wealth of interesting information, including proteins solved by multiple experimental methods, sequence redundancy in the PDB, the distribution of resolutions, the 100 journals that have published the most new macromolecular structures, and graphs of the growth of the database (under Content Growth).

Some interesting statistics (maxima, minima, means) for the contents of the PDB are summarized at Believe It or Not.

Remediation

Periodically, the PDB remediates its archived data files. Remediation improves consistency and nomenclature and removes some errors. Remediation involves changes in the PDB data format. Remediations occurred in August 2007 and March 2009. Details can be found at the World Wide PDB.

Here are some examples of format changes introduced by remediations. Prior to August 2007, both DNA and RNA nucleotides were named A, C, G, T, and U. In the August 2007 remediation, DNA nucleotides were renamed DA, DC, DG, DT, and DU, while RNA nucleotides continued to use the older one-letter names. (An example of a model that contains both DNA and RNA is 104d.) This change required changes in software packages such as Jmol, and left unmaintained packages such as Protein Explorer unable to deal properly with the remediated nucleic acids.
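The renaming described above can be illustrated with a short Python sketch that tells DNA from RNA residues in a remediated file. It is only an illustration, not code from the PDB or from Jmol: the function names are hypothetical, the name sets are taken directly from the lists in the text, and the residue name is assumed to sit in columns 18-20 of ATOM/HETATM records, as in the fixed-column PDB format.

```python
# Residue names as listed in the text for files remediated in August 2007.
DNA_NAMES = {"DA", "DC", "DG", "DT", "DU"}
RNA_NAMES = {"A", "C", "G", "T", "U"}


def residue_name(line):
    """Return the residue name from columns 18-20 of an ATOM/HETATM record."""
    return line[17:20].strip()


def classify_nucleotide(res_name):
    """Return "DNA", "RNA", or None for a remediated residue name."""
    if res_name in DNA_NAMES:
        return "DNA"
    if res_name in RNA_NAMES:
        return "RNA"
    return None


def count_nucleic_acid_atoms(pdb_lines):
    """Count atoms belonging to DNA and RNA residues in an iterable of PDB lines."""
    counts = {"DNA": 0, "RNA": 0}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip REMARK, HEADER, and other record types
        kind = classify_nucleotide(residue_name(line))
        if kind is not None:
            counts[kind] += 1
    return counts
```

On an unremediated file this sketch would report every nucleotide as RNA, since DNA and RNA shared the one-letter names before August 2007. That is the kind of misreading that software written for the older names can produce in the other direction.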

In the March 2009 remediation, the order of chains and atoms changed in some PDB files in a non-systematic manner. This broke some scenes that had been saved in Proteopedia, and required redesign of some portions of Proteopedia (see Proteopedia avoids remediation-related problems).
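A minimal Python sketch of why reordering can matter follows. It assumes, which the text does not state, that a saved selection referred to atoms by their position in the file. Looking atoms up by identity (chain ID, residue number, atom name) is unaffected by reordering. The function names are hypothetical, the column positions are those of the fixed-column PDB format, and insertion codes and alternate locations are ignored for brevity.

```python
def atom_key(line):
    """Return (chain ID, residue number, atom name) from an ATOM/HETATM record."""
    chain_id = line[21]               # column 22
    res_seq = int(line[22:26])        # columns 23-26
    atom_name = line[12:16].strip()   # columns 13-16
    return (chain_id, res_seq, atom_name)


def index_by_identity(pdb_lines):
    """Map each atom's identity key to its record, independent of file order."""
    return {
        atom_key(line): line
        for line in pdb_lines
        if line.startswith(("ATOM", "HETATM"))
    }
```

A lookup such as `index_by_identity(lines)[("A", 10, "CA")]` returns the same atom whether or not chains or atoms were reordered, whereas `lines[n]` may not.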

Obsolete (unremediated) versions of the data files were saved and may be obtained: see Getting Unremediated PDB Files.

More About The Protein Data Bank

See Also in Proteopedia

External Sources

Proteopedia Page Contributors and Editors

Eric Martz, Joel L. Sussman, Jaime Prilusky, Wayne Decatur