Quality assessment for molecular models
Eric Martz
Revision as of 22:49, 18 October 2011
Crystallographic Models
About 85% of the molecular models published in the Protein Data Bank come from X-ray crystallography experiments. These crystallographic models vary widely in quality, and in rare cases they are grossly incorrect[1][2] or fraudulent. Generally, model quality is indicated by the resolution of the model, the R value, and especially the Free R. Useful information on model quality, including Ramachandran plots, can be obtained from PDBReports[3]. All-atom contact analysis[4] is a powerful newer method for finding and correcting errors in crystallographic models, made easy and convenient by the MolProbity Server[5].
Generally, crystallographic models are reliable in most details when they have resolutions of 2.0 Å or better (the lower the number, the better), R values of 0.20 or less, and Free R values of 0.25 or less. However, new and important structural insights are often provided by models with much poorer resolution (larger Å values). Interestingly, the quality of published molecular models is inversely related to the impact of the journals in which they are published[6].
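The rule of thumb above can be expressed as a small check. The following is only a sketch: the `ModelStats` class and the `failed_criteria` function are hypothetical names introduced for illustration, and the default cutoffs are simply the values quoted in the paragraph above, not a formal standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelStats:
    """Global quality indicators for one crystallographic model (hypothetical container)."""
    resolution: float               # in Å; smaller is better
    r_value: float                  # fraction, e.g. 0.19
    free_r: Optional[float] = None  # None when the Free R was not reported


def failed_criteria(stats: ModelStats,
                    max_resolution: float = 2.0,
                    max_r: float = 0.20,
                    max_free_r: float = 0.25) -> List[str]:
    """Return the names of the rule-of-thumb criteria that the model does not meet."""
    failed = []
    if stats.resolution > max_resolution:
        failed.append("resolution")
    if stats.r_value > max_r:
        failed.append("R value")
    if stats.free_r is None:
        failed.append("Free R (not reported)")
    elif stats.free_r > max_free_r:
        failed.append("Free R")
    return failed
```

An empty list means the model meets all three criteria. A non-empty list does not mean the model is useless, since lower-resolution models can still provide important structural insights; it only signals that details should be interpreted with more caution.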
A thorough quality analysis could include[2]:
- Geometric and conformational validation criteria
  - Bond lengths, angles, and planes, available from
In 2011, a Task Force of the worldwide Protein Data Bank recommended that state-of-the-art crystallographic validation tools be used to generate succinct reports, understandable to non-experts, at the time a PDB code is assigned, and that these reports be made available to the authors, reviewers, and users of the model[2].
NMR Models
Models resulting from solution NMR experiments account for about 15% of those published in the Protein Data Bank. These are generally less reliable than crystallographic models because the method yields less detailed information. For NMR, there are no widely reported global error estimates equivalent to the crystallographic R value and Free R. Unlike with crystallographic results, it is not possible to distinguish reliable from unreliable NMR models using the information included in the PDB files. NMR models are more likely to contain major errors[7] than are crystallographic models that have good resolution and Free R values.
Global vs. Local Quality
The indicators discussed above, notably resolution, R value, and Free R, assess the average or global quality of the model. However, quality and uncertainty are not uniformly distributed throughout the model. Rather, there are regions of higher and lower uncertainty and quality. For crystallographic models, the easiest way to visualize local variations in uncertainty is to color the model by temperature value. As explained in the article on Temperature value, in a temperature-colored model, red atoms have the highest uncertainty in their positions in the model.
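As a rough numerical counterpart to coloring by temperature value, the sketch below averages the temperature values per residue and lists the residues with the highest averages. It assumes the standard fixed-column PDB file format, in which the temperature factor occupies columns 61-66 of each ATOM or HETATM record; the function names `mean_b_by_residue` and `least_certain` are hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def mean_b_by_residue(pdb_lines):
    """Average the temperature values of the atoms in each residue.

    Keys are (chain ID, residue number, residue name) tuples.
    Assumes fixed-column PDB ATOM/HETATM records.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [sum of B, atom count]
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        key = (line[21], int(line[22:26]), line[17:20].strip())
        sums[key][0] += float(line[60:66])  # columns 61-66: temperature value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def least_certain(pdb_lines, top=5):
    """Return the `top` residues with the highest mean temperature value."""
    means = mean_b_by_residue(pdb_lines)
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:top]
```

Given the lines of a PDB file, for example `open("model.pdb").read().splitlines()` with a hypothetical file name, `least_certain` returns the residues that would appear reddest in a temperature-colored view.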
For models determined by NMR, disagreement among the ensemble of models in a particular region may signal higher uncertainty, due to local inadequacy of the distance restraints. However, it could also signal thermal motion; please see NMR Ensembles of Models#Meaning of the Variation Between Models.
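One simple way to quantify disagreement within an ensemble is the root-mean-square deviation of each atom from its mean position across the models. The sketch below is illustrative only: `per_atom_spread` is a hypothetical name, and the code assumes that the models are already superimposed and list the same atoms in the same order. As noted above, a large value may reflect either inadequate restraints or real motion, so the number alone does not distinguish the two.

```python
import math


def per_atom_spread(models):
    """Root-mean-square deviation of each atom from its ensemble-mean position.

    `models` is a list of models; each model is a list of (x, y, z) tuples,
    with atoms in the same order in every model. Models are assumed to be
    superimposed already.
    """
    if not models:
        raise ValueError("at least one model is required")
    n_models = len(models)
    n_atoms = len(models[0])
    if any(len(model) != n_atoms for model in models):
        raise ValueError("all models must contain the same number of atoms")

    spread = []
    for i in range(n_atoms):
        # Mean position of atom i across the ensemble.
        mean = [sum(model[i][k] for model in models) / n_models for k in range(3)]
        # Mean squared distance of atom i from that mean position.
        msd = sum(
            sum((model[i][k] - mean[k]) ** 2 for k in range(3))
            for model in models
        ) / n_models
        spread.append(math.sqrt(msd))
    return spread
```

Atoms or regions with the largest values are those where the models in the ensemble disagree most.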
The MolProbity server[5] offers 3D visualization of atomic clashes, with an indication of the severity of each clash. The presence of severe clashes indicates greater uncertainty in that local region of the model. MolProbity's analysis, termed all-atom contact analysis, can be performed on NMR models (individual models in the ensemble, or the minimized average model) as well as on crystallographic models.
The orientation of the sidechains of Asn, Gln, and His cannot be determined from the electron density in a crystallographic experiment at typical resolution, because of the similarity in the electron densities of carbon, nitrogen, and oxygen. It is usually straightforward to determine the correct orientation by examining the local environment and optimising hydrogen bonding. Unfortunately, it is common for these determinations not to be made in published crystallographic models. Fortunately, MolProbity makes these determinations automatically, and corrects the model by flipping the sidechains of Asn, Gln, and His when this is warranted.
Improving Published Models
There are several free automated servers that can improve most published models. See Improving published models.
Further Reading
Laskowski[8] has provided an outstandingly clear and succinct overview of how to assess model quality. See also the 2007 overview by Kleywegt[9]. For examples of published crystallographic errors, see Laskowski; Kleywegt, 2000[10]; and Kleywegt and Brünger, 1996[11]. Kleywegt has also provided an excellent on-line tutorial on model validation[12].
See Also
- Resolution
- R value
- Free R
- Temperature value
- NMR Ensembles of Models
- Hydrogen in macromolecular models
- Improving published models
- Anisotropic refinement
Content Donors
Portions of this page were adapted from the Glossary of ProteinExplorer.Org, with the permission of the principal author, Eric Martz.
References and Websites
1. Miller G. Scientific publishing. A scientist's nightmare: software problem leads to five retractions. Science. 2006 Dec 22;314(5807):1856-7. PMID:17185570 doi:10.1126/science.314.5807.1856
2. Read RJ, Adams PD, Arendall WB 3rd, Brunger AT, Emsley P, Joosten RP, Kleywegt GJ, Krissinel EB, Lutteke T, Otwinowski Z, Perrakis A, Richardson JS, Sheffler WH, Smith JL, Tickle IJ, Vriend G, Zwart PH. A new generation of crystallographic validation tools for the protein data bank. Structure. 2011 Oct 12;19(10):1395-412. PMID:22000512 doi:10.1016/j.str.2011.08.006
3. PDBREPORT Database
4. Richardson, Jane S. 2003. All-atom contacts: a new approach to structure validation. Precis. Chapter 15 in Structural Bioinformatics (2003) edited by Philip E. Bourne and Helge Weissig, Wiley-Liss, 649 pages. Complete contents at structuralbioinformaticsbook.com.
5. MolProbity Server: All-atom contact analysis, flip corrections for Asn, Gln, His, clash analysis, Ramachandran analysis, and more.
6. Brown EN, Ramaswamy S. 2007. Quality of protein crystal structures. Biol. Crystallography 63:941-950.
7. Nabuurs, Sander B., Chris A. E. M. Spronk, Geerten W. Vuister, and Gert Vriend. 2006. Traditional biomolecular structure determination by NMR spectroscopy allows for major errors. PLoS Computational Biology 2. Open Access Full Text. Precis. doi:10.1371/journal.pcbi.0020009
8. Laskowski, Roman A. 2003. Structural quality assurance. Chapter 14 in Structural Bioinformatics (2003) edited by Philip E. Bourne and Helge Weissig, Wiley-Liss, 649 pages. Complete contents at structuralbioinformaticsbook.com.
9. Kleywegt, GJ. 2007. Quality control and validation. Methods Mol. Biol. 364:255-72. PubMed.
10. Kleywegt, GJ. 2000. Validation of protein crystal structures. Acta Crystallogr. D Biol. Crystallogr. 56:249-265.
11. Kleywegt, GJ, AT Brünger. 1996. Checking your imagination: applications of the free R value. Structure 4:897-904. PubMed.
12. Practical Model Validation by Gerard Kleywegt, University of Uppsala, Sweden.