NMR Ensembles of Models: Difference between revisions

Eric Martz (talk | contribs)
Eric Martz (talk | contribs)
 
(32 intermediate revisions by 5 users not shown)
Line 1: Line 1:
==Caveat==
==Structure Determination by NMR==


I wrote the initial content for this page based on discussions I've had with NMR spectroscopists and crystallographers. Be warned that I am far from an expert, and have never done either NMR spectroscopy nor crystallography. Until this page is vetted by an expert, its content should be considered provisional, not authoritative! [[User:Eric Martz|Eric Martz]] 01:25, 26 June 2008 (IDT)
About 6% of the entries in the [[Protein Data Bank]] were determined by nuclear magnetic resonance in solution (NMR) as of October 2024. 83% were determined by [[X-ray crystallography]], 10% by [[electron cryomicroscopy]], and <1% by other methods. NMR can only be used for relatively small macromolecules (see [[#Median_Size_of_Published_NMR_Structures|below]]).


==Structure Determination by NMR==
<blockquote>
NMR spectroscopy is based on the ability of a nucleus with a spin of 1/2 (e.g. <sup>1</sup>H, <sup>13</sup>C, <sup>15</sup>N, <sup>31</sup>P) to adopt two different orientations in a magnetic field. The distribution of nuclei between the two states can be changed by subjecting them to a short pulse of radiation with a frequency commensurate with the energy difference between them. Monitoring the magnetic signals in the subsequent decay can yield dynamic information about the orientation and spacing of the nuclei, which provide restraints that can be turned into structural information<ref>Quoted from page 22 of the book ''Molecular Biology of Assemblies and Machines'' by Steven, Baumeister, Johnson and Perham, [https://www.crcpress.com/Molecular-Biology-of-Assemblies-and-Machines/Steven-Baumeister-Johnson-Perham/9780815341666 Garland/CRC Press, 2016].</ref>.
</blockquote>


About 14% of the entries in the [[Protein Data Bank]] were determined by nuclear magnetic resonance in solution (NMR) as of mid-2008. 85% were determined by X-ray crystallography, and <1% by other methods. NMR can only be used for relatively small macromolecules (see [[#Median_Size_of_Published_NMR_Structures|below]]).
The primary data yielded by NMR analysis consist of geometric information about atoms within the structure, mostly local and, more recently, also global. Typically, these include distances between pairs of atoms, dihedral angles (typically backbone φ angles and some side-chain χ1 angles), and sometimes global information such as the orientation of a given bond with respect to a fixed axis of the molecule. These data are used as "restraints" to reconstruct 3D models that are compatible with the NMR data. All calculations are performed directly in physical space, starting with a random conformation of the macromolecule, which is progressively folded to satisfy the restraints. Typically, several runs are performed, starting from different initial conformations, in order to check that the calculation converges onto a single solution. The result is thus an ensemble of models, the distribution of which gives a measure of the precision of the NMR structure.
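
As a rough illustration of what "compatible with the NMR data" means for distance restraints, the following minimal sketch checks one model against a list of upper-bound distances. The restraint format, the atom labels and the coordinates are all hypothetical, invented for this example; real structure-calculation programs use their own restraint file formats and also handle lower bounds, dihedral angles and orientational restraints.

```python
import math

def restraint_violations(coords, restraints, tolerance=0.0):
    """Return (atom_a, atom_b, distance, upper_bound) for each violated restraint.

    coords:     dict mapping a (hypothetical) atom label to (x, y, z) in angstroms
    restraints: list of (atom_a, atom_b, upper_bound) tuples, bounds in angstroms
    """
    violations = []
    for atom_a, atom_b, upper in restraints:
        distance = math.dist(coords[atom_a], coords[atom_b])
        if distance > upper + tolerance:
            violations.append((atom_a, atom_b, distance, upper))
    return violations

# Toy model: three hydrogen atoms with made-up labels and coordinates.
model = {
    "HA_5": (0.0, 0.0, 0.0),
    "HN_6": (3.0, 0.0, 0.0),
    "HB_9": (0.0, 8.0, 0.0),
}

# Made-up upper-bound distance restraints.
restraints = [
    ("HA_5", "HN_6", 3.5),  # satisfied: the atoms are 3.0 angstroms apart
    ("HA_5", "HB_9", 5.0),  # violated: the atoms are 8.0 angstroms apart
]

violated = restraint_violations(model, restraints)
for atom_a, atom_b, distance, upper in violated:
    print(f"{atom_a} - {atom_b}: {distance:.2f} > {upper:.2f}")
```

A model that satisfies every restraint (within the chosen tolerance) is one candidate member of the ensemble; different starting conformations can yield different models that all pass the same check.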


NMR measurements yield a set of interatomic distances, called "restraints". Model building for NMR experiments typically starts with the complete protein or nucleic acid chain, including [[Hydrogen atoms]]. The distance restraints are then applied. The resulting model usually includes the entire protein and nucleic acid chains, unlike [[X-ray crystallography|X-ray crystallographic]] models that often lack the ends, and even loops in the middle of chains, due to  [[Disorder in protein crystals]].
Model building for NMR experiments typically starts with the complete protein or nucleic acid chain, including [[Hydrogen in macromolecular models|hydrogen atoms]]. The distance restraints are then applied. The resulting model usually includes the entire protein and nucleic acid chains, unlike [[X-ray crystallography|X-ray crystallographic]] models that often lack the ends, and even loops in the middle of chains, due to  [[Disorder|disorder]] in protein crystals.


Macromolecular structure determination by NMR is done in aqueous solution, and thus requires that the molecule be soluble. For more information, see ''Nature of 3D Structural Data''<ref>[http://www.pdb.org/pdb/static.do?p=general_information/about_pdb/nature_of_3d_structural_data.html Nature of 3D Structural Data]</ref> and ''NMR in Wikipedia''<ref>[http://en.wikipedia.org/wiki/Nuclear_magnetic_resonance NMR in Wikipedia]</ref>.
Macromolecular structure determination by NMR is done at high protein concentrations in aqueous solution, and thus requires that the molecule be highly soluble. For more information, see ''NMR in Wikipedia''<ref>[http://en.wikipedia.org/wiki/Nuclear_magnetic_resonance NMR in Wikipedia]</ref> and ''Nature of 3D Structural Data at PDB''<ref>[https://web.archive.org/web/20110415074813/http://www.pdb.org/pdb/static.do?p=general_information/about_pdb/nature_of_3d_structural_data.html Nature of 3D Structural Data (archived copy)]</ref>.


==Displaying NMR Models==
Line 15: Line 18:
===Display of NMR Models by Proteopedia===
{{STRUCTURE_1lcd|  PDB=1lcd  |  SCENE=  }}
Proteopedia shows '''all the models''' in ensembles of models from NMR experiments. This enables you to see where the models agree with each other, and where they differ. Each model is shown as a thin backbone trace (a line connecting alpha carbon atoms of amino acids, or phosphorus atoms in DNA or RNA chains). The backbone traces are colored by <font color="blue">'''Amino'''</font> to  <font color="red">'''Carboxy'''</font> "rainbow", a spectral sequence of colors starting at the amino terminus (or 5' terminus of nucleic acid chains) and ending at the carboxy terminus (or 3' terminus).
For ensembles of models from NMR experiments, Proteopedia initially displays just the first model, in the usual cartoon rendering; this is done to speed up page loading. You will see a "Displaying simplified model" message within the JSmol panel. If you then click the orange "load full" button, Proteopedia will show '''all the models''', enabling you to see where the models agree with each other and where they differ. Each model is shown as a thin backbone trace (a line connecting alpha carbon atoms of amino acids, or phosphorus atoms in DNA or RNA chains). The backbone traces are colored by <font color="blue">'''Amino'''</font> to <font color="red">'''Carboxy'''</font> "rainbow", a spectral sequence of colors starting at the amino terminus (or 5' terminus of nucleic acid chains) and ending at the carboxy terminus (or 3' terminus).


{{ColorKey N2CRainbow}}
Line 21: Line 24:
'''Ligands''' ([[Hetero atoms]]) are also shown for all models, except that they are opaque only for model 1, and translucent for all other models. Ligand atoms are colored by element, using the [[CPK color scheme]]. Examples with hetero groups covalently linked to chain termini, with extremely variable positions, are [[1jsa]] and [[1dqc]]. [[1bah]] also has hetero groups in variable positions. [[1hpn]] has only hetero atoms.


The example at right shows the 3 models for [[1lcd]], a lac repressor domain bound to DNA, with one sodium ion. '''Water''' is present in this model, but for clarity, Proteopedia does not show water in its initial scene. <scene name='NMR_Ensembles_of_Models/Water/1'>Show water</scene>. (To hide water, click the ''initial scene'' green link just below the molecule.)
The example at right, after clicking the "load full" button, shows the 3 models for [[1lcd]], a lac repressor domain bound to DNA, with one sodium ion. '''Water''' is present in this model, but for clarity, Proteopedia does not show water in its initial scene. <scene name='NMR_Ensembles_of_Models/Water/1'>Show water</scene>. (To hide water, click the ''initial scene'' green link just below the molecule.)


'''Disulfide bonds''' are shown as yellow rods connecting backbones, with the first model opaque, and all other models translucent. An example is [[1iw4]].
Line 27: Line 30:
===Individual Models===


In order to view individual models, click on ''Jmol'' (lower right corner below the molecule) to '''open Jmol's menu'''. There, use the '''All N models''' item (where N is the total number of models in the ensemble). For example, clicking on 1.1: 1 will display only model 1, and the menu will now say ''model 1/N''. You can also use Jmol's menu to change the rendering and coloring.
By default, Proteopedia shows only the first model, together with the message ''Displaying simplified model''. After you click the <font color="orange">orange '''load full'''</font> button, all models will be displayed.
 
In order to view individual models, click on ''JSmol'' or ''Jmol_S'' (lower right corner below the molecule) to '''open Jmol's menu'''. There, use the '''All N models''' item (where N is the total number of models in the ensemble). For example, clicking on 1.1: 1 will display only model 1, and the menu will now say ''model 1/N''. You can also use Jmol's menu to change the rendering and coloring.
 
[[FirstGlance in Jmol]] also shows model 1 by default, but you can click on ''View All Models''.


===Animating NMR Ensembles===


When the models in an NMR ensemble are played like a movie, the resulting animation simulates thermal motion (although not all the motions is necessarily real -- see [[#Meaning_of_the_Variation_Between_Models|below]]). In order to animate the models, click on ''Jmol'' (lower right corner below the molecule) to '''open Jmol's menu'''. Choose '''Animation''', then '''Animation mode''', and click on '''Loop'''. Then choose ''Animation'' again, and click '''Play'''. You can change the speed of the animation with '''FPS''' (frames per second) on the ''Animation'' menu.  By default, there is a delay at the first and last models.
When the models in an NMR ensemble are played like a movie, the resulting animation simulates thermal motion (although not all the motions are necessarily real -- see [[#Meaning_of_the_Variation_Between_Models|below]]). In order to animate the models, click on ''JSmol'' or ''Jmol_S'' (lower right corner below the molecule) to '''open Jmol's menu'''. Choose '''Animation''', then '''Animation mode''', and click on '''Loop'''. Then choose ''Animation'' again, and click '''Play'''. You can change the speed of the animation with '''FPS''' (frames per second) on the ''Animation'' menu.  By default, there is a delay at the first and last models.


==Multiple Model Ensembles from NMR==
Line 38: Line 45:
When a macromolecular structure is determined by nuclear magnetic resonance (NMR) in solution, the result is an '''ensemble of multiple molecular models''', each of which is consistent with the experimental data. The results of an NMR experiment are a large number of inter-atomic distance restraints, which are consistent with multiple models. This is in contrast to the result of an X-ray crystallographic experiment, which is a single model that best fits the empirical electron density. (In some cases where the resolution is very high, the model may include alternative positions for some atoms.)


The number of NMR models published depends upon the experiment and is up to the authors, and varies between 2 (e.g. [[1cvo]]) and over 100. The first model in the ensemble has no special significance (see [[#The Most Representative Model|the most representative model]]).
The number of NMR models published depends upon the experiment and is up to the authors; it varies between 2 (e.g. [[1cvo]]) and over 100. '''The median number of models is 20.''' (You can search for entries with a specified number or range of models using [[User:OCA|OCA]].) The first model in the ensemble has no special significance (see [[#The Most Representative Model|the most representative model]]).
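
If you have a PDB-format file on hand, the number of models can be counted from its <code>MODEL</code> records, which the PDB file format uses (together with <code>ENDMDL</code>) to delimit the members of a multi-model entry. The sketch below is only an illustration: the function name is made up, and the sample text is a shortened stand-in whose <code>ATOM</code> lines are not column-exact.

```python
def count_models(pdb_text):
    """Count MODEL records in PDB-format text.

    A file with no MODEL records is treated as holding a single model.
    """
    n_models = sum(
        1 for line in pdb_text.splitlines() if line[:6].strip() == "MODEL"
    )
    return n_models if n_models > 0 else 1

# Shortened, made-up sample (ATOM lines are simplified, not column-exact).
sample = """\
MODEL        1
ATOM      1  CA  MET A   1       0.000   0.000   0.000
ENDMDL
MODEL        2
ATOM      1  CA  MET A   1       0.500   0.000   0.000
ENDMDL
MODEL        3
ATOM      1  CA  MET A   1       1.000   0.000   0.000
ENDMDL
"""

print(count_models(sample))
```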


===Meaning of the Variation Between Models===


The '''variation between models''' in the ensemble can mean either of two things. The variation can represent actual '''flexibility and thermal motion''' that occurred during the NMR measurements in solution, typically at room temperature. Alternatively, the variation can simply mean '''uncertainty in the atomic positions''', namely, that an inadequate number of restraints were available to determine the positions of some atoms. Unfortunately, there is nothing comparable to the [[B value]] or [[Temperature value]] that quantitates the uncertainty of the position of each atom in crystallographic results. Hence, the only way to find out what the meaning of the variation between models is to contact the experimenters who authored the published ensemble of models.
The '''variation between models''' in the ensemble can mean either of two things. The variation can represent actual '''flexibility and thermal motion''' that occurred during the NMR measurements in solution, typically at room temperature. Alternatively, the variation can simply mean '''uncertainty in the atomic positions''', namely, that too few restraints were available to determine the positions of some atoms. Unfortunately, there is nothing comparable to the [[B value]] or [[Temperature value]] that quantitates the uncertainty of the position of each atom in crystallographic results. However, specific NMR relaxation experiments can be used to measure the dynamics of individual atoms, mainly backbone amide groups, because the relaxation of the NMR signal depends on the internal motions of the molecule. When these NMR relaxation data are available, they can be used to determine '''order parameters''', which are strongly correlated with the [[B value]]s of the crystallographic structures. These can be used to distinguish between intrinsic flexibility and uncertainty due to a lack of restraints. When relaxation data are not available, the only way to find out what the variation between models means is to contact the experimenters who authored the published ensemble of models.
 
Protein chains commonly have more variation between models at the ends than in the middle. An example is [[2yru]].
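
One simple way to see where an ensemble varies is to compute, for each atom, the root-mean-square deviation of its position from its mean position across the models. The sketch below assumes the models are already superposed and list the same atoms in the same order; the function name and the three-atom, three-model ensemble are invented for illustration. Note that such a number only measures the spread. By itself it cannot tell whether the spread reflects real motion or a shortage of restraints.

```python
import math

def per_atom_spread(models):
    """Root-mean-square deviation of each atom from its mean position.

    models: list of models; each model is a list of (x, y, z) tuples for the
            same atoms in the same order, assumed to be already superposed.
    """
    n_models = len(models)
    n_atoms = len(models[0])
    spread = []
    for i in range(n_atoms):
        # Mean position of atom i over all models.
        mean = tuple(sum(m[i][k] for m in models) / n_models for k in range(3))
        # Mean squared distance of atom i from that mean position.
        msd = sum(math.dist(m[i], mean) ** 2 for m in models) / n_models
        spread.append(math.sqrt(msd))
    return spread

# Made-up ensemble: the middle atom is fixed, the two chain ends wobble.
ensemble = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    [(0.0, 2.0, 0.0), (3.8, 0.0, 0.0), (7.6, -2.0, 0.0)],
    [(0.0, -2.0, 0.0), (3.8, 0.0, 0.0), (7.6, 2.0, 0.0)],
]

spread = per_atom_spread(ensemble)
for index, value in enumerate(spread):
    print(f"atom {index}: {value:.2f} angstroms")
```

In this toy ensemble the spread is largest at the two ends and essentially zero in the middle, mirroring the pattern described above for chain termini.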


Using appropriate methodologies, it is possible to determine both the average structure and its dynamic movements<ref>Simultaneous determination of protein structure and dynamics. Kresten Lindorff-Larsen, Robert B. Best, Mark A. DePristo, Christopher M. Dobson, and Michele Vendruscolo (2005). Nature 433:128. PMID:[http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15650731 15650731].</ref>.
Line 54: Line 63:
==Reliability of NMR Models==


NMR models are more likely to contain major errors <ref>Traditional biomolecular structure determination by NMR spectroscopy allows for major errors. Sander B. Nabuurs, Chris. A. E. M. Spronk, Geerten W. Vuister, and Gert Vriend. (2006). PLoS Computational Biology 2: [http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020009 Open Access Full Text] [http://proteinexplorer.org/favlit/nmr.htm Precis]. DOI: 10.1371/journal.pcbi.0020009</ref> than are crystallographic models that have good [[Resolution]] and [[Free R]] values.
NMR models are more likely to contain major errors <ref>Traditional biomolecular structure determination by NMR spectroscopy allows for major errors. Sander B. Nabuurs, Chris. A. E. M. Spronk, Geerten W. Vuister, and Gert Vriend. (2006). PLoS Computational Biology 2: [http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020009 Open Access Full Text] [http://proteinexplorer.org/favlit/nmr.htm Precis]. DOI: 10.1371/journal.pcbi.0020009</ref> than are crystallographic models that have good [[Resolution]] and [[Free R]] values. See also [[Quality assessment for molecular models]]. In 2012, an X-ray crystallographic structure of integral membrane diacylglycerol kinase, [[3ze4]], revealed functionally important domain swapping<ref>PMID: 23676672</ref><ref>PMID: 23676677</ref> that was not present in an earlier NMR structure [[2kdc]]<ref>PMID: 19556511</ref>.


==Median Size of Published NMR Structures==
Solution NMR is unable to determine atomic resolution protein structures for molecules in excess of about 30,000 Daltons. In fact, the median mass of NMR structures published in the [[Protein Data Bank]] is about 9 kD, with 90% less than 19 kD <ref>At Protein Explorer: [http://proteinexplorer.org/gpsi/chardata.htm Size and Redundancy of 43,000 Entries in the Protein Data Bank] (as of April 2007).</ref>. In contrast, the median mass of crystallographically determined structures is 45 kD, with 90% <145 kD.


==Alignment of Models==
NMR models are typically structurally aligned by the authors before publication. However, there are some exceptions, such as [[1qp6]], [[1dl0]], and [[1i25]], in which the individual models are not aligned. In such cases, one needs to look at [[#Individual_Models|individual models]] in order to understand the molecular structure.
NMR models are typically [[structural alignment tools|structurally aligned]] by the authors before publication. However, there are some exceptions, such as [[1qp6]], [[1dl0]], and [[1i25]], in which the individual models are not aligned. In such cases, one needs to look at [[#Individual_Models|individual models]] in order to understand the molecular structure.


The alignment can affect your perception of the variation between models. For example, calmodulin contains two [[EF-hands]] connected by a flexible linker. When calmodulin is not bound to a cognate peptide, the two EF-hands can move relative to each other, flexing the linker. In [[1cfc]], the N-terminal EF-hands are aligned, but the C-terminal EF-hands are in different orientations. Alternatively, had the C-terminal EF-hands been aligned, then the N-terminal EF-hands would be in variable orientations. And less plausibly, had a short center segment of the flexible linker been aligned, both ends would be in variable orientations.
Another example of two folded domains (zinc fingers) connected by a flexible linker is [[1zu1]]. Again, only one domain can be aligned, and which one is arbitrary.
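
The effect of the alignment choice can be sketched with a deliberately simplified two-dimensional toy, not a real superposition tool. Two rigid "domains" of three points each are joined by a hinge; model 2 differs from model 1 only by a 40° rotation of domain B about the hinge. Model 2 is then fitted onto model 1 using either domain A or domain B, with a closed-form least-squares rotation that is valid in 2D only (real 3D superposition typically uses methods such as the Kabsch algorithm). All names and coordinates are invented for this example.

```python
import math

def superpose_2d(mobile, reference, fit_idx):
    """Rigidly move `mobile` onto `reference`, fitting only the points in fit_idx.

    Uses the closed-form least-squares rotation angle for 2D point sets.
    """
    n = len(fit_idx)
    cm = [sum(mobile[i][k] for i in fit_idx) / n for k in range(2)]
    cr = [sum(reference[i][k] for i in fit_idx) / n for k in range(2)]
    s_cross = s_dot = 0.0
    for i in fit_idx:
        px, py = mobile[i][0] - cm[0], mobile[i][1] - cm[1]
        qx, qy = reference[i][0] - cr[0], reference[i][1] - cr[1]
        s_cross += px * qy - py * qx
        s_dot += px * qx + py * qy
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    moved = []
    for x, y in mobile:
        x, y = x - cm[0], y - cm[1]
        moved.append((c * x - s * y + cr[0], s * x + c * y + cr[1]))
    return moved

def rmsd(a, b, idx):
    """Root-mean-square distance between corresponding points listed in idx."""
    return math.sqrt(sum(math.dist(a[i], b[i]) ** 2 for i in idx) / len(idx))

def rotate_about(point, center, degrees):
    """Rotate a 2D point about a center by the given angle."""
    t = math.radians(degrees)
    x, y = point[0] - center[0], point[1] - center[1]
    return (center[0] + x * math.cos(t) - y * math.sin(t),
            center[1] + x * math.sin(t) + y * math.cos(t))

DOMAIN_A = [0, 1, 2]   # indices of the points in domain A
DOMAIN_B = [3, 4, 5]   # indices of the points in domain B
HINGE = (3.0, 0.0)     # made-up hinge position in the linker

model1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 0.0), (6.0, 0.0), (5.0, 1.0)]
# Model 2: domain A unchanged, domain B swung 40 degrees about the hinge.
model2 = model1[:3] + [rotate_about(p, HINGE, 40.0) for p in model1[3:]]

fit_on_a = superpose_2d(model2, model1, DOMAIN_A)
fit_on_b = superpose_2d(model2, model1, DOMAIN_B)

a_when_fit_a = rmsd(fit_on_a, model1, DOMAIN_A)
b_when_fit_a = rmsd(fit_on_a, model1, DOMAIN_B)
a_when_fit_b = rmsd(fit_on_b, model1, DOMAIN_A)
b_when_fit_b = rmsd(fit_on_b, model1, DOMAIN_B)

print(f"fit on A: RMSD A = {a_when_fit_a:.2f}, RMSD B = {b_when_fit_a:.2f}")
print(f"fit on B: RMSD A = {a_when_fit_b:.2f}, RMSD B = {b_when_fit_b:.2f}")
```

Whichever domain is used for the fit appears fixed, and all of the apparent variation is pushed onto the other domain, even though the underlying pair of models is the same in both cases.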
==See Also==
*[[X-ray crystallography]]
*[[Empirical models]]


==References and Websites==


<references />
==External Resources==
*[http://www.pdb.org/pdb/static.do?p=education_discussion/Looking-at-Structures/methods.html Methods for Determining Atomic Structures discussed at the Protein Data Bank]

Proteopedia Page Contributors and Editors

Eric Martz, Frédéric Dardel, Wayne Decatur, Angel Herraez, Amr A. M. Alhossary