Jmol/Visualizing large molecules
<Structure size='400' frame='true' align='right' caption='Half-capsid of human hepatitis B virus displaying only the alpha carbon atoms for the [[biological assembly]] of [[2g33]].' scene='Jmol/Visualizing_large_molecules/Backbone_by_chain/1' />
<center><table style="background-color:#d8ffd8;" class="wikitable"><tr><td>
This page was written in 2011 and '''needs major revisions''' to take into account (i) that Proteopedia now automatically shows [[biological unit]] 1, simplified as needed (try [[4v60]] or [[1sva]]<ref>Page [[1sva]] uses the [[Molstar]] viewer instead of [[JSmol]].</ref>); (ii) the 2022 ability of [http://firstglance.jmol.org FirstGlance in Jmol] to automatically simplify and display very large [[biological units]];
and (iii) the ability of JSmol to generate biological units. Please see:
*[[Biological_Unit#Visualizing_the_Biological_Unit|Visualizing the Biological Unit]]
*[[FirstGlance/Virus_Capsids_and_Other_Large_Assemblies]]
*[[Biological_Unit:_Showing]]
[[User:Eric Martz|Eric Martz]] 21:27, 22 September 2024 (UTC)
</td></tr></table></center>
==Inadequate Memory May Preclude Display==
Some molecular models ("molecules") are so large that they will not fit within the default amount of computer memory allocated to Jmol (which is the default amount of memory allocated to Java). While it is possible to [http://wiki.jmol.org/index.php/Jmol_Applet#Giving_JmolApplet_more_memory_to_work_with increase the memory allocated to Java], most users will not do this and hence will not be able to display, in Proteopedia or Jmol, molecules that exceed a certain size.
===Solutions===
The strategies explained below reduce the sizes of large [[PDB files]] so that their main features can be displayed in the default Jmol/Java memory. These strategies include displaying only the backbones (alpha carbons for proteins and phosphorus atoms for nucleic acids), and displaying one, or a subset, of the models in multiple-model files. These "reduced" files can be uploaded for use in molecular scenes in Proteopedia. An example is shown in the Jmol at the upper right corner of this article.
==Maximum Size Per Model==
Rat Liver Vault, alpha carbon atoms only.
</center></td></tr></table>
====Example with 241,956 Atoms: Rat Liver Vault====
The rat liver vault needed to be split into 3 PDB files: [[2zuo]], [[2zv4]], and [[2zv5]]. Each file contains 80,652 atoms in 13 chains, for a total in the asymmetric unit of 241,956 atoms in 39 chains (A-Z, a-m). The biological unit contains 2 asymmetric units. Fortunately, the authors provide [http://www.protein.osaka-u.ac.jp/olabb/tsukihara/mvp/index.html PDB files containing complete asymmetric units]. However, these are 18 megabyte files, and do not fit in the default Jmol/Java memory. The methods [[#Displaying Only Alpha Carbon Atoms|explained below]] will enable you to visualize an asymmetric unit as alpha carbon atoms only (31,668 atoms). First, [[Jmol/Application|run the Jmol application]]. Now try these commands (the load command will take about a full minute):
{{Clear}}
===62 Chains===
In the final update of the PDB data format specification ([https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM Version 3.3]) and the current remediation of PDB data, chain IDs (names) must be single alphanumeric characters (A-Z, a-z, 0-9). This permits a maximum of 62 chains. This limit is not much of a problem for [[asymmetric unit|asymmetric units]]. As of January 2011, there was only one PDB entry with 62 chains ([[2zkr]]), and 4 more with 55-60 chains.
Generally, the first 26 chains are given IDs A-Z. Above 26, it is apparently arbitrary whether numerals or lower case letters are used first. For example, for the 28 chains in [[3krd]] or [[3hln]] or [[3gpt]], those beyond A-Z are 1-2. Alternatively, for the 28 chains in [[3lo3]], the extra two are identified a-b, and in the 42-chain [[3jqo]], lower case IDs are present but no numerals. Also, when numerals are used, they may begin with 1, or with 0 ([[3fic]]). Occasionally, the letters A-Z are not used up before lower case IDs are employed: [[1tzn]] has 28 chains with IDs A-O and a-o. [[7sya]] has 12 chains a-l, with no chains having upper case names.
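The 62-chain limit follows directly from the allowed character set. The following is a minimal Python sketch, not part of Jmol or of any PDB tool: it enumerates the legal IDs and lists the chain IDs used in a PDB-format file in order of first appearance. The function name <tt>chain_ids_in_pdb</tt> is hypothetical, and the sketch assumes the chain ID sits in column 22 of ATOM and HETATM records, as in format version 3.3.

```python
import string

# Legal single-character chain IDs: A-Z, a-z, 0-9 (26 + 26 + 10 = 62).
LEGAL_CHAIN_IDS = string.ascii_uppercase + string.ascii_lowercase + string.digits


def chain_ids_in_pdb(pdb_text):
    """Return the chain IDs found in PDB-format text, in order of first use.

    Hypothetical helper for illustration only.
    """
    seen = []
    for line in pdb_text.splitlines():
        # The chain ID is column 22 (string index 21) of ATOM/HETATM records.
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen
```

Running <tt>chain_ids_in_pdb</tt> on the text of a downloaded entry should show whether numerals or lower case letters were used after A-Z.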
Jmol can automatically apply a distinct color to each chain, up to 36 chains ([http://jmol.sourceforge.net/jscolors/#Chains Jmol Colors]). However, it can distinguish 62 chains by selection (see [http://chemapps.stolaf.edu/jmol/docs/#setmisc set chainCaseSensitive]).
===Chains Longer Than 10,000 Amino Acids===
The number of non-hydrogen atoms in the average amino acid in a protein is about 8. Where did this value come from?
<blockquote>
The average molecular weight of an amino acid, weighted by amino acid frequencies in proteins, is 110<ref>Average molecular weight of an amino acid is about 138. When the average is weighted according to the occurrences of amino acids in proteins, it is about 128. Subtracting 18 for the weight of water removed when a peptide bond is formed, the average is 110. This is explained in [http://books.google.com/books?id=5Ek9J4p3NfkC&pg=PA84&lpg=PA84&dq=average+amino+acid+molecular+weight&source=bl&ots=ZxHCOzsnjL&sig=we53bW4b7kLuy53LL2BHNDd8CwU&hl=en&sa=X&ei=SD81UdatGo6B0AHy8oHwDQ&ved=0CFEQ6AEwBA#v=onepage&q=average%20amino%20acid%20molecular%20weight&f=false Lehninger Principles of Biochemistry].</ref>. Half of the atoms in protein are hydrogen<ref>There are approximately 1.01 hydrogens per non-hydrogen atom in proteins. The source of this value is given in the article [[Hydrogen in macromolecular models]].</ref>, and the other half are mostly carbon (12), with some oxygen (16) and nitrogen (14). So if we take 13 as the average weight of a non-hydrogen atom, and average that with 1 for the other 50% of the atoms (hydrogen), we get (13 + 1)/2 = 7 as the approximate molecular weight of the average atom in protein. 110/7 is about 16 atoms for the average amino acid in protein. But half of those are hydrogen, missing from most PDB files. So the number of non-hydrogen atoms in the average amino acid is about 8.
</blockquote>
Since the maximum number of atoms in a single model in a PDB file is 99,999 (see above), dividing by 8 non-hydrogen atoms per amino acid gives a '''maximum of about 12,500 amino acids in a single model in a single PDB file''' (containing nothing but protein and no hydrogen atoms). In fact, longer chains can be represented if only the alpha-carbon atoms are present in the PDB file.
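The estimate above can be reproduced in a few lines of Python. This is only a restatement of the article's rounded arithmetic; the variable names are illustrative and the inputs are the approximate values quoted in the text, not measured constants.

```python
# Back-of-the-envelope estimate using the rounded values from the text.
AVG_RESIDUE_MASS = 110.0      # average amino acid residue mass in proteins
AVG_HEAVY_ATOM_MASS = 13.0    # mostly C (12), with some N (14) and O (16)
HYDROGEN_MASS = 1.0
MAX_ATOMS_PER_MODEL = 99_999  # atom limit for one model in one PDB file

# About half of the atoms are hydrogen, so average the two masses.
avg_atom_mass = (AVG_HEAVY_ATOM_MASS + HYDROGEN_MASS) / 2       # 7.0
atoms_per_residue = AVG_RESIDUE_MASS / avg_atom_mass            # about 15.7
# Half of those atoms are hydrogens, which most PDB files omit.
heavy_atoms_per_residue = round(atoms_per_residue / 2)          # 8
max_residues = MAX_ATOMS_PER_MODEL // heavy_atoms_per_residue   # 12499

print(heavy_atoms_per_residue, max_residues)
```

The integer division gives 12,499, which the text rounds to "about 12,500".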
The PDB files containing the longest chains are listed at [[Believe It or Not!]].
==Multiple Model Files==
The largest PDB files in the [[Protein Data Bank]] are those containing multiple models of large molecules. Since the atom serial numbers start at 1 in each model, these files can get very large (>1,000,000 atoms is possible). An example is [[3ezb]], which contains 40 models (determined by solution [[NMR]]). Each model contains 5,323 atoms (including 2,694 hydrogen atoms); the 40 model file contains 212,920 atoms, and the PDB file is 16.5 megabytes in size. When you visit the page [[3ezb]], the ensemble will fail to display, producing an "out of memory" error (unless you have [http://wiki.jmol.org/index.php/Jmol_Applet#Giving_JmolApplet_more_memory_to_work_with allocated more than the default amount of memory to Java] on your computer).
There are files in the [[PDB]] several-fold larger than [[3ezb]]. For example, [[2hyn]] is a 64 megabyte file containing 826,896 atoms in 184 models.
As of January 2011, the largest PDB file in the [[Protein Data Bank]] was [[2ku2]], containing nearly one million atoms, with a file size of 100 megabytes. It consists of fifty models (determined by solution [[NMR]]), each of which has seven chains and nearly 26,000 atoms.
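Before trying to load a large ensemble, it can help to check how many models and atoms a file holds. The sketch below is one way to do that in Python, outside Jmol. It assumes standard MODEL and ENDMDL records delimit each model, and treats a file without MODEL records as a single model. The function name <tt>atoms_per_model</tt> is hypothetical.

```python
def atoms_per_model(pdb_text):
    """Return a list with the ATOM/HETATM count of each model, in file order.

    Hypothetical helper for illustration only.
    """
    counts = []
    current = None  # None means "not currently inside a model"
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = 0
        elif record == "ENDMDL":
            if current is not None:
                counts.append(current)
            current = None
        elif record in ("ATOM", "HETATM"):
            if current is None:
                current = 0  # file has no MODEL records: one implicit model
            current += 1
    if current is not None:
        counts.append(current)  # close a model that had no ENDMDL record
    return counts
```

The length of the returned list is the number of models, and its sum is the total atom count. If the figures quoted above hold, the text of 3ezb should give 40 entries of 5,323 atoms each, totaling 212,920.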
===Displaying Only The First Model===
*'''Demonstrate ''Out Of Memory''''': Type the following command into the white console window:
:<tt>load =2hyn</tt>
:The equal sign tells Jmol to obtain the PDB file from the [[Protein Data Bank]]. A red "OutOfMemory" error message should appear in Jmol in less than 30 seconds (depending on the speed of your Internet connection).
*'''Load The First Model''': Type the following two commands into the white console window:
===Displaying Only Alpha Carbon Atoms===
With large multiple-chain assemblies, or multiple-model ensembles, typically you want to see only the backbone traces. Backbone traces can be visualized from only the alpha carbon atoms (or for nucleic acids, the phosphorus atoms). There are several methods for discarding all atoms except alpha carbons, listed under [[Help:Uploading_molecules#Additional_considerations_for_large_files]]. Below, we will describe the use of the Jmol application to do this.
Jmol can extract ("filter") specified atoms from the PDB file, thereby saving memory. For example, 2hyn contains 4,494 atoms/model (half of which are hydrogen atoms), and 184 models, totaling 826,896 atoms. There are 260 alpha carbon atoms/model, or a total of 47,840 atoms. The alpha carbons represent less than 6% of the original atoms, which is roughly a 17-fold reduction in the number of atoms that must be held in memory.
Using the '''Jmol application''' from your working folder (see [[Jmol/Application|'''instructions''']]), enter this command:
If you wish, you can save the alpha-carbon atom models:
:<tt>write pdb 2hyn_ca_only.pdb</tt>
This file could be uploaded to Proteopedia for use in the [[SAT]]. [[Help:Uploading molecules|Here are '''instructions'''.]]
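A similar reduced file can also be prepared outside Jmol. The following Python sketch is an assumption-laden illustration, not Jmol's filter implementation: it keeps only records whose atom name field (columns 13-16) is exactly " CA " (alpha carbon), and optionally " P  " (phosphorus), relying on the PDB convention that calcium ions are named "CA  " with different alignment. The function name <tt>keep_backbone_trace_atoms</tt> is hypothetical.

```python
def keep_backbone_trace_atoms(pdb_text, include_phosphorus=False):
    """Return PDB text keeping only alpha carbon (and optionally P) atoms.

    Hypothetical helper for illustration only.
    """
    wanted = {" CA "}           # alpha carbon; a calcium ion is "CA  "
    if include_phosphorus:
        wanted.add(" P  ")      # nucleic acid backbone phosphorus
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # The atom name occupies columns 13-16 (string indices 12-15).
            if line[12:16] in wanted:
                kept.append(line)
        elif line.startswith(("ANISOU", "CONECT")):
            continue            # these refer to atoms that may be removed
        else:
            kept.append(line)   # MODEL, ENDMDL, header records, etc.
    return "\n".join(kept) + "\n"
```

Atom serial numbers are left unchanged, so the output is not renumbered. For 2hyn, the figures above imply that 184 x 260 = 47,840 atom records would remain.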
If your molecule contains nucleic acid, you will also want the nucleic backbone traces. The command (for [[PDB code]] 2o5i, 52,717 atoms including protein, DNA and RNA) is
Suppose that you want the alpha carbons for a subset of models in the published ensemble. You can get 16 models from the 184 models in 2hyn by taking either the first 16
:<tt>load models {1 16 1} =2hyn filter "*.ca"</tt>
or by taking every 12th model plus the last model
:<tt>load models {1 184 12} =2hyn filter "*.ca"</tt>
A smaller virus capsid is human hepatitis B, [[2g33]]. Its half-capsid (17,460 atoms) is displayed near the top of this page.
===Non-Capsid Biological Assemblies===
<font color="red">This section is incomplete and remains under construction. [[User:Eric Martz|Eric Martz]] 00:52, 3 January 2011 (IST) </font>
==References==
<references />