Biological Unit
The ''Biological Unit'', also called the ''Biological Assembly''<ref name="xudunbrackpreprint" /><ref name="xudunbrack" />, is the quaternary structure of a protein that is believed to be the main functional form of the molecule. It can be a single [[chain]], or a quaternary assembly of multiple identical or non-identical [[chains]]. For example, the biological unit of hemoglobin includes two alpha chains and two beta chains, making it a tetrameric α<sub>2</sub>β<sub>2</sub> structure. When a biological unit contains multiple chains that have co-evolved to bind to each other, it may also be referred to as a ''specific oligomer''.

The functional form (biological unit) under one set of conditions may differ under another set of conditions, so a given protein chain may belong to more than one functional form (biological unit). For example, phosphorylation by protein kinases, or dephosphorylation by phosphatases, often changes the affinities between proteins, and hence their quaternary assemblies.

Published macromolecular structure data files ([[Atomic coordinate files]], often in the [[PDB file format]]) contain the [[Asymmetric Unit]], which may be identical to the biological unit, may be only a portion of it, or may contain multiple biological units. Interchain contacts that occur in the asymmetric unit but are absent from the biological unit are termed [[crystal contacts]]. When publishing a macromolecular structure, the authors may elect to specify the biological unit. In the [[PDB file format]], this is done in REMARK 350.
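REMARK 350 is plain text with a regular layout: a "BIOMOLECULE:" line opens each assembly, "APPLY THE FOLLOWING TO CHAINS:" (continued by "AND CHAINS:") lists the chains, and three "BIOMT1/2/3" lines give each rotation-plus-translation operator. The minimal Python sketch below collects those records. It assumes whitespace-separated fields (typical of real files, though a strict parser would read fixed columns), the function names are hypothetical, and the sample record is an invented dimer, not taken from any real PDB entry.

```python
"""Minimal sketch: collect biological-unit operators from REMARK 350 records."""


def _chain_list(text):
    # "APPLY THE FOLLOWING TO CHAINS: A, B," -> ["A", "B"]
    return [c.strip() for c in text.split(":", 1)[1].split(",") if c.strip()]


def parse_remark350(lines):
    """Return {biomolecule: [{"chains": [...], "operators": {op_id: 3 rows}}]}."""
    assemblies = {}
    groups = None  # chain groups of the biomolecule currently being read
    for line in lines:
        if not line.startswith("REMARK 350"):
            continue
        text = line[10:].strip()
        if text.startswith("BIOMOLECULE:"):
            groups = []
            assemblies[int(text.split(":", 1)[1])] = groups
        elif groups is None:
            continue  # explanatory text before the first BIOMOLECULE line
        elif text.startswith("APPLY THE FOLLOWING TO CHAINS:"):
            groups.append({"chains": _chain_list(text), "operators": {}})
        elif text.startswith("AND CHAINS:") and groups:
            groups[-1]["chains"] += _chain_list(text)
        elif text.startswith("BIOMT") and groups:
            fields = text.split()
            row = int(fields[0][-1]) - 1  # BIOMT1/2/3 -> matrix row 0/1/2
            op_id = int(fields[1])
            values = [float(v) for v in fields[2:6]]  # r1, r2, r3, translation
            groups[-1]["operators"].setdefault(op_id, [None, None, None])[row] = values
    return assemblies


# Hypothetical REMARK 350: a dimer built from chain A by a twofold axis along z.
SAMPLE = """\
REMARK 350 BIOMOLECULE: 1
REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: DIMERIC
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000
REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000        0.00000
REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000
"""

assemblies = parse_remark350(SAMPLE.splitlines())
print(assemblies[1][0]["chains"], sorted(assemblies[1][0]["operators"]))
```

Keeping each "APPLY THE FOLLOWING TO CHAINS" group separate matters because one biomolecule can apply different operators to different chains.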

==Examples==
<table cellpadding='2' border='1' class='wikitable' align='left' hspace='10'>
<tr><td><center>
Model
</center></td></tr>
</table>
&nbsp;* The contacts in this biological unit differ from those in the asymmetric unit.
&nbsp;**The "author specified" assembly (in this case the same as the [[asymmetric unit]]) appears unlikely in view of the assembly predicted by [[#Software: Protein Interfaces, Surfaces and Assemblies Server (PISA)|PISA]], which has a much larger buried surface area.

Truncated proteins may form oligomers that are impossible in the native protein. For example, [[1bk5]] (karyopherin alpha) is a truncated part of the natural chain, and forms a dimer that would be prevented by the full-length chain. Dimerization is dependent upon Y397. Mutation Y397D prevents this artifactual dimerization, leading to the monomer [[1ee5]].

<br clear="left">
==Visualizing the Biological Unit==
===Proteopedia===
On pages titled with a [[PDB code]], Proteopedia shows biological unit 1 by default, with an option to show the asymmetric unit instead. A simple example is [[3hyd]]: biological unit 1 has 4 small peptide chains, while the asymmetric unit has a single chain. A large example is [[1pov]], the polio virus capsid. When biological unit 1 becomes too large for JSmol to handle effectively, it is displayed in [[Molstar|Mol*]] ("[[Molstar]]") instead of [[JSmol]]. An example is the much larger capsid of Eastern Equine Encephalitis Virus, [[6mx4]].
===FirstGlance in Jmol===
FirstGlance in Jmol makes it quick and easy to see, explore, and analyze the biological unit. FirstGlance initially displays "biomolecule 1" from REMARK 350. When that biological unit is very large, FirstGlance automatically [http://firstglance.jmol.org/notes.htm#simplification simplifies it to alpha carbon atoms (or a subset thereof)].
*Display the molecule in FirstGlance in Jmol:
**Enter the [[PDB code]] in the top search box at the left edge of any Proteopedia page. On the Proteopedia page titled with that PDB code, under <i>Resources</i>, click the link to FirstGlance.
**Alternatively, go directly to [http://firstglance.jmol.org http://FirstGlance.Jmol.Org] (not http'''s''') and enter the PDB code.
*In the <i>Molecule Information</i> tab (the first, left-most tab), click <i>Biological Unit</i> and follow the instructions. When there is more than one biological unit, all are listed, and you can display and analyze each of them.
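To make concrete what "simplified to alpha carbon atoms" means, the Python sketch below keeps only the alpha-carbon (CA) ATOM records from PDB-format text, optionally thinning them to every n-th one. It is an illustration, not FirstGlance's own procedure: the function name and the thinning parameter are hypothetical, and HETATM records (such as calcium ions, whose atom name is also CA) are deliberately skipped.

```python
"""Minimal sketch: reduce PDB-format coordinates to alpha carbons (CA atoms)."""


def alpha_carbons(lines, every=1):
    """Return ATOM records whose atom name (columns 13-16) is CA.

    `every` is a hypothetical thinning step: every=2 keeps every second CA.
    HETATM records are skipped, so calcium ions (also named CA) are excluded.
    """
    ca = [ln for ln in lines if ln.startswith("ATOM  ") and ln[12:16].strip() == "CA"]
    return ca[::every]


SAMPLE = """\
ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  MET A   1      11.639   6.071  -5.147  1.00  0.00           C
ATOM      3  C   MET A   1      13.121   5.884  -5.117  1.00  0.00           C
HETATM  100 CA    CA A 201       0.000   0.000   0.000  1.00  0.00          CA
"""

for record in alpha_carbons(SAMPLE.splitlines()):
    print(record[:26])
```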

===How To Show The Biological Unit In Proteopedia===
</center>

==Unreliability of REMARK 350 in the PDB File Header==
When a structure is deposited in the [[PDB]], the authors are required to specify the biological unit if it is known. This is given in REMARK 350 in the header of the [[PDB file format]]. Unfortunately, the information in REMARK 350 is often incorrect (see [https://lists.wwpdb.org/empathy/thread/DNQIHDGCX3E64AJT5ABCTNFW2EJZTGAS discussion of this problem by Roland Dunbrack])<ref name="xudunbrackpreprint" /><ref name="xudunbrack" />. There are numerous examples in which the authors state in REMARK 350 that the biological unit is a monomer, yet provide good experimental evidence in the paper reporting the structure that it is a dimer. Jose Duarte provided a [https://lists.wwpdb.org/empathy/thread/FNLXQHDEEHGFT7UID3WDYWUFFARTMTN7 list of examples].

*When a biological unit is determined ''only by software'', it is less likely to be correct. The software makes an educated guess based upon the characteristics of the contacts present in the protein crystal, but it is sometimes incorrect.
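REMARK 350 records, per biomolecule, whether the assembly was specified by the authors, predicted by software, or both, on lines beginning "AUTHOR DETERMINED BIOLOGICAL UNIT:" and "SOFTWARE DETERMINED QUATERNARY STRUCTURE:". The Python sketch below flags biomolecules that have only a software prediction. The function names and sample record are hypothetical, and treating a missing author line as "not specified by the authors" is a simplifying assumption; the paper reporting the structure remains the better check.

```python
"""Minimal sketch: flag REMARK 350 assemblies that were determined only by software."""


def assembly_provenance(lines):
    """Return {biomolecule: {"author": str or None, "software": str or None}}."""
    result = {}
    current = None
    for line in lines:
        if not line.startswith("REMARK 350"):
            continue
        key, _, value = line[10:].strip().partition(":")
        if key == "BIOMOLECULE":
            current = {"author": None, "software": None}
            result[int(value)] = current
        elif current is not None and key == "AUTHOR DETERMINED BIOLOGICAL UNIT":
            current["author"] = value.strip()
        elif current is not None and key == "SOFTWARE DETERMINED QUATERNARY STRUCTURE":
            current["software"] = value.strip()
    return result


def software_only(provenance):
    """Biomolecule numbers with a software prediction but no author statement."""
    return sorted(n for n, p in provenance.items() if p["software"] and not p["author"])


# Hypothetical REMARK 350 with one author-specified and one software-only assembly.
SAMPLE = """\
REMARK 350 BIOMOLECULE: 1
REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: DIMERIC
REMARK 350 SOFTWARE DETERMINED QUATERNARY STRUCTURE: DIMERIC
REMARK 350 BIOMOLECULE: 2
REMARK 350 SOFTWARE DETERMINED QUATERNARY STRUCTURE: TETRAMERIC
"""

prov = assembly_provenance(SAMPLE.splitlines())
print(software_only(prov))
```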

==Generation of Biological Unit Models from REMARK 350==
The following sources generate biological unit models from REMARK 350. Be aware that, as explained above, REMARK 350 may be incorrect.
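Generating a biological unit from REMARK 350 amounts to applying each BIOMT operator (a 3×3 rotation plus a translation) to the coordinates of the chains it lists. The Python sketch below shows that step on a toy input. The data layout, function names, and twofold operator are hypothetical; real tools must also handle HETATM records, alternate locations, and chain renaming.

```python
"""Minimal sketch: build biological-unit coordinates by applying BIOMT operators."""


def apply_biomt(op, xyz):
    """op holds three rows [r1, r2, r3, t]; returns R @ xyz + t as a tuple."""
    x, y, z = xyz
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in op)


def build_assembly(atoms, groups):
    """atoms: list of (chain_id, (x, y, z)).
    groups: list of {"chains": [...], "operators": {op_id: op}} for one biomolecule.
    Returns (chain_id, op_id, (x, y, z)) for every atom copy in the assembly."""
    copies = []
    for group in groups:
        wanted = set(group["chains"])
        for op_id, op in sorted(group["operators"].items()):
            for chain_id, xyz in atoms:
                if chain_id in wanted:
                    copies.append((chain_id, op_id, apply_biomt(op, xyz)))
    return copies


IDENTITY = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
TWOFOLD_Z = [[-1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

# Hypothetical asymmetric unit with one atom in chain A and one in chain B;
# this biological unit applies both operators to chain A only.
atoms = [("A", (1.0, 2.0, 3.0)), ("B", (5.0, 5.0, 5.0))]
groups = [{"chains": ["A"], "operators": {1: IDENTITY, 2: TWOFOLD_Z}}]

assembly = build_assembly(atoms, groups)
for copy in assembly:
    print(copy)
```

Chain B is left out here, mirroring the case where the asymmetric unit contains more than one biological unit.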
===FirstGlance in Jmol Limitation===
[http://firstglance.jmol.org FirstGlance] automatically '''displays''' biological unit 1 and lets you work with it using all of the tools within FirstGlance. However, you '''cannot save the biological unit model in PDB format''', because JSmol internally assigns multiple-character names to duplicated chains, while the PDB format allows only a single character per chain name. For example, with [[3hyd]], chain A is duplicated to chains A1, A2, A3. These chain names make the pseudo-PDB file saved from FirstGlance/JSmol unreadable by many software packages.
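In the fixed-column PDB format, the chain identifier occupies only column 22, so a name such as A1 either cannot be written or pushes the residue number and coordinates out of their columns. The Python sketch below writes one ATOM record and refuses multi-character chain IDs. The function name and its simplified fields (occupancy 1.00, B-factor 0.00, no alternate location or insertion code) are illustrative assumptions.

```python
"""Minimal sketch: the PDB format has room for only one chain-ID character."""


def atom_record(serial, name, res_name, chain_id, res_seq, x, y, z, element):
    """Format a simplified fixed-column ATOM record (occupancy 1.00, B-factor 0.00).

    Assumes a one-letter element, so the atom name starts in column 14.
    The chain ID must fit column 22; longer names such as "A1" are rejected.
    """
    if len(chain_id) != 1:
        raise ValueError(f"PDB chain IDs are one character, got {chain_id!r}")
    return (f"ATOM  {serial:5d}  {name:<3s} {res_name:>3s} {chain_id}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}")


print(atom_record(2, "CA", "MET", "A", 1, 11.639, 6.071, -5.147, "C"))
try:
    atom_record(2, "CA", "MET", "A1", 1, 11.639, 6.071, -5.147, "C")
except ValueError as err:
    print(err)
```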
 
===MakeMultimer===
*The MakeMultimer server by Michael Palmer (University of Waterloo, Ontario, Canada) served FirstGlance well from May 2010 until late 2021, when it was retired. It generated a PDB file in which every chain was assigned a distinct single-character name, and all chains were in a single model. [[#FirstGlance in Jmol|FirstGlance 4.0]], released August 15, 2022, was designed to make external generation of a biological unit PDB file unnecessary.
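MakeMultimer's exact naming rule is not described here, so the Python sketch below uses a hypothetical scheme to illustrate the idea of distinct single-character names: the first copy of each chain keeps its name, and later copies take the next unused character from A to Z, a to z, then 0 to 9. The function name and ID pool are assumptions.

```python
"""Minimal sketch: give every chain copy in a biological unit its own one-character ID."""
import string

# Hypothetical pool of one-character IDs: A-Z, a-z, 0-9 (62 in total).
ID_POOL = string.ascii_uppercase + string.ascii_lowercase + string.digits


def assign_chain_ids(copies):
    """Map each (original_chain, operator_id) pair to a distinct one-character ID.

    The first copy of a chain keeps its original ID; later copies take the next
    unused ID from ID_POOL. Pass copies ordered by operator for readable results.
    """
    used = set()
    mapping = {}
    pool = iter(ID_POOL)
    for chain_id, op_id in copies:
        if chain_id not in used:
            new_id = chain_id
        else:
            new_id = next((c for c in pool if c not in used), None)
            if new_id is None:
                raise ValueError("more chain copies than one-character IDs")
        used.add(new_id)
        mapping[(chain_id, op_id)] = new_id
    return mapping


# Hypothetical assembly: chains A and B, each copied by operators 1 and 2.
copies = [("A", 1), ("B", 1), ("A", 2), ("B", 2)]
print(assign_chain_ids(copies))
```

Once every copy has its own ID, all copies can be written into a single model.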

===RCSB===
*Atomic coordinates for biological units, when specified by the authors of a published structure in REMARK 350 of the [[PDB file format]], are available from the RCSB (US) [[Protein Data Bank]]. As of April, 2010, &quot;Biological Assemblies&quot; were available at the bottom of the list under ''Download Files'' (upper right, near the large [[PDB code]]).

*One technical problem with the files from RCSB is that, when they contain more than one copy of the asymmetric unit, the duplicated chains all have identical names. RCSB offers visualization of these models in Jmol, but it is usually difficult to tell how many chains are present in the biological unit, either in the snapshot (where each chain is colored similarly in a spectral amino- to carboxy-terminal sequence) or in Jmol, where coloring by chain fails to distinguish chains with the same name. Also, the additional copies are in separate models, which often complicates visualization. In contrast, coordinates for biological units available from PISA (see below) are in a single model, and each chain is given a distinct name. RCSB also offers a viewer named Kiosk, but it does not appear to show the biological assembly.
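Because such files reuse chain names across models, counting distinct chain IDs undercounts the chains in the biological unit. The Python sketch below counts both distinct chain IDs and distinct (model, chain) pairs in PDB-format text. It assumes that records before any MODEL line belong to model 1 and looks only at ATOM records; the function name and sample file are hypothetical.

```python
"""Minimal sketch: count chains in a multi-model file where chain names repeat."""


def count_chains(lines):
    """Return (distinct chain IDs, distinct (model, chain ID) pairs) over ATOM records."""
    model = 1  # assumption: records before any MODEL line belong to model 1
    pairs = set()
    for line in lines:
        if line.startswith("MODEL"):
            model = int(line.split()[1])  # model serial number
        elif line.startswith("ATOM  "):
            pairs.add((model, line[21]))  # column 22 holds the chain ID
    return len({chain_id for _, chain_id in pairs}), len(pairs)


# Hypothetical two-model assembly file: chains A and B repeated in model 2.
SAMPLE = """\
MODEL        1
ATOM      1  CA  MET A   1      11.639   6.071  -5.147  1.00  0.00           C
ATOM      2  CA  MET B   1      21.639   6.071  -5.147  1.00  0.00           C
ENDMDL
MODEL        2
ATOM      1  CA  MET A   1     -11.639  -6.071  -5.147  1.00  0.00           C
ATOM      2  CA  MET B   1     -21.639  -6.071  -5.147  1.00  0.00           C
ENDMDL
"""

ids, chains = count_chains(SAMPLE.splitlines())
print(f"{ids} distinct chain names, {chains} chains")
```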

For author-specified biological assemblies, the specific oligomers were sometimes not known when the asymmetric unit was published. Some authors may also have failed to specify the biological unit even when it was known. And, as discussed above, the specified biological unit may simply be incorrect. For all these reasons, it is advisable to consult other sources in addition to REMARK 350.

===Software: Protein Interfaces, Surfaces and Assemblies Server (PISA)===

The [http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html Protein Interfaces, Surfaces and Assemblies Server] (PISA) at the European Bioinformatics Institute uses improved methods to predict the biological unit or probable quaternary assembly, compared to its predecessor PQS. It examines the contacts that occur in macromolecular crystals used in [[X-ray crystallography]]. It attempts to discriminate between [[crystal contacts]] (artifacts of crystallization) and contacts between chains that have co-evolved to maintain specific oligomeric binding.
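The buried surface area used to compare candidate assemblies (as in the PISA example above) is generally computed from solvent-accessible surface areas (SASA). For two partners A and B forming the complex AB, a standard definition, stated here independently of any particular server's implementation, is:

```latex
\Delta\mathrm{SASA} = \mathrm{SASA}(A) + \mathrm{SASA}(B) - \mathrm{SASA}(AB)
```

Some programs report half of this value as the "interface area", so check each tool's documentation before comparing numbers across tools.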

===Software: Evolutionary Protein-Protein Interface Classifier (EPPIC)===
===Software: The Protein Common Assembly Database (ProtCAD)===

The [http://dunbrack2.fccc.edu/protcad/ Protein Common Assembly Database] (ProtCAD) is a database of protein complexes built from clusters of assemblies observed in independent experimental structure determinations of the same or homologous proteins in the Protein Data Bank; recurrence of an assembly in multiple experiments provides validation.

<!--===Software: Probable Quaternary Structure Server (PQS)===

The [http://pqs.ebi.ac.uk Probable Quaternary Structure Server] (PQS) at the European Bioinformatics Institute examines the inter-chain contacts within protein crystals, and makes an educated guess (using published methods) about which contacts represent co-evolved specific oligomeric contacts, and which are artifacts of crystallization. It was usually correct, but not always. It returns models for what it deduces to be the biological units. There are many possible relationships between the asymmetric unit and the biological units returned by PQS. Examples are given in the discussion of [http://proteinexplorer.org/pqs.htm PQS at ProteinExplorer.Org]. Updates to PQS stopped in August, 2009. In 2010 it is being phased out in favor of PISA (see above).-->

==See Also==
==Web Sites==

*[https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/biological-assemblies Introduction to Biological Assemblies and the PDB Archive]
*[http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html Protein Interfaces, Surfaces and Assemblies Server] (PISA) at the European Bioinformatics Institute.
<!--*[http://pqs.ebi.ac.uk Probable Quaternary Structure Server] (PQS) at the European Bioinformatics Institute. Retired since 2009!-->
<!-- protBud appears to be retired. In November, 2022, I cannot find it. -Eric Martz
*[http://dunbrack.fccc.edu/ProtBuD.php ProtBud, a database of biological unit structures] Offers comparisons and downloads of the results from REMARK 350 vs. PQS.-->
*[https://www.molnac.unisa.it/BioTools/cocomaps/ COCOMAPS (bioCOmplexes COntact MAPS)] is a web server for analysis and visualization of the interfaces present in biological complexes, such as protein-protein, protein-DNA and protein-RNA complexes, making use of intermolecular contact maps.

Proteopedia Page Contributors and Editors

Eric Martz, Eran Hodis, Wayne Decatur, Jaime Prilusky