Structure superposition tools
Structural alignment refers to the alignment, in three dimensions, between two or more molecular models. In the case of proteins, this is usually performed without reference to the sequences of the proteins. This may suggest evolutionary and functional relationships that are not discernible from sequence comparisons[1].
The purpose of this article is to help in choosing a server or software package for performing structural alignment. Characteristics of structural alignment servers and software packages are listed, along with results of testing with a few examples.
Wikipedia offers a list of structural alignment software packages and an overview of structural alignment. Hasegawa and Holm reviewed structural alignment methods in 2009[2].
Evaluating Structural Alignments
The structural differences between two optimally aligned models are usually measured as the Root Mean Square Deviation (RMSD) between the aligned alpha-carbon positions (excluding deviations from the non-aligned positions). To provide a frame of reference for RMSD values, note that up to 0.5 Å RMSD of alpha carbons occurs in independent determinations of the same protein[3]. Crystallographic models of proteins with about 50% sequence identity differ by about 1 Å RMSD[3][4]. Deviations can be much larger for models determined by NMR[4].
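As a rough illustration of the RMSD measure described above, the following minimal sketch computes the RMSD over paired alpha-carbon coordinates. It assumes the two models have already been superimposed and that the coordinate lists contain only the aligned positions, in matching order. The function name `rmsd` and the tuple-based coordinate format are choices made for this illustration, not part of any server's output.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (in the input units, e.g. Å) between two
    equally long lists of (x, y, z) tuples that are already superimposed.

    Only aligned positions should be passed in; non-aligned positions are
    excluded by the caller, as described in the text.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired alpha carbons.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two aligned alpha carbons, each displaced by 1 Å along z.
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(query, target))
```

Note that this sketch does not perform the superposition itself; finding the optimal rotation and translation is the job of the alignment tools discussed below.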
The statistical significance of a structural alignment, relative to an alignment of random sequence-nonredundant structures in the PDB, is usually measured with a z-score. The z-score is the distance, in standard deviations, between the observed alignment RMSD and the mean RMSD for random pairs of the same length, with the same or fewer gaps. Z-scores less than 2 are considered to lack statistical significance.
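The z-score definition above can be sketched as follows. This is only an illustration under stated assumptions: the background RMSD values for random pairs are supplied by the caller, the sample standard deviation is used, and the sign is chosen so that an observed RMSD below the random mean gives a positive z-score. Individual servers may compute their z-scores differently (for example, from their own similarity scores). The names `rmsd_z_score` and `is_significant` are hypothetical.

```python
import statistics


def rmsd_z_score(observed_rmsd, random_rmsds):
    """Distance, in standard deviations, between the observed RMSD and the
    mean RMSD of a background set of random pairs.

    Assumed sign convention: positive means better (lower RMSD) than random.
    """
    mean = statistics.mean(random_rmsds)
    sd = statistics.stdev(random_rmsds)  # sample standard deviation
    if sd == 0:
        raise ValueError("background RMSD values have zero spread")
    return (mean - observed_rmsd) / sd


def is_significant(z_score, threshold=2.0):
    """Apply the rule of thumb from the text: z-scores below 2 lack significance."""
    return z_score >= threshold


# Made-up background values, for illustration only.
background = [8.0, 10.0, 12.0]
z = rmsd_z_score(3.2, background)
print(z, is_significant(z))
```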
Visualizing Structural Alignments
Structural alignments are usually visualized as the superimposed backbone traces of the aligned models. The example at right shows the bacterial cell division protein FtsZ (1fsz:A) aligned by Dali with mammalian tubulin (1tub:A). Sequence identity in the structurally aligned regions is about 13%.
- The non-aligned segments are white in the query (FtsZ) and thin in the target (tubulin). This scene is available in Dali except that the target color has been changed to make it more distinct from the red query.
- Because the alignment is about 300 residues long (and the protein chains are longer), details are hard to see in the full superposition. Buttons below show 50-residue segments of the query (FtsZ), together with the backbone of the target (tubulin) where the target α carbons are within 3.5 Å. (The RMSD for this Dali alignment is 3.2 Å.)
- This morph was generated by FATCAT, which reported 3.02 Å RMSD for 298 structurally aligned residues, and 10.2% sequence identity for the structurally aligned residues. The morph shows the 334-residue sequence of the query (FtsZ) changing from the query conformation to the conformation of the aligned target (tubulin). It does not show the non-aligned loops of tubulin that can be seen as thin backbone traces in the initial scene above. The morph makes it easy to see that the core fold is stable, while the larger changes occur in surface loops.
It is very helpful to color the target alpha carbons by deviation ("RMSD") from the query model: red indicates large deviations (poor alignment) while blue indicates small deviations (good alignment), with white indicating average alignment. The stand-alone programs DeepView = Swiss-PDBViewer and PyMOL color alignments by RMSD but the results cannot be easily exported to Jmol. Surprisingly, none of the servers listed below color their alignments by deviation. Unfortunately, I found NO way to color the alignment by RMSD in Jmol.
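The coloring scheme just described (blue for small deviations, white for average, red for large) could be sketched as below. This is a hypothetical illustration, not the method used by DeepView or PyMOL: it assumes a per-residue list of alpha-carbon deviations is already available, and it uses simple linear interpolation with the mean deviation mapped to white. The function name `deviation_colors` is invented for this example.

```python
def deviation_colors(deviations):
    """Map per-residue deviations (Å) to (r, g, b) tuples in the range 0-255.

    Smallest deviation -> blue, mean deviation -> white, largest -> red,
    with linear interpolation in between.
    """
    lo, hi = min(deviations), max(deviations)
    mean = sum(deviations) / len(deviations)
    colors = []
    for d in deviations:
        if d <= mean:
            # Blue (0, 0, 255) fading to white (255, 255, 255).
            span = mean - lo
            t = (d - lo) / span if span > 0 else 1.0
            level = int(255 * t)
            colors.append((level, level, 255))
        else:
            # White (255, 255, 255) fading to red (255, 0, 0).
            t = (d - mean) / (hi - mean)
            level = int(255 * (1.0 - t))
            colors.append((255, level, level))
    return colors


# Made-up deviations for four aligned residues.
for dev, rgb in zip([0.0, 1.0, 2.0, 5.0], deviation_colors([0.0, 1.0, 2.0, 5.0])):
    print(dev, rgb)
```

The resulting tuples would still need to be translated into whatever coloring commands the chosen viewer accepts; that step is not shown here.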
Conclusions
Protein structural alignment
There are several well-documented, easy-to-use servers and software packages that do an excellent job of sequence-independent structural alignment, described below. They include:
- CE: rigid alignment only (see Note below).
- Dali: rigid alignment only. Visualization in Jmol. Colors by structure conservation, distinguishing aligned from non-aligned segments.
- FATCAT: flexible and rigid alignment. Visualization in Jmol. Generates a morph of the alignment.
Note: Although the CE server appears not to be well maintained, the CE algorithms can be used at pdb.org, either directly on the website or via the Java Web Start application (see instructions below under Calculate Structure Alignment). Both of the latter methods include visualization in Jmol (not available at the CE website).
If you want automated selection of a small subdomain with the best possible alignment, try DeepView = Swiss-PDBViewer's Explore Domain Alternate Fits or Iterative Magic Fit (see results in the DeepView = Swiss-PDBViewer example).
DNA structural alignment
None of the above servers does structural alignment of DNA, but DeepView = Swiss-PDBViewer does (Iterative Magic Fit).
Multiple chain structural alignment
CE and FATCAT align only one chain at a time. DaliLite appears to align multiple chains, although the output is confusing and not clearly labeled. DeepView = Swiss-PDBViewer can align multiple chains, including both protein and DNA chains in a single alignment (Iterative Magic Fit).
Structural Alignment Servers
Alphabetical, by server name:
CE
The Combinatorial Extension structural alignment server. See the explanation of CE methodology at Wikipedia.
- Server: CE Home Page.
- Publication (1998)[5]
- N.B. Database of structure neighbors has not been updated since 2004. Java applet for viewing results is not working in Sept. 2010. Finding structure neighbors from the entire PDB database ("ALL") appears to have been broken since 2001.
- Rigid alignment: ONLY (according to FATCAT[6])
- Align DNA? NO.
- Align multiple protein chains? NO. Aligns one pair of chains at a time.
- Structure-based sequence alignment: YES.
- Visualization: does not appear to work on the website, but visualization in Jmol works when the jCE option is used in the Calculate Structure Alignment Java software (see below).
- Offered by RCSB? YES.
Dali
"Dali does not optimize RMSD, it matches contacts" (Dali Tutorial, section 4.4.2). See the explanation of Dali's methods at Wikipedia.
- Server: Dali Server
- Publication (2010)[7]
- Help on server: YES, including an extensive Dali Tutorial (PDF) with many screenshots. This refers to a Dali Manual which I could not find.
- Does the structural alignment involve sequence comparison? UNCLEAR.
- Rigid alignment: YES (section 4.4.2 of the Dali Tutorial). ONLY (according to FATCAT[6])
- Flexible alignment: NO.
- Structure neighbors (pre-calculated): YES
- Align DNA? NO.
- Align multiple protein chains? YES? (Results are inadequately labeled and confusing.)
- Pairwise alignment including uploaded models: YES
- Ligands: KEPT.
- Visualization: Jmol.
- Color by deviation: NO.
- Offered by RCSB? NO.
- Special features:
- Colors 3D visualization in Jmol by sequence conservation, calculated from the checked models.
- Colors 3D visualization in Jmol by structure conservation (red for aligned portions, white for unaligned portions).
FATCAT
- Server: fatcat.burnham.org Flexible structure AlignmenT by Chaining AFPs (Aligned Fragment Pairs) with Twists (FATCAT)
- Publication (2003)[6] "... the FATCAT algorithm achieves more accurate structure alignments than current methods, while at the same time introducing fewer hinges."
- Help on server: YES with snapshots; some context-sensitive help.
- Does alignment involve sequence comparison? UNCLEAR.
- Rigid alignment: YES (optional)
- Flexible alignment: YES (optional)
- Structure neighbors (pre-calculated): YES
- Pairwise alignment including uploaded models: YES
- Align DNA: NO.
- Align multiple protein chains: NO. Aligns a single pair of chains at a time.
- Structure-based sequence alignment: YES
- Visualization: Jmol or Chime. See Special features.
- Color by deviation: NO. (Colors identify twist/hinge boundaries.)
- Offered by RCSB? YES
- Special features:
- Produces a morph between the two aligned chains (at the link "Interpolating between ...").
- Offers a RasMol script to color each rigid segment distinctly (separated by twists/hinges).
Notes from the publication: With 10 "difficult examples"[8], FATCAT produced results comparable in aligned length and RMSD to those of the rigid alignment servers DALI, VAST, and CE, introducing no twists in 8 cases. This shows that FATCAT is not biased toward introducing twists (hinges). Hinges were introduced in two of the difficult cases, producing arguably better alignments. In a comparison with FlexProt[9], FATCAT obtained similar RMSDs and aligned lengths with fewer twists (hinges).
FlexProt
- Server: FlexProt.
- Publication (2002)[10]
- Rigid alignment: YES (Results include an alignment for 0 hinges, but only a well-aligning subset of residues is aligned.)
- Flexible alignment: YES (Results are given for various numbers of hinges.)
- Visualization: NONE (You can download PDB files.)
- Ligands: Discarded.
- Special features: Assigns a distinct chain name to each rigid segment separated by a hinge, facilitating informative coloring.
Note: FATCAT provides evidence that it out-performs FlexProt.
MAMMOTH
- Server: mammoth MAMMOTH (MAtching Molecular Models Obtained from THeory)
- Publication (2002)[11]
- Help on server: Little or none.
- Does alignment involve sequence comparison? NO: They state that this is a "sequence-independent structural alignment".
- Rigid alignment: YES.
- Flexible alignment: NO.
- Multiple alignment: YES.
- Structure neighbors (pre-calculated): NO.
- Pairwise alignment including uploaded models: YES
- Visualization: None (you can download a PDB file and a RasMol script. PDB file lacks MODEL/ENDMDL delimiters. PDB file has no chain names. There is a PDB file with chains A and B in the downloadable file rasmol.tcl but this is not a Jmol-ready file.)
- Color by deviation: NO.
- Offered by RCSB? YES
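The Visualization entry above notes that the downloadable PDB file lacks MODEL/ENDMDL delimiters. As a hedged sketch of one possible workaround, the function below wraps separate blocks of coordinate lines in MODEL and ENDMDL records. It assumes you have already split the coordinate lines into one block per structure; how to do that for a given server's download is not covered here. The function name `wrap_as_models` is hypothetical, and this has not been tested against actual MAMMOTH output.

```python
def wrap_as_models(coordinate_blocks):
    """Return PDB text in which each block of coordinate lines is enclosed
    in MODEL/ENDMDL records, numbered from 1.

    coordinate_blocks: list of lists of PDB lines (ATOM/HETATM records),
    one inner list per structure. Splitting into blocks is left to the caller.
    """
    lines = []
    for serial, block in enumerate(coordinate_blocks, start=1):
        # MODEL record: serial number right-justified in columns 11-14.
        lines.append(f"MODEL     {serial:>4}")
        lines.extend(line.rstrip("\n") for line in block)
        lines.append("ENDMDL")
    lines.append("END")
    return "\n".join(lines) + "\n"


# Placeholder coordinate lines, for illustration only.
text = wrap_as_models([["ATOM      1  CA  ALA     1"], ["ATOM      1  CA  GLY     1"]])
print(text)
```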
TM-Align
- Server: TM-align
- Publication (2005)[11]
- Help on server: A little.
- Does alignment involve sequence comparison? UNCLEAR.
- Rigid alignment: YES.
- Flexible alignment: NO.
- Multiple alignment: You can download the software to run on Linux.
- Structure neighbors (pre-calculated): NO.
- Pairwise alignment including uploaded models: YES
- Visualization: None (you can download a script for RasMol that contains PDB coordinates. PDB file lacks MODEL/ENDMDL delimiters. PDB file has no chain names. File does not run as a script in Jmol due to REMARK lines that are not legal Jmol commands.)
- Color by deviation: NO.
- Offered by RCSB? YES
TopMatch
- Server: TopMatch
- Publications (both 2008)[12][13]
- Help on server: YES.
- Does alignment involve sequence comparison? UNCLEAR.
- Rigid alignment: YES.
- Flexible alignment: NO.
- Multiple alignment: NO.
- Structure neighbors (pre-calculated): NO.
- Pairwise alignment including uploaded models: YES
- Visualization: Jmol.
- Color by deviation: NO.
- Offered by RCSB? YES
- Special features: You can download the aligned target PDB file (in a separate file from the query PDB file). A PyMOL script is also available.
VAST
- Server: Vector Alignment Search Tool
- Publication (1996)[12]
- Help on server: YES.
- Does alignment involve sequence comparison? UNCLEAR.
- Rigid alignment: ONLY (according to FATCAT[6])
- Flexible alignment: NO.
- Multiple alignment: NO.
- Structure neighbors (pre-calculated): YES.
- Pairwise alignment including uploaded models: NO.
- Visualization: Cn3D. There appears to be no way to download the aligned model in PDB format for visualization in Jmol.
- Color by deviation: NO (at least not in Jmol-compatible form).
- Offered by RCSB? NO.
- Special features:
Note: In order to get alignment parameters such as RMSD, you must change the list format from graphics to table, then click the List button.
Structural Alignment Software
This section is for stand-alone software packages that do not require a web browser.
Calculate Structure Alignment
This is a Java program (Java Web Start) offered by the U.S. PDB. At the PDB website, look for the box of Tools on the left-hand side, and click on Compare Structures. You will then get a form where you can enter two PDB codes (or upload two PDB files), optionally with a sequence range for each. Alternatively, with the Database Search option, you can enter a single PDB code (or upload a PDB file), and find structure neighbors. On the right is a link "Align custom files (Launches a Java Web Start application)", which starts the Calculate Structure Alignment Java software. This package offers Java implementations of the CE and FATCAT algorithms (see above); for FATCAT, you can choose rigid or flexible alignment.
- Pairwise Comparison: displays the alignment in Jmol, and a sequence alignment (presumably structure-based).
- Database Search: I got no results after clicking "Align" for either jCE or jFATCAT - rigid. These options did not appear to be working. Eric Martz 16:28, 4 October 2010 (IST)
- Help is minimal and results are not clearly labeled.
DeepView = Swiss-PDBViewer
- Download site: DeepView Swiss-PdbViewer.
- Publications (1997, 1999)[14][15]
- Version 4.01 released in 2008.
- Caution: This program often reports the wrong number of alpha carbons aligned, typically reporting twice or four times the actual number. To get the correct count, use the Fit menu, Calculate RMS, or observe the number of residues selected in each layer.
- Help: YES.
- Fit, Magic Fit does a sequence-based structural alignment.
- Fit, Iterative Magic Fit starts with a sequence-based structural alignment, then does further structural alignment, minimizing the RMSD.
- Fit, Explore Domain Alternate Fits: does a sequence-independent structural alignment.
- Magic Fit and Iterative Magic Fit can align multiple chains in each model and can align DNA chains as well as protein chains.
- Color, RMS: colors the target structure by deviation.
- Fit, Set Layer Std Dev. into B-factors: works only when the sequences of the aligned models are identical.
PyMOL
Examples
Example Requiring Flexibility
This example requires flexibility for a good alignment: 2bbm:A vs. 1cfc:A. Length: 148. 97% sequence identity (145/148), 99% similar. These files contain calmodulin. In 2bbm (Drosophila), the two calcium-binding domains are wrapped around a peptide. In 1cfc (Xenopus), there is no calcium and no peptide, and the linker between the two domains is flexible.
- CE:
- 4.8 Å RMSD.
- 38.5% sequence identity in structure-based sequence alignment. Aligned/gap positions = 109/47.
- Uses old, unremediated PDB files (1cfc has no chain A).
- FATCAT:
- 5 hinges (twists): 140 residues aligned, RMSD 2.08 Å.
- FlexProt:
- 0 hinges: 49 residues aligned, RMSD 2.94 Å.
- 1 hinge: 84 residues aligned, RMSD 2.97 Å.
- 2 hinges: 102 residues aligned, RMSD 2.82 Å.
- 3 hinges: 118 residues aligned, RMSD 2.60 Å.
- 4 hinges: 134 residues aligned, RMSD 2.62 Å.
Examples for Rigid Alignment
Example 1 (1fsz:A vs. 1tub:A)
Tool | Residues Aligned | RMSD, Å | Unaligned Residues / Total |
CE | 305 | 3.2 | 96/401 |
Dali | 299 | 3.2 | Not Reported |
DeepView (Iterative Magic Fit) | 159 | 1.69 | |
DeepView (Explore Domain Alternate Fits) | 64 | 1.0 | |
FATCAT | 298 | 3.02 | 103/401 |
MAMMOTH | 298? | 4.0? | |
PyMOL | 197 | 4.5 | |
TM-align | 312 | 3.4 | |
TopMatch | 251 | 3.1 | |
VAST | 299 | 4.0 | |
Example 2
Tool | Residues Aligned | RMSD, Å | Unaligned Residues / Total |
CE | 404 | 1.34 | 56/"460"(?) |
Dali | 420 | 1.6 | 20/? |
DeepView | 389 | 1.11 | 22/437 |
FATCAT | 424 | 1.75 | 16 |
PyMOL | 325 | 0.90 | |
1fsz is the bacterial cell division protein FtsZ, with 334 residues that have coordinates (372 in the crystallized protein). It has structural similarity to mammalian tubulin[16][17], found in 1tub chain A, length 440. However, the sequence identity is low: 92/372 residues can be aligned with 19% identity (2 gaps), and another 14-residue stretch with 42% identity (no gaps).
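Percent identity figures like those quoted above are computed over aligned positions only. The following minimal sketch shows one way to do that for two gapped, equal-length alignment strings. It assumes "-" marks a gap and that columns containing a gap in either sequence are excluded; individual servers may count differently. The function name and the toy sequences are invented for illustration.

```python
def identity_in_alignment(aligned_a, aligned_b, gap="-"):
    """Return (identical, aligned, percent_identity) for two gapped
    sequences of equal length, counting only columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap columns do not count as aligned positions
        aligned += 1
        if a.upper() == b.upper():
            identical += 1
    if aligned == 0:
        raise ValueError("no aligned positions")
    return identical, aligned, 100.0 * identical / aligned


# Toy alignment: 4 aligned columns, 3 of them identical.
print(identity_in_alignment("MKV-LA", "MRVG-A"))
```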
CE example
- 3.2 Å RMSD for 305 residues. The structural alignment has 96 unaligned "gap" residues: one large gap of ~30 residues, and ten smaller gaps of 8 residues or less.
- Z-score: 6.5.
- 12.5% sequence identity within the structural alignment.
- Same results obtained at either the CE website, or using the Calculate Structure Alignment java webstart software (see above).
Dali example
- 3.2 Å RMSD RIGID alignment included 299 residues.
- Z-score: 25.5.
- 13% sequence identity for the structurally aligned regions.
- The structure-based sequence alignment has many gaps.
DeepView = Swiss-PDBViewer example
Tested with version 4.01 OS X.
- Magic Fit -- SEQUENCE-BASED:
- 4.4 Å RMSD for 114 aligned residues.
- Iterative Magic Fit -- Sequence based followed by RMSD minimization:
- 1.69 Å RMSD for 159 aligned residues.
- Explore Domain Alternate Fits -- sequence-independent alignment:
- Used option NOT to use selected residues.
- Nevertheless program complained repeatedly that I had not selected residues.
- Nevertheless program produced an alignment:
- 1.0 Å for 64 aligned residues.
FATCAT example
- 3.02 Å RMSD RIGID alignment includes 298 residues.
- P value: 5 × 10⁻⁸ (used instead of z-score to take twists into account).
- 10.2% sequence identity in the structurally aligned regions.
- The structure-based sequence alignment has many gaps, looking similar to that generated by CE.
- FLEXIBLE alignment introduced ZERO twists (hinges), so gave the same result as the rigid alignment.
MAMMOTH example
- 4.0 Å (?) with 298 aligned residues (?) (Labeling in results is unclear.)
- Structure-based sequence alignment is displayed.
PyMOL example
- Command: super 1fsz////CA, 1tub_a////CA, object=supAB
- 4.5 Å RMSD for 197 aligned residues.
TM-Align example
- 3.42 Å for 312 aligned residues.
- Structure-based sequence alignment is displayed.
TopMatch example
- Error # 1063, no explanation. No structures displayed in Jmol. Result displayed nevertheless:
- 3.1 Å RMSD. Alignment includes 251 residues.
- 12% sequence identity in the aligned regions.
- Tried the example requiring flexibility (above) as a second case. A 40 residue subdomain was aligned with RMSD 1.8 Å, and the alignment was displayed in Jmol with no error.
VAST example
- 4.0 Å RMSD for 299 aligned residues.
- Expectation value: 10⁻¹⁶.
- 11.4% sequence identity in the aligned segments.
- I could find no way to download the aligned PDB file for visualization in Jmol or RasMol.
References
- ↑ Koppensteiner WA, Lackner P, Wiederstein M, Sippl MJ. Characterization of novel proteins based on known protein structures. J Mol Biol. 2000 Mar 3;296(4):1139-52. PMID:10686110 doi:10.1006/jmbi.1999.3501
- ↑ Hasegawa H, Holm L. Advances and pitfalls of protein structural alignment. Curr Opin Struct Biol. 2009 Jun;19(3):341-8. Epub 2009 May 27. PMID:19481444 doi:10.1016/j.sbi.2009.04.003
- ↑ 3.0 3.1 Chothia C, Lesk AM. The relation between the divergence of sequence and structure in proteins. EMBO J. 1986 Apr;5(4):823-6. PMID:3709526
- ↑ 4.0 4.1 Schwede T, Diemand A, Guex N, Peitsch MC. Protein structure computing in the genomic era. Res Microbiol. 2000 Mar;151(2):107-12. PMID:10865955
- ↑ Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998 Sep;11(9):739-47. PMID:9796821
- ↑ 6.0 6.1 6.2 6.3 Ye Y, Godzik A. Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics. 2003 Oct;19 Suppl 2:ii246-55. PMID:14534198
- ↑ Holm L, Rosenstrom P. Dali server: conservation mapping in 3D. Nucleic Acids Res. 2010 Jul 1;38 Suppl:W545-9. Epub 2010 May 10. PMID:20457744 doi:10.1093/nar/gkq366
- ↑ Fischer D, Elofsson A, Rice D, Eisenberg D. Assessing the performance of fold recognition methods by means of a comprehensive benchmark. In: Pacific Symposium on Biocomputing. 1996. pp. 300–318.
- ↑ Shatsky M, Nussinov R, Wolfson HJ. Flexible protein alignment and hinge detection. Proteins. 2002 Aug 1;48(2):242-56. PMID:12112693 doi:10.1002/prot.10100
- ↑ Shatsky M, Nussinov R, Wolfson HJ. Flexible protein alignment and hinge detection. Proteins. 2002 Aug 1;48(2):242-56. PMID:12112693 doi:10.1002/prot.10100
- ↑ 11.0 11.1 Ortiz AR, Strauss CE, Olmea O. MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci. 2002 Nov;11(11):2606-21. PMID:12381844 doi:10.1110/ps.0215902
- ↑ 12.0 12.1 Sippl MJ, Wiederstein M. A note on difficult structure alignment problems. Bioinformatics. 2008 Feb 1;24(3):426-7. Epub 2008 Jan 2. PMID:18174182 doi:10.1093/bioinformatics/btm622
- ↑ Sippl MJ. On distance and similarity in fold space. Bioinformatics. 2008 Mar 15;24(6):872-3. Epub 2008 Jan 28. PMID:18227113 doi:10.1093/bioinformatics/btn040
- ↑ Guex N, Peitsch MC. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis. 1997 Dec;18(15):2714-23. PMID:9504803 doi:10.1002/elps.1150181505
- ↑ Guex N, Diemand A, Peitsch MC. Protein modelling for all. Trends Biochem Sci. 1999 Sep;24(9):364-7. PMID:10470037
- ↑ Nogales E, Downing KH, Amos LA, Lowe J. Tubulin and FtsZ form a distinct family of GTPases. Nat Struct Biol. 1998 Jun;5(6):451-8. PMID:9628483
- ↑ Makarova KS, Koonin EV. Two new families of the FtsZ-tubulin protein superfamily implicated in membrane remodeling in diverse bacteria and archaea. Biol Direct. 2010 May 7;5:33. PMID:20459678 doi:10.1186/1745-6150-5-33