User:Eric Martz/Introduction to Structural Bioinformatics I
How to find, visualize, and understand 3D protein molecular structures
by Eric Martz, October 2 and 4, 2012
for Prof. Steven Sandler's course Microbiology 565: Laboratory in Molecular Genetics
University of Massachusetts, Amherst MA USA
Get here with 565.MolviZ.Org
I. Computer Lab Preparation (BCRC)
- Log in
- Run Firefox
- Go to proteopedia.org (do NOT type www).
- If you see an inactive plug-in notice, click on it and enable the plug-in.
- Restart the browser and again go to proteopedia.org.
- When you see a rotating 3D molecular structure, you are prepared.
- Take a look around Proteopedia. Click on the PDB codes below, or on the Random links, to see other molecules.
II. Protein Structure and Structural Bioinformatics
- 1. Amino acid sequence + protein chain conformation = protein function.
- A. Conformation can be a stable fold or intrinsically unstructured. Both commonly exist in the same protein molecule.
- B. Conformation is specified by sequence.
- Folded domains fold spontaneously (Anfinsen, 1960s[1]), or with the help of chaperonins.
- The denaturation (unfolding) of a folded domain destroys its function.
- 2. Structure Knowledge.
- A. Although sequence specifies fold, scientists cannot yet predict the fold from the sequence. Therefore, fold must be determined by empirical (experimental) methods. The most common methods for determining the 3D structure of a protein molecule are:
- X-ray crystallography, 88%.
- Nuclear magnetic resonance (NMR) in aqueous solution, 11%.
- NMR is limited to small proteins (30 kDa or smaller).
- High resolution cryo-electron microscopy, 0.5%.
- B. These methods are difficult and expensive. Fewer than 10% of proteins have a known structure.
- C. All published, empirically determined 3D macromolecular structure models are available from the Protein Data Bank (PDB; pdb.org; About the PDB).
- E. Crystallographers publish the asymmetric unit of the crystal. It may be identical with the biological unit (the functional form of the molecule), or it may be only part of the biological unit, or it may contain multiple copies of the biological unit. See examples.
- Interchain contacts that occur in the asymmetric unit that are absent in the biological unit are an artifact of crystallization, termed crystal contacts.
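To make "interchain contacts" concrete, here is a minimal Python sketch that reports which pairs of chains have any atoms within a distance cutoff. The 4.0 Å cutoff, the hypothetical function name `interchain_contacts`, and the `(chain, x, y, z)` atom tuples are assumptions for illustration. Running it on the asymmetric unit and on the biological unit, then comparing the pairs, is one way to flag candidate crystal contacts; a real analysis would also need the symmetry-related copies in the crystal lattice.

```python
import math
from itertools import combinations


def interchain_contacts(atoms, cutoff=4.0):
    """Return sorted (chain, chain) pairs having any two atoms within cutoff (Å).

    atoms: iterable of (chain_id, x, y, z) tuples. This brute-force version
    compares every atom pair, which is fine for a sketch but slow for big entries.
    """
    pairs = set()
    for (c1, *p1), (c2, *p2) in combinations(atoms, 2):
        # Only pairs of atoms from different chains count as interchain contacts.
        if c1 != c2 and math.dist(p1, p2) <= cutoff:
            pairs.add(tuple(sorted((c1, c2))))
    return sorted(pairs)


atoms = [("A", 0.0, 0.0, 0.0), ("B", 3.0, 0.0, 0.0), ("C", 20.0, 0.0, 0.0)]
print(interchain_contacts(atoms))  # [('A', 'B')]
```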
III. Choose a Molecule to Explore
- Choose a molecule that includes protein and ligand. It may also include nucleic acid, but must have protein and ligand.
- Be sure to note the 4-character PDB code of the molecule you choose. The PDB code makes it easy to retrieve the molecule and information about it. Here are some ways to find a protein with known structure:
- Atlas of Macromolecules (Atlas.MolviZ.Org). Choose a "straightforward" molecule that has ligand.
- Structural View of Biology at the PDB.
- Molecule of the Month at the PDB.
- Topic Pages in Proteopedia.
- Random PDB Entry in Proteopedia (see random box at top left of this page).
- Search by molecule name or amino acid sequence at www.pdb.org, but remember that fewer than 10% of proteins have a known structure.
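Because the 4-character PDB code is the key to everything else, it can help to see how a code maps to a downloadable file. This Python sketch assumes the current RCSB PDB file-service address (files.rcsb.org/download/CODE.pdb), which this handout does not itself specify; the function names are hypothetical, and the pattern covers only classic codes (a digit 1 to 9 followed by three letters or digits).

```python
import re
import urllib.request

# Classic PDB codes: a digit 1-9 followed by three letters or digits (e.g. 2src).
PDB_CODE = re.compile(r"[1-9][A-Za-z0-9]{3}")


def pdb_download_url(pdb_code, fmt="pdb"):
    """Build a download URL for an entry (assumed RCSB file-service layout)."""
    if not PDB_CODE.fullmatch(pdb_code):
        raise ValueError(f"not a 4-character PDB code: {pdb_code!r}")
    return f"https://files.rcsb.org/download/{pdb_code.upper()}.{fmt}"


def fetch_pdb_text(pdb_code):
    """Download the entry as text (requires network access)."""
    with urllib.request.urlopen(pdb_download_url(pdb_code)) as response:
        return response.read().decode("utf-8")


print(pdb_download_url("2src"))  # https://files.rcsb.org/download/2SRC.pdb
```

Very large entries may be available only in mmCIF format, in which case `fmt="cif"` would be the one to try.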
IV. Explore Your Molecule
1. Start in Proteopedia
Open Proteopedia in a new browser tab and enter your PDB code in the search slot at the left. We will use the following information offered by Proteopedia:
- A. The title of the study, which usually includes the name of the molecule.
- B. The abstract of the publication about this structure, which usually mentions the function of the molecule if known.
- C. The number of polymer chains under About this Structure.
- D. Full names of ligands and non-standard residues (displayed when their green links are clicked beneath the molecule). Example: 2src.
- E. Evolutionary conservation.
- F. The popup button for enlarging the molecular scene.
- G. A link to display the molecule in FirstGlance in Jmol (in the Resources block under the molecule).
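Item C (the number of polymer chains) can be cross-checked against the coordinate file itself. This is a minimal sketch assuming the legacy PDB text format, where polymer residues are written as ATOM records and the chain identifier sits in column 22; the function name `polymer_chain_ids` and the sample lines are illustrative. Modified residues written as HETATM records are ignored here.

```python
def polymer_chain_ids(pdb_text):
    """Chain IDs that carry ATOM records (polymer residues), in file order."""
    seen = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # NMR entries repeat every chain in each MODEL; read only the first
        if line.startswith("ATOM  ") and len(line) > 21:
            chain = line[21]  # column 22 in the 1-based PDB column numbering
            if chain not in seen:
                seen.append(chain)
    return seen


SAMPLE = """\
ATOM      1  N   THR A   1      17.047  14.099   3.625
ATOM      2  CA  THR A   1      16.967  12.784   4.338
ATOM      3  N   GLY B   1      10.000  10.000  10.000
HETATM    4  O   HOH A 101       5.000   5.000   5.000
"""
print(polymer_chain_ids(SAMPLE))  # ['A', 'B']
```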
2. Continue in FirstGlance in Jmol
In Proteopedia, use the link to FirstGlance in the Resources block under the molecule to display your molecule in FirstGlance in Jmol.
Try out the first six views (links) at the upper left, and any other controls that interest you. In particular, we will use these capabilities of FirstGlance in the PowerPoint report:
A. Hydrophobic/Polar
- Water-soluble proteins have polar/charged amino acids nearly everywhere on their surfaces (Examples: small 2hhd, large 1igy). Patches of hydrophobic amino acids on the surfaces of soluble proteins are usually less than ~10 Å in their smaller diameter, and usually recessed.
- Hydrophobic surface patches may be buried in chain-to-chain contacts -- check the biological unit (example: lac repressor homodimer).
- Large, protruding hydrophobic surface areas (>25 Å in their smaller diameter) may indicate transmembrane proteins (insoluble; example: 1bl8).
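The rules of thumb above can be written down as a small Python sketch. It assumes you have already listed the residues in a surface patch (for example, by hovering over them in FirstGlance) and measured the patch's smaller diameter in Å. The hydrophobic residue set is an assumption (classification schemes differ, e.g. on GLY, TYR and HIS), the function names are hypothetical, and the "intermediate" message is an interpretation rather than a rule stated here.

```python
# Assumed hydrophobic set; other schemes classify GLY, TYR or HIS differently.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO", "CYS"}


def hydrophobic_fraction(residue_names):
    """Fraction of residues (3-letter codes) counted as hydrophobic."""
    names = [n.upper() for n in residue_names]
    if not names:
        return 0.0
    return sum(n in HYDROPHOBIC for n in names) / len(names)


def interpret_patch(smaller_diameter_angstroms):
    """Apply the handout's size thresholds to a hydrophobic surface patch."""
    if smaller_diameter_angstroms < 10:
        return "typical of water-soluble proteins"
    if smaller_diameter_angstroms > 25:
        return "may indicate a transmembrane protein"
    return "intermediate: check whether it is buried in a chain contact in the biological unit"


print(hydrophobic_fraction(["LEU", "ILE", "SER", "LYS"]))  # 0.5
print(interpret_patch(30.0))  # may indicate a transmembrane protein
```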
B. Charge
Most proteins have roughly equal numbers of positive and negative charges intermixed on their surfaces. Surface patches of exclusively positive charge often bind nucleic acids (negatively charged because of their phosphates). For example, examine the protein surface charges where the gal4 transcriptional regulator binds DNA (1d66).
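The question "is this patch exclusively positive?" reduces to counting charged side chains. This Python sketch assumes approximate side-chain charges near pH 7 (ARG and LYS positive, ASP and GLU negative), treats HIS as neutral, and ignores chain termini, metal ions and ligands; all of these are simplifying assumptions, and the function names are hypothetical.

```python
# Side-chain charges near pH 7 (assumption: HIS neutral; termini and ligands ignored).
POSITIVE = {"ARG", "LYS"}
NEGATIVE = {"ASP", "GLU"}


def charge_counts(residue_names):
    """Return (number positive, number negative) among 3-letter residue codes."""
    names = [n.upper() for n in residue_names]
    pos = sum(n in POSITIVE for n in names)
    neg = sum(n in NEGATIVE for n in names)
    return pos, neg


def patch_sign(residue_names):
    """Classify a surface patch by the signs of its charged side chains."""
    pos, neg = charge_counts(residue_names)
    if pos and not neg:
        return "positive only"
    if neg and not pos:
        return "negative only"
    return "mixed" if pos else "uncharged"


print(patch_sign(["LYS", "ARG", "SER", "LYS"]))  # positive only
```

A "positive only" patch is the kind of surface that, as noted above, often binds nucleic acids.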
V. PowerPoint Report
Save your report with the filename yourLastName-565.pptx, for example sandler-565.pptx. When completed, your PowerPoint report is to be emailed to emartz@microbio.umass.edu for grading.
Each slide MUST be labeled at the top with its section number, e.g. Section 1.
Each question below may be answered in a single slide or in multiple slides. For example, Section 1 is complicated, so you might answer it in two slides, labeled Section 1A and Section 1B.
This is not a test. It is to help you learn by doing. Ask for help!
Section 1: Identity
- The label Section 1 at the top (and so forth for every slide).
- Your name.
- Your major; grad students, give the name of your grad program (Micro, MCB, etc.) and whose lab you work in.
- Your PDB identification code.
- The name of your molecule.
- The function of your molecule.
- The resolution or number of models (given in Proteopedia immediately under the molecule). The experimental method used to determine the structure.
- A resolution usually implies that the method is X-ray crystallography.
- A number of models usually implies that the method is NMR.
- To double check, in Proteopedia, click on the link RCSB and at the RCSB PDB, look in the box at the lower right, Experimental Details.
- The number of polymer chains (protein or nucleic acid) present. (Given in Proteopedia in the section About this Structure.)
- A snapshot of your molecule. (See instructions for taking static snapshots, also linked at the bottom left in FirstGlance.)
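The resolution / number-of-models clue and the RCSB "Experimental Details" box both come from the entry's header. As a hedged sketch, assuming the legacy PDB text format (EXPDTA for the method, REMARK 2 for the resolution, NUMMDL or MODEL records for NMR models), this Python function summarizes all three; its name and the sample headers are illustrative only.

```python
import re


def experiment_summary(pdb_text):
    """Read method, resolution (Å) and model count from a PDB-format header."""
    method, resolution, nummdl, model_records = None, None, None, 0
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            method = line[10:].strip()
        elif line.startswith("REMARK   2 RESOLUTION."):
            # NMR entries say "NOT APPLICABLE", so no number is found for them.
            match = re.search(r"RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS", line)
            if match:
                resolution = float(match.group(1))
        elif line.startswith("NUMMDL"):
            nummdl = int(line.split()[1])
        elif line.startswith("MODEL "):
            model_records += 1
    # Fall back to counting MODEL records; a single-model entry has none.
    models = nummdl if nummdl is not None else (model_records or 1)
    return {"method": method, "resolution": resolution, "models": models}


XRAY = """\
EXPDTA    X-RAY DIFFRACTION
REMARK   2 RESOLUTION.    1.50 ANGSTROMS.
"""
NMR = """\
EXPDTA    SOLUTION NMR
REMARK   2 RESOLUTION. NOT APPLICABLE.
NUMMDL    20
"""
print(experiment_summary(XRAY))
print(experiment_summary(NMR))
```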
Section 2: Ligands and Non-Standard Residues
Give the 3-letter abbreviations and full names for all ligands and non-standard residues. If none, so state. (Standard residues)
- Proteopedia lists the 1 to 3-letter abbreviations for each ligand and non-standard residue in green links under the molecule. Click on each one to see its full name shown in red at the bottom of the molecule.
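The green ligand links in Proteopedia correspond to hetero-groups in the coordinate file. This Python sketch assumes the legacy PDB format: HETATM records give the residue code in columns 18 to 20, and HETNAM records give the full name. Water codes are skipped (the set used is an assumption), the function name is hypothetical, and continuation lines of long names are simply joined with a space, which may misplace spaces in long hyphenated names.

```python
# Water residue codes to skip (assumption; HOH is the standard code).
WATER = {"HOH", "WAT", "DOD"}


def het_groups(pdb_text):
    """Map each non-water HETATM residue code to its HETNAM full name."""
    codes, names = [], {}
    for line in pdb_text.splitlines():
        if line.startswith("HETATM") and len(line) >= 20:
            code = line[17:20].strip()
            if code not in WATER and code not in codes:
                codes.append(code)
        elif line.startswith("HETNAM") and len(line) > 15:
            code = line[11:14].strip()
            text = line[15:].strip()
            names[code] = (names.get(code, "") + " " + text).strip()
    return {code: names.get(code, "(no HETNAM record)") for code in codes}


SAMPLE = """\
HETNAM     HEM PROTOPORPHYRIN IX CONTAINING FE
HETATM 1234 FE   HEM A 154      15.000  20.000  25.000
HETATM 1300  O   HOH A 201       1.000   2.000   3.000
"""
print(het_groups(SAMPLE))  # {'HEM': 'PROTOPORPHYRIN IX CONTAINING FE'}
```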
Section 3: Evolutionary Conservation
Does your molecule have a highly conserved region? If so, what is its function? If there is no highly conserved region, is there a highly variable region? Show a snapshot illustrating a highly conserved (or variable) region.
- Click on Evolutionary Conservation in Proteopedia. Toggle the quality button to high quality. Use the popup button to enlarge the high quality image.
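If you have per-residue conservation grades in hand, finding the highly conserved stretches is a simple scan. This Python sketch assumes ConSurf-style grades from 1 (most variable) to 9 (most conserved), keyed by residue number; how you obtain such a table is not covered here, and the function name and sample grades are hypothetical. Note that it finds runs in sequence only: residues far apart in sequence can still form one conserved region in 3D, which is why the snapshot in the molecular view matters.

```python
def conserved_runs(grades, min_grade=9):
    """Group residue numbers with grade >= min_grade into consecutive runs.

    grades: dict mapping residue number -> conservation grade (1 variable ... 9 conserved).
    Returns a list of (first, last) residue-number tuples.
    """
    runs = []
    for residue in sorted(r for r, g in grades.items() if g >= min_grade):
        if runs and residue == runs[-1][1] + 1:
            runs[-1][1] = residue  # extend the current run
        else:
            runs.append([residue, residue])  # start a new run
    return [tuple(run) for run in runs]


grades = {1: 9, 2: 9, 3: 5, 4: 9, 10: 1}
print(conserved_runs(grades))  # [(1, 2), (4, 4)]
```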
Section 4: Hydrophobic/Polar
Do you think your molecule is water soluble? Support your conclusion with a snapshot.
Section 5: Charge
Are there any areas on the surface of your molecule with only positive (or negative) charges? Show a snapshot illustrating your conclusions.
Section 6: Biological Unit
How many polymer chains (protein, DNA or RNA) are in the biological unit? The asymmetric unit?
- A. The asymmetric unit is what you see in Proteopedia or FirstGlance, when you use the PDB code.
- B. In a new browser tab, go to MakeMultimer.
- C. Enter your PDB code. Leave all other options at their defaults. Click Submit.
- D. Pay attention to the tables, especially the "Chain" column (model made by MakeMultimer), vs. the "original" column (original chain names).
- E. Click "View in FirstGlance".
Show side-by-side two snapshots comparing the asymmetric unit with the biological unit. The Cartoon representation in FirstGlance is best for these snapshots. Make sure to label which is which.
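Conceptually, a biological unit is built by applying rotation-plus-translation operators to the chains of the asymmetric unit; in legacy PDB files these operators appear as REMARK 350 BIOMT lines. The Python sketch below illustrates that idea. It is not MakeMultimer's code: it assumes a single assembly applied to every chain (it ignores the "APPLY THE FOLLOWING TO CHAINS" lines and multiple BIOMOLECULE blocks), and the function names and sample operators are hypothetical.

```python
def parse_biomt(pdb_text):
    """Collect REMARK 350 BIOMT operators as (rotation rows, translation) pairs.

    Simplification: all BIOMT lines are merged, as if the entry had a single
    biological assembly applied to every chain.
    """
    ops = {}
    for line in pdb_text.splitlines():
        tok = line.split()
        if len(tok) >= 8 and tok[:2] == ["REMARK", "350"] and tok[2].startswith("BIOMT"):
            row = int(tok[2][5]) - 1  # BIOMT1/2/3 -> matrix row 0/1/2
            op = int(tok[3])
            if op not in ops:
                ops[op] = ([[0.0] * 3 for _ in range(3)], [0.0] * 3)
            rot, trans = ops[op]
            rot[row] = [float(v) for v in tok[4:7]]
            trans[row] = float(tok[7])
    return [ops[k] for k in sorted(ops)]


def apply_operators(atoms, operators):
    """Return (chain, copy_number, x, y, z): new position = rotation * old + translation."""
    out = []
    for k, (rot, trans) in enumerate(operators, start=1):
        for chain, x, y, z in atoms:
            nx, ny, nz = (r[0] * x + r[1] * y + r[2] * z + t for r, t in zip(rot, trans))
            out.append((chain, k, nx, ny, nz))
    return out


REMARK350 = """\
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000
REMARK 350   BIOMT1   2 -1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   2  0.000000 -1.000000  0.000000        0.00000
REMARK 350   BIOMT3   2  0.000000  0.000000  1.000000        0.00000
"""
ops = parse_biomt(REMARK350)
print(apply_operators([("A", 1.0, 2.0, 3.0)], ops))
# [('A', 1, 1.0, 2.0, 3.0), ('A', 2, -1.0, -2.0, 3.0)]
```

Here the second operator is a 180° rotation about the z axis, so one chain in the asymmetric unit becomes a two-chain (dimeric) biological unit.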
Section 7: Animation from Polyview-3D
Section 8 - Optional: Contacts/Non-covalent Bonds
VI. See Also
- User:Eric Martz/Introduction to Structural Bioinformatics, a list of courses and workshops at various levels.
VII. Notes and References
- For a brief overview of Anfinsen's protein folding experiments in the 1960s, see the first paragraph at Intrinsically Disordered Protein.