Practical Guide to Homology Modeling

From Proteopedia

Terminology

  • Query sequence: The amino acid sequence for which a 3D model is wanted. More commonly called the target sequence, but talking about target vs. template gets confusing.
  • Template: An empirically determined 3D protein structure with significant sequence similarity to the query.
  • Structure will be used in this article to mean three-dimensional structure.

What Is A Homology Model?

Homology models, also called comparative models, are obtained by folding a query protein sequence (also called the target sequence) to fit an empirically-determined template model. The registration between residues in the query and template is determined by an amino acid sequence alignment between the query and template sequences.

Imagine that the template’s polypeptide backbone is a folded glass tube. Now imagine that the query sequence is a thin metal chain that can be pulled through the tube. The chain (query) will adopt the same fold as the tube (template). The sequence alignment specifies how far the chain should be pulled into the tube; that is, how the residues in the query sequence match up with the structure of the template.
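The registration described above can be sketched in a few lines of Python. This is only an illustration, not the method used by any particular modeling server: the function name `map_query_to_template`, the `-` gap character, and the short example sequences are all assumptions made for this sketch. Positions are 1-based indices into each ungapped sequence, which need not match the residue numbers in a PDB file.

```python
from typing import Dict, Optional


def map_query_to_template(aligned_query: str, aligned_template: str,
                          gap: str = "-") -> Dict[int, Optional[int]]:
    """Map each query position to the template position it is aligned with.

    Both arguments are rows of a pairwise alignment (same length, gaps
    marked with `gap`). A query residue aligned to a gap has no template
    residue to follow, so it maps to None.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")

    mapping: Dict[int, Optional[int]] = {}
    query_pos = 0
    template_pos = 0
    for query_char, template_char in zip(aligned_query, aligned_template):
        if template_char != gap:
            template_pos += 1
        if query_char != gap:
            query_pos += 1
            mapping[query_pos] = template_pos if template_char != gap else None
    return mapping


# Hypothetical toy alignment: one deletion and one insertion in the query.
example_mapping = map_query_to_template("MK-TAYIA", "MKLTA-IA")
```

In this toy alignment, query residue 5 maps to `None`: it is an insertion relative to the template, so the "tube" offers no backbone position for it.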

Errors or uncertainties in the sequence alignment result in errors or uncertainties in the homology model. Portions of the query sequence cannot be modeled reliably when there are gaps in the sequence alignment due to insertions/deletions ("indels"), or when portions of the template lack coordinates due to crystallographic disorder. Provided there is sufficient sequence identity between the query and template (at least 30%), the main chain in homology models is usually mostly correct. However, the positions of sidechain rotamers in homology models are usually unreliable.
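As a rough illustration of the 30% guideline, percent identity can be computed from a pairwise alignment. Several definitions of percent identity are in use; this sketch assumes the denominator is the number of aligned columns with no gap in either sequence, which is a choice made here rather than one stated by this article. The names `percent_identity` and `MIN_IDENTITY_PERCENT` and the example sequences are hypothetical.

```python
MIN_IDENTITY_PERCENT = 30.0  # guideline quoted in the text above


def percent_identity(aligned_query: str, aligned_template: str,
                     gap: str = "-") -> float:
    """Percent of gap-free alignment columns in which the residues match."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for query_char, template_char in zip(aligned_query, aligned_template):
        if query_char == gap or template_char == gap:
            continue  # indel columns are excluded under this definition
        compared += 1
        if query_char.upper() == template_char.upper():
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment: 6 identical residues out of 8 compared.
identity = percent_identity("MKTAYIAK", "MRTAYLAK")
template_is_plausible = identity >= MIN_IDENTITY_PERCENT
```

Excluding gapped columns can make a short, fragmentary alignment look better than it is, so the fraction of the query that is actually aligned should be checked alongside the identity value.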

Nevertheless, homology models are useful for seeing low-resolution features, such as which residues are on the surface or buried, which are close to other features of interest (such as a putative active site), and the overall distribution of charges and evolutionary conservation.

Do you need a homology model?

You don’t need a homology model if the amino acid sequence of interest (the query sequence) already has an empirically determined 3D structure. Structures determined empirically, by X-ray crystallography or (much less often) by solution NMR, will almost always be more accurate than a homology model.

Is there an empirical model?

All published, empirically-determined, atomic-resolution, macromolecular 3D structures are available in the Protein Data Bank (PDB, pdb.org).

Each model in the PDB has a unique 4-character identification code (PDB ID) that begins with a numeral and has letters or numerals for the last 3 characters. Examples are 1d66, 4mdh, 9ins.
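The ID format just described can be checked with a short regular expression. This sketch tests only the shape of the string (one numeral followed by three letters or numerals); it does not confirm that an entry with that ID exists in the PDB. The function name `looks_like_pdb_id` is an assumption made for this example.

```python
import re

# One ASCII digit followed by exactly three ASCII letters or digits.
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text: str) -> bool:
    """Return True if `text` has the 4-character PDB ID format."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


# The three example IDs from the text all match the format.
examples_ok = all(looks_like_pdb_id(pdb_id) for pdb_id in ("1d66", "4mdh", "9ins"))
```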

Here are two methods for finding out if your query amino acid sequence, or parts of it, have empirically-determined 3D structures in the PDB.

Simple search for empirical models (via PIR)

At UniProt.Org, find your protein and click on Structure.

  • If there is a column labeled “Entry” with 4-character PDB IDs, these are empirical structures for your protein. Pay attention to the “Positions” column, which gives the sequence number range covered by each model.
  • If there is no “Entry” column, then there are no sequence-identical empirical structures for your protein. Then try the Advanced search method below.
  • Some proteins have no Structure section (e.g. K4QDG1_SACBA). Then try the Advanced search method below.

If empirical structures exist, see sections below for guidance on how to explore them. If they are satisfactory, then you don't need a homology model.
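When several empirical structures each cover part of the protein, it helps to work out which stretches of the query have no coverage at all. The sketch below assumes the ranges from the “Positions” column have been typed in by hand as 1-based, inclusive `(start, end)` pairs; the function name `uncovered_ranges` and the example numbers are hypothetical and do not describe any real entry.

```python
from typing import List, Tuple


def uncovered_ranges(query_length: int,
                     covered: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the 1-based inclusive ranges of the query that no structure covers."""
    gaps: List[Tuple[int, int]] = []
    next_uncovered = 1  # first residue not yet known to be covered
    for start, end in sorted(covered):
        # Clamp each range to the query, and skip ranges that fall outside it.
        start = max(start, 1)
        end = min(end, query_length)
        if start > end:
            continue
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= query_length:
        gaps.append((next_uncovered, query_length))
    return gaps


# Hypothetical 150-residue protein with two overlapping structures.
missing = uncovered_ranges(150, [(5, 70), (60, 120)])
```

An empty result means every residue falls inside at least one structure. Any ranges returned are the regions for which a homology model might still be needed.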

Advanced search for empirical models (RCSB PDB)

This method takes more time but gives you more information. It will find empirical structures that have sequence similarity to the query. Such hits enable a high-quality homology model.

For example, if your query is calmodulin from the lancelet (Q9UB37, CALM2_BRALA), no empirical structures are listed at UniProt. However, the query is 97% identical in sequence to human calmodulin (P62158, CALM_HUMAN) and to calmodulins from other taxa, for which there are numerous full-length empirical structures. A very high-quality homology model can therefore be constructed.

Advanced search procedure:

  1. Copy the FASTA format sequence for your protein, for example, from UniProt.Org.
  2. Note the length of your sequence.
  3. At pdb.org, go to Advanced Search.
  4. Click on “Choose a query type” and select Sequence under “Sequence Features”.
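Steps 1 and 2 of the procedure (obtaining the sequence and noting its length) can also be done with a small script once the FASTA text has been copied. This is a minimal sketch that reads only the first record; the function name `read_fasta_sequence` and the example record are hypothetical, and the sequence shown is made up rather than taken from UniProt.

```python
from typing import List, Optional, Tuple


def read_fasta_sequence(fasta_text: str) -> Tuple[Optional[str], str]:
    """Return (header, sequence) for the first record in FASTA-format text."""
    header: Optional[str] = None
    residues: List[str] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop after the first
            header = line[1:]
        else:
            residues.append(line)
    return header, "".join(residues)


# Made-up record for illustration only.
example_fasta = """>example_query hypothetical record
MKTAYIAKQR
QISFVKSHFS
"""
header, sequence = read_fasta_sequence(example_fasta)
sequence_length = len(sequence)  # the length to note in step 2
```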

Proteopedia Page Contributors and Editors

Eric Martz, Juergen Haas, Jaime Prilusky