Seqtool: Difference between revisions
documenting the SequenceTool |
documenting the SequenceTool |
||
Line 1: | Line 1: | ||
'''SeqTool''' displays the sequence of the structure in the JSmol applet. | '''SeqTool''' displays the sequence of the structure in the JSmol applet. | ||
To have SeqTool in your page, add <code><b><seqtool/></b></code> in the wikitext, including the <b>/</b> after the word "seqtool". | To have SeqTool in your page, add <code><b><seqtool/></b></code> in the wikitext, including the <b>/</b> after the word "seqtool". It is recommended to place it outside and before the <code><StructureSection></code> tag, so that the full width of the screen is available for the sequence display and its search box. | ||
Currently, this will only work correctly with pdb-format files. | Currently, this will only work correctly with pdb-format files. | ||
Line 9: | Line 9: | ||
=== Description of sequence display === | === Description of sequence display === | ||
Whenever a model is loaded into JSmol, a sequence listing is displayed for each chain. | Whenever a model is loaded into JSmol, a sequence listing is displayed for each chain. To save space, only one chain is displayed at a time; the chain to show can be chosen from a drop-down menu. If the model has a single chain, it will be displayed by default. | ||
The sequence is | The sequence is presented using the one-letter code for each amino acid residue. Ligands (hetero groups) and non-standard residues are displayed as "x" and positioned according to their residue number in the pdb file. Water and solvent groups are omitted by default. | ||
Both ribonucleotides and deoxyribonucleotides in nucleic acids are converted to single-letter. | Both ribonucleotides and deoxyribonucleotides in nucleic acids are converted to single-letter codes, too. | ||
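The conversion rules described above can be illustrated with a minimal Python sketch. This is not SeqTool's actual code: the function names `one_letter` and `listing` are hypothetical, and the set of residue names treated as water or solvent is an assumption made only for this illustration.

```python
# Standard amino acids: three-letter PDB residue name -> one-letter code.
AMINO_ACIDS = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Ribonucleotides and deoxyribonucleotides both map to a single letter.
NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "DA": "A", "DC": "C", "DG": "G", "DT": "T",
}

# Assumption: residue names treated as water/solvent and omitted.
SOLVENT = {"HOH", "DOD", "WAT"}


def one_letter(res_name):
    """Return the one-letter code, "x" for other groups, None for solvent."""
    name = res_name.strip().upper()
    if name in SOLVENT:
        return None
    if name in AMINO_ACIDS:
        return AMINO_ACIDS[name]
    if name in NUCLEOTIDES:
        return NUCLEOTIDES[name]
    return "x"  # ligand (hetero group) or non-standard residue


def listing(res_names):
    """Join the codes of one chain into a sequence string, skipping solvent."""
    codes = (one_letter(name) for name in res_names)
    return "".join(code for code in codes if code is not None)
```

With these assumptions, a chain given as a list of residue names (in residue-number order) yields a string in which standard residues keep their one-letter code, other groups become "x", and water disappears.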
The sequence includes a combination of: | The sequence includes a combination of: | ||
Line 60: | Line 60: | ||
The purpose of the sequence listing in FG is to make it easy to relate sequence to 3D structure within the FG application. Therefore the listing should be kept very simple. Sequence annotations such as secondary structure or sequence motifs can be viewed in other databases, and need not be shown in the FG sequence listing. | The purpose of the sequence listing in FG is to make it easy to relate sequence to 3D structure within the FG application. Therefore the listing should be kept very simple. Sequence annotations such as secondary structure or sequence motifs can be viewed in other databases, and need not be shown in the FG sequence listing. | ||
1. The sequence listing will be displayed in the lower left division of FG. (no problem) | |||
2. Residues will be listed in one-letter code, with x for non-standard residues. (done; I've decided to omit water & solvent, and leave the other hetero groups in the sequence) | |||
Optionally, the residue could automatically be brought to the front (don't know how to do this) of the molecule (by rotating the molecule), slid smoothly to the center, and zoomed. This is done, upon Shift/Ctrl/Alt + click; also highlights like simple click. | 3. Touching the one-letter code for an amino acid will display, in a form slot, its chain name, ATOM sequence number (and insertion code when present) and three-letter code. (OK; the insertion code is shown in lowercase, which has better visibility in my opinion; amino acids use the standard Uppercase-lowercase-lowercase format; nucleotides are converted to single-letter even if deoxy, as PDBv3 uses DG etc.) | ||
4. Seq->3D: Clicking the one-letter code for a residue will highlight it in the 3D view in Jmol. (done) | |||
* Optionally, the residue could automatically be brought to the front (don't know how to do this) of the molecule (by rotating the molecule), slid smoothly to the center, and zoomed. This is done, upon Shift/Ctrl/Alt + click; also highlights like simple click. (partially done; how?) | |||
5. 3D->Seq: Clicking a residue in the 3D view will highlight it in the sequence listing. (done; each residue in the sequence has a unique ID that allows this.) | |||
The following are optional and need not be in the initial release. | The following are optional and need not be in the initial release. | ||
6. Entering a sequence fragment will highlight the locations of any matches in the sequence listing | |||
and also in the model. May fail if gaps or microheterogeneity are involved. (done) | |||
7. Entering a residue name (e.g. PRO or CYS or A or U) will highlight the locations of that residue in the sequence listing. (done) and also in the model. But we must use one-letter code (or implement a different slot) | |||
8. Coloring the sequence listing automatically, according to the color scheme in the 3D view. This would be appropriate for all views: (--how / when to trigger?) | |||
* Secondary structure | |||
* Cartoon, Vines (color by chain) | |||
* N->C Rainbow | |||
* Composition | |||
* Hydrophobic/Polar | |||
* Charge | |||
* Contacts (contacting residues highlighted) | |||
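The fragment search of items 6 and 7 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SeqTool implementation: the function name `find_fragment` is hypothetical, the listing is assumed to be a plain string of one-letter codes, matching is made case-insensitive by choice, and, as the notes above warn, gap markers and microheterogeneity are not handled.

```python
def find_fragment(sequence, fragment):
    """Return 0-based start positions of every match of fragment in sequence.

    Overlapping matches are reported. Matching is case-insensitive
    (an assumption of this sketch, so that lowercase letters in the
    listing are still found).
    """
    seq = sequence.upper()
    frag = fragment.upper()
    if not frag:
        return []
    hits = []
    start = seq.find(frag)
    while start != -1:
        hits.append(start)
        # Advance by one position so that overlapping matches are found too.
        start = seq.find(frag, start + 1)
    return hits


def matched_positions(sequence, fragment):
    """Return the sorted set of all positions covered by any match."""
    covered = set()
    for start in find_fragment(sequence, fragment):
        covered.update(range(start, start + len(fragment)))
    return sorted(covered)
```

A single-residue search (item 7) is just the special case of a one-letter fragment; the positions returned by `matched_positions` are the ones that would be highlighted in the listing and, through the per-residue IDs, in the model.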
B. Contents & Format of the Sequence Listing | |||
1. There will be a single sequence list taken from the ATOM records (with residues that lack coordinates taken from SEQRES). The SEQRES residues will not be listed separately, but discrepancies between aligned SEQRES and ATOM records will be indicated in the single list as detailed below. (done) | |||
* See example below for 1QKZ:L. | |||
2. Non-standard residues will be listed as x (lowercase). Touching the x's will display their 1-3 letter ATOM record abbreviation codes in a form slot. (done) | |||
* 2SOC (NMR) has DPN, DTR, THO. Its listing would be xCFxKTCx. | |||
* 1BKX:A has TPO197 and SEP338. | |||
* 1AL4 has D amino acids DLE, DVA, also ETA etc. | |||
* 1EVV has many non-standard nucleotides. | |||
3. Sequence numbers will be those in the ATOM records. Thus, some listings will start with a negative, zero or 2 or higher sequence number. (done) | |||
* 2FSR starts at -9, skips -2 through 3, and resumes at 4. | |||
* 1D5T:A starts at -2 and skips 0. | |||
* 1AVQ:A starts at -1 and includes 0. | |||
* 1BXW:A starts at 0. | |||
* 1UCY:K starts at 16. | |||
* 163D:A starts at 43. | |||
Additionally, some listings will have numbers in decreasing order, or large discontinuities in numbering. | Additionally, some listings will have numbers in decreasing order, or large discontinuities in numbering. | ||
* 1NSA contains a single (unnamed) protein chain with sequence 7A-95A that continues 4-308. | |||
* 1IAO:B contains (in this order) 1S, 323P-334P, 6-94, 94A, 95-188, 1T, 2T. | |||
4. Inserted residues will be listed in-line with other residues, but distinguished by having their one-letter codes displayed as superscript letters. (done) | |||
* See example below for 1QKZ:L. | |||
* 1UCY:L starts with 8 inserted residues in reverse alphabetic order, and has 13 inserted residues near the end. Other chains have many insertions as well. | |||
5. When there is a numbering gap (due to numbering according to a reference sequence) but no residues are missing in the 3D structure (SEQRES and ATOM records match), the position of the gap will be indicated by two hyphens surrounding a number indicating the size of the gap, e.g. -1-, -2-, -3-, -23-, and so forth. (done, but using ~ instead) | |||
* See example below for 1QKZ:L. | |||
* 1IGT:B has 23 such gaps, e.g. following 97, 130, 154, 157, etc. | |||
* 163D:A (RNA) has a gap following 58. | |||
6. When there is a physical gap in the 3D model (residues in SEQRES that are absent in ATOM records, typically due to crystallographic disorder), the residues with no coordinates will be listed in lower case. Touching such a residue will report an interpolated sequence number. Clicking on such a residue will produce a message* explaining that the residue lacks coordinates. (mostly done; still need to implement interpolation (?); right now, it increments the number by 1) | |||
(*) tooltip (onMouseOver) and alert box (onClick) | (*) tooltip (onMouseOver) and alert box (onClick) | ||
* See example below for 1QKZ:L. | |||
* 2ACE has leading, embedded, and trailing physical gaps. | |||
7. When there is sequence microheterogeneity (residues in ATOM records that are absent in SEQRES), the alternate residues at the same sequence position will be enclosed in square brackets. (done) | |||
* For example, in 1CBN, residues starting at number 20 would be represented ...GT[PS]EA.... At position 22, PRO and SER are alternate residues. | |||
* 1AL4 and 1ETA have sequence microheterogeneity. | |||
* More on sequence microheterogeneity. Sequence microheterogeneity appears to be quite rare in the PDB, but I have not found a definitive website-based search strategy. | |||
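The formatting rules of items 5 to 7 can be combined in a short Python sketch. It is a hedged illustration, not the code used by SeqTool: the `Residue` class and `format_listing` function are hypothetical names, and the exact form of the gap marker (`~n~`, following the note that "~" is used instead of hyphens) is an assumption. Superscript display of inserted residues (item 4) is left out.

```python
from dataclasses import dataclass


@dataclass
class Residue:
    number: int               # sequence number from the ATOM record
    codes: str                # one-letter code; several letters = alternates
    has_coords: bool = True   # False for residues known only from SEQRES


def format_listing(residues, gap_char="~"):
    """Build the listing string for one chain (assumed gap marker: ~n~)."""
    parts = []
    previous = None
    for res in residues:
        if previous is not None:
            skipped = res.number - previous.number - 1
            if skipped > 0:
                # Numbering gap: show how many sequence numbers are skipped.
                parts.append(f"{gap_char}{skipped}{gap_char}")
        if not res.has_coords:
            # Physical gap: residue lacks coordinates, listed in lower case.
            parts.append(res.codes.lower())
        elif len(res.codes) > 1:
            # Microheterogeneity: alternates enclosed in square brackets.
            parts.append(f"[{res.codes}]")
        else:
            parts.append(res.codes)
        previous = res
    return "".join(parts)
```

For the 1CBN example above, residues 20 to 24 given as G, T, PS, E, A produce `GT[PS]EA`. Decreasing numbering, as in 1IAO:B, simply produces no gap marker in this sketch.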
The following are optional and need not be in the initial release. | The following are optional and need not be in the initial release. | ||
8. A checkbox to highlight residues with missing atoms. For example, some crystallographic results have the alpha and beta carbons of certain amino acids, but lack the remainder of the sidechains. (--how to check? --sure a checkbox?) Maybe just count the atoms in the residue? Gly must be an exception. (residue and not hydrogen).atomCount <=4 (N,C,O,CA) -- will miss cases. A table for all 20 aa? Too hard. | |||
* (@@ examples needed) | |||
9. A checkbox to highlight residues with alternate sidechain conformations (rotamers; multiple sets of coordinates for sidechain atoms). Alternate sidechain conformations are quite common in the PDB. (--sure a checkbox? done (w/ link)) | |||
* 5HVP:A, 1AL4 have alternate sidechain conformations at 12,45,60,65... and ... | |||
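The per-residue table considered in item 8 could look like the following Python sketch. It is only one possible approach under stated assumptions: the function name `has_missing_atoms` is hypothetical, the caller is assumed to supply the number of non-hydrogen atoms found for the residue, and the counts exclude the terminal OXT atom. Residues with alternate conformations may list atoms more than once, so counts would need to be taken per conformation.

```python
# Expected number of non-hydrogen atoms in a complete standard residue
# (backbone N, CA, C, O plus sidechain; terminal OXT not counted).
EXPECTED_HEAVY_ATOMS = {
    "GLY": 4, "ALA": 5, "SER": 6, "CYS": 6, "VAL": 7,
    "THR": 7, "PRO": 7, "ILE": 8, "LEU": 8, "ASP": 8,
    "ASN": 8, "MET": 8, "GLU": 9, "GLN": 9, "LYS": 9,
    "HIS": 10, "PHE": 11, "ARG": 11, "TYR": 12, "TRP": 14,
}


def has_missing_atoms(res_name, heavy_atom_count):
    """Return True if a standard residue has fewer heavy atoms than expected.

    Unknown residue names return False, because this sketch has no
    reference count for them.
    """
    expected = EXPECTED_HEAVY_ATOMS.get(res_name.strip().upper())
    if expected is None:
        return False
    # "<" rather than "!=" so a C-terminal residue with OXT is not flagged.
    return heavy_atom_count < expected
```

Unlike the fixed threshold of 4 atoms discussed above, this comparison treats glycine correctly and also catches partially truncated sidechains, such as a lysine modeled only up to CB (5 heavy atoms instead of 9).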