User:Eric Martz/Sandbox 6: Difference between revisions

Eric Martz (talk | contribs)
 
(264 intermediate revisions by the same user not shown)
<applet load='Dnac_from_2ggz_a.pdb' size='500' frame='true' align='right'
scene='User:Eric_Martz/Sandbox_4/Dnac_model_from_2ggz_a/8' />
<applet size='450' frame='true' align='right' caption='Insert caption here'
scene='User:Eric_Martz/Sandbox_6/2hu4_3cl0_aligned/4' />
==Templates for Homology Modeling==


H274Y ([[Amino Acids#Histidine|Histidine]] to [[Amino Acids#Tyrosine|Tyrosine]]) is the most common mutation conferring resistance to the drug Tamiflu in the neuraminidase N1 of Influenza A (e.g. H1N1, H5N1). Although resistant, this mutant N1 still binds Tamiflu weakly, and a crystal structure was obtained, [[3ckz]]. Below, this is compared to the wild type, [[2hu4]], using structural alignment of a single protein chain:Tamiflu complex from each.

{| class="wikitable" style="text-align:center"
|+ Templates for Homology Modeling of E. coli DnaC (245 amino acids)
!    Name !! PDB Code (Resolution) !! Released !! Length (amino acids) !! Template alignment length: range (%) !! Target alignment length: range (%) !! Aligned Sequence Identity !! Expectation
|-
| DnaC helicase loader ''Aquifex aeolicus'' ||  3ec2 (2.7 &Aring;) ||  Nov 25 2008 ||  183 ||  174: 6-179 (95%) || 163: 68-230 (67%) || 23% || 0.0006
|}
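
The alignment lengths and percentages in the table follow directly from the inclusive residue ranges. A minimal Python sketch of that arithmetic is given here; the function name <tt>coverage</tt> is hypothetical and is not part of any tool used on this page.

```python
def coverage(start, end, total_length):
    """Return (aligned_length, percent_of_total) for an inclusive residue range."""
    aligned = end - start + 1
    return aligned, 100.0 * aligned / total_length


# Values taken from the table row for 3ec2 (template) and E. coli DnaC (target).
template_len, template_pct = coverage(6, 179, 183)
target_len, target_pct = coverage(68, 230, 245)

print(f"template: {template_len} residues ({template_pct:.0f}%)")
print(f"target:   {target_len} residues ({target_pct:.0f}%)")
```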


==Homology Model of DnaC==
'''Why is the affinity of Tamiflu reduced by the mutation H274Y?''' Use the scenes below to help answer this question.
The following sequence was provided for DnaC from E. coli:


Tamiflu is shown as thick sticks. Atoms contacting Tamiflu (within 4 &Aring;) are shown as balls. Amino acids that contain contacting atoms, plus amino acid 274, are shown as thin sticks. All other amino acids are hidden.

<tt>
MKNVGDLMQR LQKMMPAHIK PAFKTGEELL AWQKEQGAIR SAALERENRA
<br>
MKMQ<b>RTFNRS GIRPLHQNCS FENYRVECEG QMNALSKARQ YVEEFDGNIA
<br>
SFIFSGKPGT GKNHLAAAIC NELLLRGKSV LIITVADIMS AMKDTFRNSG
<br>
TSEEQLLNDL SNVDLLVIDE IGVQTESKYE KVIINQIVDR RSSSKRPTGM
<br>
LTNSNMEEMT KLLGERVMDR MRLGNSLWVI FNWDSYR</b>SRV TGKEY
</tt>


This sequence (245 amino acids) was submitted to Swiss Model, which [http://tinyurl.com/4nek2q generated the homology model] shown here (<scene name='User:Eric_Martz/Sandbox_4/Dnac_model_from_2ggz_a/8'>restore initial scene</scene>) using [[2qgz]] chain A as a template, which has 18.6% sequence identity. Apparently Swiss Model used predicted secondary structure to help in the sequence alignment, but details are not clear to me. The homology model represents residues 55-237 (183 residues representing 75% of dnaC), shown in boldface in the above sequence. ''Because of the low sequence identity, this model may well contain major errors, or even be wholly incorrect.''
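
The residue range covered by the model can be checked against the sequence listed above. The following Python sketch only slices the sequence by residue number; the helper <tt>segment</tt> is a hypothetical name, and nothing here reproduces what Swiss Model does internally.

```python
# E. coli DnaC sequence as listed on this page (spaces are only for readability).
DNAC = (
    "MKNVGDLMQR LQKMMPAHIK PAFKTGEELL AWQKEQGAIR SAALERENRA "
    "MKMQRTFNRS GIRPLHQNCS FENYRVECEG QMNALSKARQ YVEEFDGNIA "
    "SFIFSGKPGT GKNHLAAAIC NELLLRGKSV LIITVADIMS AMKDTFRNSG "
    "TSEEQLLNDL SNVDLLVIDE IGVQTESKYE KVIINQIVDR RSSSKRPTGM "
    "LTNSNMEEMT KLLGERVMDR MRLGNSLWVI FNWDSYRSRV TGKEY"
).replace(" ", "")


def segment(sequence, first, last):
    """Return residues first..last (1-based, inclusive) of a sequence."""
    return sequence[first - 1:last]


modeled = segment(DNAC, 55, 237)
percent = 100.0 * len(modeled) / len(DNAC)

print(len(DNAC), len(modeled), f"{percent:.0f}%")
print(modeled[:10], "...", modeled[-10:])
```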
Influenza neuraminidase N1:


Swiss Model has apparently used the [[temperature value]] field in the PDB file to indicate regions that are highly unreliable, namely the regions that are <font color="red"><b>red</b></font> when the model is <scene name='User:Eric_Martz/Sandbox_4/Dnac_model_from_2ggz_a/4'>colored by temperature</scene>. These regions are shown as '''translucent white''' in the initial scene (using the Jmol command <i>select temperature >50</i>). The uncertainty in three of these regions is explained by gaps in the template model (see below). Although the details of these regions are even more uncertain than other regions, it seems likely that these loops are on the surface, if the homology model turns out to be substantially correct.
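
Outside Jmol, the same selection can be approximated by reading the temperature value column of the ATOM records. The Python sketch below assumes standard fixed-column PDB formatting (temperature value in columns 61-66); the function names and the four ATOM lines are made up for illustration and are not taken from the actual model file.

```python
def residues_above(pdb_lines, threshold=50.0):
    """List (chain, residue number) for residues having any atom with a
    temperature value above the threshold, in order of first appearance."""
    flagged = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        b_value = float(line[60:66])            # columns 61-66
        if b_value > threshold:
            key = (line[21], int(line[22:26]))  # chain ID, residue number
            if key not in flagged:
                flagged.append(key)
    return flagged


def make_atom(serial, resname, chain, resseq, b_value):
    """Build a synthetic alpha-carbon ATOM record (illustration only)."""
    return (f"ATOM  {serial:5d}  CA  {resname:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_value:6.2f}")


example = [
    make_atom(1, "ARG", "A", 55, 32.10),
    make_atom(2, "THR", "A", 56, 75.00),
    make_atom(3, "PHE", "A", 57, 50.00),
    make_atom(4, "ASN", "A", 58, 99.00),
]
print(residues_above(example))
```

A value of exactly 50 is not selected, matching the strict <i>&gt;50</i> comparison in the Jmol command.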
<jmol>
<jmolButton>
<script>anim off; frame 1.2</script>
<text>H274Y</text>
</jmolButton>
</jmol>
<jmol>
<jmolButton>
<script>anim off; frame 1.1</script>
<text>Wild Type</text>
</jmolButton>
</jmol>
<jmol>
<jmolButton>
<script>anim off; frame 0</script>
<text>Both</text>
</jmolButton>
</jmol>
<jmol>
<jmolButton>
<script>if (~animation); if (_animating); anim pause; else; anim mode loop 0 0;anim fps 2; anim resume; endif; endif;</script>
<text>Toggle Animation</text>
</jmolButton>
</jmol>


Atoms are colored by element: {{Template:ColorKey_Element_C}}, {{Template:ColorKey_Element_N}}, {{Template:ColorKey_Element_O}} (in protein), <font color="magenta"><b>O</b></font> (in water).

===Evolutionary Conservation in the Homology Model===
The [[Conservation, Evolutionary|evolutionary conservation]] pattern, [http://consurf.tau.ac.il/results/1222995227/output.html revealed by ConSurf], is quite interesting, showing <scene name='User:Eric_Martz/Sandbox_4/Dnac_model_from_2ggz_a/9'>two conserved patches</scene>.<ref>ConSurf found only 10 sequences in SwissProt, with an Average Pairwise Distance of 1.6. The [http://consurf.tau.ac.il/results/1222995227/output.html run shown here] used 100 sequences from Uniprot, with an APD of 1.4.</ref>

===Homology Model in FirstGlance===
 
In order to find specific residues, or see charge distribution or other aspects of this homology model, please
[http://oca.weizmann.ac.il/oca-docs/fgij/fg.htm?mol=http%3A//proteopedia.org/wiki/images/3/3e/Dnac_from_2ggz_a.pdb View The DnaC Homology Model in FirstGlance in Jmol]
 
===Structural Alignment with Closest Hit at PDB===
<applet load='2chg9-63_aligned_with_dnac_model.pdb' size='400' frame='true' align='right' caption='Residues 9-63 structurally aligned with the homology model of dnaC.' scene='User:Eric_Martz/Sandbox_4/2chg9-63_aligned_with_dnac_mod/1'/>
 
When the [[PDB]] is searched with the dnaC sequence, the closest hit has 39% identity with residues 9-63 of chain A of replication factor C ([[2chg|2CHG]]), which align with residues 72-124 of dnaC. When the above homology model of dnaC (made with template 2QGZ) is structurally aligned with residues 9-63 of 2CHG<ref>Structural alignment done with DeepView 3.6b3 using Magic Fit of carbon alphas.</ref>, 43 alpha carbons (out of 54) aligned with an RMS deviation of 2.3 &Aring;. <font color="#ff0000">'''Residues 21-63 of 2CHG'''</font> aligned with <font color="#3030ff">'''residues 80-124 of the dnaC homology model'''</font>. (Non-aligned portions are pastel.) This result adds some confidence to the homology model, since the structural alignment of 2CHG:A21-63 falls within the same range of dnaC as the sequence alignment (72-124).
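
For readers unfamiliar with RMS deviation, the Python sketch below shows how the figure is computed from paired alpha-carbon coordinates that have already been superposed. It does not perform the superposition itself (that is what DeepView's Magic Fit did here), and the coordinates in the example are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) points that are already paired and superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two made-up alpha-carbon pairs, each displaced by 2 Angstroms along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
hit = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(model, hit))
```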
{{Clear}}
 
==Crystal Structure of DnaC Is "In The Pipeline"==
 
A sequence-based search at the international [http://targetdb.pdb.org/ Structural Genomics TargetDB] reveals that the closest completed structure is 2QGZ, the one chosen by SwissModel as a template. A number of crystal and NMR structures have sequence identities up to 37% but over shorter stretches, and with higher E values.
 
Diffraction data have been obtained (but the solved structure not yet deposited) for a ''Listeria monocytogenes'' sequence of 307 residues, pI 5.2, with an E value of 1.6e-05, though only 21% sequence identity. Diffraction-quality crystals (but not yet diffraction data) have not been obtained for any sequence with such a low E value.
 
''E. coli'' dnaC (245 residues, pI 9.4) has been crystallized by RIKEN Structural Genomics Initiative (Japan), but the crystals may not be of diffraction quality. It has been cloned, expressed as a soluble protein, and purified (but not yet crystallized) by 3 Structural Genomics Groups (RIKEN Structural Genomics Initiative (Japan), Montreal-Kingston Bacterial Structural Genomics Initiative, Midwest Center for Structural Genomics), as have several proteins with >40% sequence identity. So there is reason for optimism that either a crystal structure, or a more suitable template for homology modeling, will be forthcoming soon. '''One might consider contacting the groups who have reported purification of dnaC to inquire about progress, and possibly request priority for dnaC.'''
 
==Gaps in the Template Model==
<applet load='Dnac_from_2ggz_a.pdb' size='500' frame='true' align='right'
scene='User:Eric_Martz/Sandbox_4/2qgz/3' />
The template was 2QGZ (<scene name='User:Eric_Martz/Sandbox_4/2qgz/3'>initial scene</scene>). The portion of the template used was Glu107-Arg300. Only the amino-terminal 7 residues (100-106) were not used as template (translucent). Note that there are <scene name='User:Eric_Martz/Sandbox_4/2qgz/5'>three loops</scene> in this segment of the template that lack coordinates due to [[disorder]] in the crystal (marked with spacefilled alpha-carbon atoms).
 
The missing loops are 202-205 (NGSV), 226-231 (EQATSW), and 268-275 (TIKGSDET). These gaps, which occur between the residues marked /\ below, were apparently ignored in making the model, which has a continuous main chain.
 
{{Clear}}
Below is the alignment produced by Swiss Model, used in making the 3D model. Vertical bars for identity were inserted by hand (I may have missed some).
<pre>
                                                |    | |  |    ||
TARGET    55            R TFNRSGIRPL HQNCSFENYR VECEGQMNAL SKARQYVEEF
2qgzA    100  qkqaais--e riqlvslpks yrhihlsdid vnnasrmeaf saildfveqy
                                                                     
TARGET                    sssss    h h            hhhhhhh hhhhhhhhh
2qgzA              hhh  h  sss    h h            hhhhhhh hhhhhhhhh
 
                            |        | ||  ||    | |              |
TARGET    96    DGN-IASFIF SGKPGTGKNH LAAAICNELL L-RGKSVLII TVADIMSAMK
2qgzA    148  psaeqkglyl ygdmgigksy llaamahels ekkgvsttll hfpsfaidvk
                                                                     
TARGET                ssss ss    hhh hhhhhhhhhh h h  ssss sshhhhhhh
2qgzA                ssss ss    hhh hhhhhhhhhh hh    ssss sshhhhhhh
 
                                  ||  |  | ||                |
TARGET    144  DTFRNSGTSE EQLLNDLSNV DLLVIDEIGV QTESKYEKVI INQIVDRRSS
2qgzA    198  naiske---- --eidavknv pvlilddiga vrde-----v lqvilqyrml
                  /\                          / \
TARGET                        hhh    ssssss              hhhhhhhhhh
2qgzA                        hh  h    ssssss              hhhhhhhhhh
 
                  |    |                ||| |  |              |
TARGET    194  SKRPTGMLTN SNMEEMTKLL ---GERVMDR MRLGNSLWVI FNWDSYR 
2qgzA    247  eelptfftsn ysfadlerkw awqakrvmer vr-ylarefh leganrr- 
                                      /\
TARGET          h  ssssss    hhhhh          hhhh hh  ssssss s       
2qgzA          h  ssssss    hhhh          hhhh hh hh ssss s
</pre>
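
Since the identity bars above were placed by hand, they could also be generated mechanically. The Python sketch below does this for the second block of the alignment (target residues 96 onward), comparing the two rows column by column without regard to case; the function name <tt>mark_identities</tt> is hypothetical.

```python
def mark_identities(row_a, row_b):
    """Return (marker_line, identity_count) for two equal-length aligned rows.
    A '|' is placed wherever both rows hold the same letter (case-insensitive);
    gaps ('-') and spaces never count as identities."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    markers = []
    for a, b in zip(row_a, row_b):
        same = a.isalpha() and a.upper() == b.upper()
        markers.append("|" if same else " ")
    line = "".join(markers)
    return line, line.count("|")


target = "DGN-IASFIF SGKPGTGKNH LAAAICNELL L-RGKSVLII TVADIMSAMK"
template = "psaeqkglyl ygdmgigksy llaamahels ekkgvsttll hfpsfaidvk"

bars, n_identical = mark_identities(target, template)
print(bars)
print(target)
print(template)
print(n_identical, "identical positions")
```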
 
Below is the sequence with ATOM records (coordinates) from 2QGZ, numbered 100-300, showing the gaps as "...". This sequence listing was used to locate the positions marked /\ above.
<pre>
    1 .......... .......... .......... .......... ..........
   51 .......... .......... .......... .......... .........Q
  101 KQAAISERIQ LVSLPKSYRH IHLSDIDVNN ASRMEAFSAI LDFVEQYPSA
  151 EQKGLYLYGD MGIGKSYLLA AMAHELSEKK GVSTTLLHFP SFAIDVKNAI
  201 S....KEEID AVKNVPVLIL DDIGA..... .VRDEVLQVI LQYRMLEELP
  251 TFFTSNYSFA DLERKWA... .....WQAKR VMERVRYLAR EFHLEGANRR
</pre>
(Copied from Protein Explorer's sequence display.)
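
The gap positions can be read off this listing mechanically. The Python sketch below parses the dotted listing and reports internal runs of missing residues, i.e. those lying between the first and last residues that have coordinates; the function names are hypothetical.

```python
LISTING = """
    1 .......... .......... .......... .......... ..........
   51 .......... .......... .......... .......... .........Q
  101 KQAAISERIQ LVSLPKSYRH IHLSDIDVNN ASRMEAFSAI LDFVEQYPSA
  151 EQKGLYLYGD MGIGKSYLLA AMAHELSEKK GVSTTLLHFP SFAIDVKNAI
  201 S....KEEID AVKNVPVLIL DDIGA..... .VRDEVLQVI LQYRMLEELP
  251 TFFTSNYSFA DLERKWA... .....WQAKR VMERVRYLAR EFHLEGANRR
"""


def parse_listing(text):
    """Map residue number -> one-letter code ('.' where coordinates are missing)."""
    residues = {}
    for line in text.strip().splitlines():
        fields = line.split()
        start = int(fields[0])          # number of the first residue on the line
        for offset, code in enumerate("".join(fields[1:])):
            residues[start + offset] = code
    return residues


def internal_gaps(residues):
    """Return (first, last) residue numbers of each gap between observed residues."""
    present = sorted(n for n, code in residues.items() if code != ".")
    return [(prev + 1, nxt - 1)
            for prev, nxt in zip(present, present[1:])
            if nxt - prev > 1]


residues = parse_listing(LISTING)
for first, last in internal_gaps(residues):
    print(f"{first}-{last} ({last - first + 1} residues)")
```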
 
Below is the alignment of full-length dnaC with 2QGZ according to TargetDB (see above). Note that the 2QGZ structure begins at residue 100, and so the homology model begins with residue 55 of dnaC, indicated with &gt; below.
<pre>
ID:  DR58  Center: NESGC
E-value: 0.00028  Identity: 19.737%
 
                                    10        20        30       
Query                        MKNVGDLMQRLQKMMPAHIKPAFKTGEELLAWQKEQGA
                                    Q+ Q  P++I  +++    +    + +
Subjct EVASFISQHHLSQEQINLSLSKFNQFLVERQKYQLKDPSYIAKGYQPILAMNEGYADVSY
              40        50        60        70        80        90
 
      40        50    >  60        70        80        90       
Query  IRSAALERENRAMKMQRTFNRSGIRPLHQNCSFENYRVECEGQMNALSKARQYVEEF-DG
      +++  L + ++  +++ ++  ++  +++  + +  V+  ++M+A+S  ++VE++ ++
Subjct LETKELVEAQKQAAISERIQLVSLPKSYRHIHLSDIDVNNASRMEAFSAILDFVEQYPSA
              100      110      120      130      140      150
 
      100      110      120        130      140      150     
Query  NIASFIFSGKPGTGKNHLAAAICNELLLR-GKSVLIITVADIMSAMKDTFRNSGTSEEQL
      +  ++ + G  G GK++L AA+ +EL  + G S+ ++  ++  +K+++ N++++EE 
Subjct EQKGLYLYGDMGIGKSYLLAAMAHELSEKKGVSTTLLHFPSFAIDVKNAISNGSVKEE--
              160      170      180      190      200         
 
        160      170        180      190      200      210   
Query  LNDLSNVDLLVIDEIGV-QTESKYEKVIINQIVDRRSSSKRPTGMLTNSNMEEMTK----
      ++ ++NV +L++D+IG+ Q+ S  +  +++ I++ R  + PT + +N ++ ++ +   
Subjct IDAVKNVPVLILDDIGAEQATSWVRDEVLQVILQYRMLEELPTFFTSNYSFADLERKWAT
      210      220      230      240      250      260       
 
                    220      230      240   
Query  LLG-------ERVMDRMRLGNSLWVIFNWDSYRSRVTGKEY
      + G      +RVM+R+R                     
Subjct IKGSDETWQAKRVMERVRYLAREFHLEGANRR       
      270      280      290      300       
</pre>
 
==Notes==
<references />