Jmol/PDB file editing with Jmol: Difference between revisions

From Proteopedia
Eric Martz (talk | contribs)
No edit summary
Angel Herraez (talk | contribs)
change to more proper WikiCode formatting and adding extra info about scripts inside pdb files
Line 13: Line 13:
First, run the [[Jmol/Application|Jmol.jar Java application]], and load any PDB file. If you have a PDB file saved to your disk (for example, downloaded from [http://rcsb.org RCSB.Org]), drag it and drop into the Jmol graphics window. Alternatively, to load 1d66 directly from the [[PDB]]


:<tt># Anything between '#' and the end of a line (or a semicolon) is a comment.
: ''Note:'' Anything between '#' and the end of a line (or a semicolon) is a comment.
:<tt>load =1d66 # No space between = and the 4-character [[PDB code]].</tt>
load =1d66   # No space between = and the 4-character [[PDB code]].


After you have the PDB file loaded and the molecule is displayed:


:<tt>mypdb = getproperty("filecontents")</tt>
mypdb = getproperty("filecontents")


mypdb now has one long string (with newlines) for the entire PDB file, unmodified, including the header.
<code>mypdb</code> is a JmolScript variable that now has one long string (including newline characters) for the entire PDB file, unmodified, including the header.


==Edit line by line==
It is usually easiest to loop through the PDB file line by line. So:


:<tt>mypdblines = mypdb.lines</tt>
mypdblines = mypdb.lines


Now, <tt>mypdblines</tt> is an array with one PDB line per element. So you can loop line by line:
Now, <code>mypdblines</code> is an array with one PDB line per element. So you can loop line by line:
<tt>
for (i=1; i<=mypdblines.length; i++)
:for (i=1; i<=mypdblines.length; i++)
{
:{
  &nbsp;  mypdblines[i] ...
&nbsp;  mypdblines[i] ...
}
:}</tt>


Jmol has plentiful commands for finding lines, and editing them. For example, to operate only on lines beginning "ATOM ...",


:<tt>if (mypdblines[i].find("^ATOM ", "")) ...</tt>
if (mypdblines[i].find("^ATOM ", "")) ...


The second parameter "" signals that the first parameter should be interpreted as a [http://www.regular-expressions.info/ regular expression], where "^" means "beginning of the line".
Line 44: Line 43:
Since PDB format has fixed column positions, you can, for example, change the chain name, which is in column 22:


:<tt>if ((mypdblines[i])[22][23] == "G") {(mypdblines[i])[22][23] = "D";}</tt>
if ((mypdblines[i])[22][23] == "G") {(mypdblines[i])[22][23] = "D";}


(The atom property "chain" is not writable in Jmol, nor are "resno" nor "seqcode". So you can't simply assign new values to these properties.)
Line 52: Line 51:
When finished, you write the PDB file like this:


:<tt>write var @mypdblines "filename.pdb"</tt>
write var @mypdblines "filename.pdb"


"Var" means you are writing the contents of a variable into a disk file.
"Var" means you are writing the contents of a variable into a disk file.
Line 58: Line 57:
It is important to note that the command


:<tt>write "filename.pdb"</tt>
write "filename.pdb"


writes a file without the original header (and containing only the selected atoms). By using the variable <tt>mypdblines</tt>, you preserve the header and write all atoms.
Line 66: Line 65:
Custom information can be inserted into the header section of <tt>mypdblines</tt>. For example, if Jmol has calculated things that you would like to have available (without re-calculating) in the output PDB file, you can insert lines between the first and second lines of mypdblines like this:


:<tt>HEADER    TRANSCRIPTION/DNA                      06-MAR-92  1D66             
HEADER    TRANSCRIPTION/DNA                      06-MAR-92  1D66             
:@ Custom information in lines beginning "@ ".
@ Custom information in lines beginning "@ ".
:TITLE    DNA RECOGNITION BY GAL4: STRUCTURE OF A PROTEIN/DNA COMPLEX</tt>
TITLE    DNA RECOGNITION BY GAL4: STRUCTURE OF A PROTEIN/DNA COMPLEX


Jmol uses the first line of a  PDB file to recognize PDB format, so it is important not to put your custom lines first. Not only Jmol, but [[PyMOL]] and [[Chimera]] and likely other popular [[Molecular modeling and visualization software|molecular visualization apps]] (including [[FirstGlance in Jmol|FirstGlance]]) happily ignore lines in a PDB file that do not begin with a recognizable record name such as REMARK or ATOM. (FirstGlance recognizes lines beginning "!" as custom information from the ConSurf server.)


It is even possible to put Jmol scripts (perhaps to define a function, or specify custom variable values [variables are not saved in PDB nor in PNGJ files]) for later use. For example, this could be inserted into the header of mypdblines:
It is even possible to put Jmol scripts (perhaps to define a function, or specify custom variable values [variables are not saved in PDB nor in PNGJ files]) for later use. For example, this could be inserted into the header of <code>mypdblines</code>:


:<tt>@ # Jmol script.
@ # Jmol script.
:@ myvar = 12.6
@ myvar = 12.6
:@ function f1()
@ function f1()
:@ {
@ {
:@ &nbsp; print _arguments
@   print _arguments
:@ }
@ }
:@ # End Jmol script.</tt>
@ # End Jmol script.


After loading the saved PDB or PNGJ file with this in its header, you can drag and drop in a script file that (i) extracts the @ lines into a variable, (ii) removes the leading "@ " from each line, then (iii) executes the variable with "script inline @variable".
If you prefer your files to be closer to PDB format standards (and so prevent potential problems if those files are read into other software), any extra custom lines should start with the PDB keyword <code>REMARK</code>. In fact, Jmol is prepared to read and apply any Jmol scripts embedded in the file, when a line starts with <code>REMARK jmolscript:</code> (as described in [http://wiki.jmol.org/index.php/File_formats/Scripting#Script_inline_within_a_molecular_coordinates_file this page]). However, there can only be one such line in a file and you must put your whole script of commands into that single line. Taking the example above, this would look like:
REMARK jmolscript: myvar = 12.6; function f1() { print _arguments }
(with spaces inside the script being optional)


==Writing a PNGJ file containing the edited lines==
Line 89: Line 94:
In addition to a PDB file, you can save a PNGJ file with customized PDB lines and a customized header, though it is slightly trickier. Unlike a PDB file, you can't save a PNGJ file from a variable. So here is one scripting method that works, using only three commands:


:<tt># Customize mypdblines as desired previous to this line.
# Customize mypdblines as desired previous to this line.
:zap # Deletes all atoms and defined atom sets. Preserves variables and functions.
zap # Deletes all atoms and defined atom sets. Preserves variables and functions.
:load var mypdblines # loads the PDB file in the variable, including header, optionally modified.
load var mypdblines # loads the PDB file in the variable, including header, optionally modified.
:<nowiki>#</nowiki> Render, color, center, orient and zoom as desired.
<nowiki>#</nowiki> Render, color, center, orient and zoom as desired.
:write filename.pngj</tt>
write filename.pngj


The PNGJ file will have all the customized PDB lines as well as the view at the time it was saved. It does not include variables or functions that were defined at the time it was saved. If any of these are needed, define them in custom header @ lines, and write a script to use them as described above.

Revision as of 12:40, 12 September 2020

The Jmol.jar application can be used to edit the contents of PDB files. For example, you could change atom serial numbers, names of chains, change sequence numbers, and so forth. A command script file can be written to make specific changes, following the principles outlined below, using a plain text editor.

Identify yourself and changes made

What changes were made, by whom?

Before you make any edited PDB file public, such as by uploading it to Proteopedia, PLEASE insert REMARK lines that give your name and professional affiliation, and summarize what changes you made. Inserting REMARK lines can most easily be done as a last step, using a plain text editor.

Use a distinctive PDB file name

Do not give your PDB file a name that is easily confused with the version published at the PDB, such as 1D66.pdb. Use a name that makes it clear that it has been modified, such as 1D66_chains_renamed.pdb.

Put the PDB file in a variable

First, run the Jmol.jar Java application and load any PDB file. If you have a PDB file saved to your disk (for example, downloaded from RCSB.Org), drag it and drop it into the Jmol graphics window. Alternatively, to load 1d66 directly from the PDB:

Note: Anything between '#' and the end of a line (or a semicolon) is a comment.
load =1d66   # No space between = and the 4-character PDB code.

After you have the PDB file loaded and the molecule is displayed:

mypdb = getproperty("filecontents")

mypdb is a JmolScript variable that now has one long string (including newline characters) for the entire PDB file, unmodified, including the header.

Edit line by line

It is usually easiest to loop through the PDB file line by line. So:

mypdblines = mypdb.lines

Now, mypdblines is an array with one PDB line per element. So you can loop line by line:

for (i=1; i<=mypdblines.length; i++)
{
     mypdblines[i] ...
}

Jmol has many functions for finding and editing lines. For example, to operate only on lines beginning with "ATOM ...":

if (mypdblines[i].find("^ATOM ", "")) ...

The second parameter "" signals that the first parameter should be interpreted as a regular expression, where "^" means "beginning of the line".

Most of Jmol's built-in functions for operations on character strings are listed in this section of the Jmol documentation.

Since PDB format has fixed column positions, you can, for example, change the chain name, which is in column 22:

if ((mypdblines[i])[22][23] == "G") {(mypdblines[i])[22][23] = "D";}

(The atom property "chain" is not writable in Jmol, nor are "resno" nor "seqcode". So you can't simply assign new values to these properties.)
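
The column arithmetic is the part that is easiest to get wrong, so it can help to check it outside Jmol. The following is a minimal sketch in Python, not JmolScript, of the same fixed-column edit. The function name `rename_chain` and the two truncated ATOM records are made up for illustration and are not taken from 1d66. It assumes that the chain ID sits in column 22 counting from 1 (index 21 in Python's zero-based strings) and, as an extra assumption beyond the text above, it also edits HETATM records.

```python
def rename_chain(pdb_lines, old_chain, new_chain):
    """Return a copy of pdb_lines with old_chain replaced by new_chain.

    Only ATOM and HETATM records are touched. The chain ID is assumed to be
    in PDB column 22 (1-based), which is index 21 in a Python string.
    """
    chain_index = 21
    edited = []
    for line in pdb_lines:
        is_coordinate = line.startswith(("ATOM  ", "HETATM"))
        if is_coordinate and len(line) > chain_index and line[chain_index] == old_chain:
            # Rebuild the line so every other column keeps its position.
            line = line[:chain_index] + new_chain + line[chain_index + 1:]
        edited.append(line)
    return edited


# Made-up, truncated records for illustration only (not copied from 1d66).
example = [
    "HEADER    TRANSCRIPTION/DNA",
    "ATOM      1  N   MET G   1",
    "ATOM      2  CA  MET A   1",
]
renamed = rename_chain(example, "G", "D")
print(renamed[1])
```

Because the replacement is a single character, the line length and all later columns are unchanged, which is what a fixed-column format requires.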

Write a PDB file containing the edited lines

When finished, you write the PDB file like this:

write var @mypdblines "filename.pdb"

"Var" means you are writing the contents of a variable into a disk file.

It is important to note that the command

write "filename.pdb"

writes a file without the original header (and containing only the selected atoms). By using the variable mypdblines, you preserve the header and write all atoms.

Saving key information in the header

Custom information can be inserted into the header section of mypdblines. For example, if Jmol has calculated things that you would like to have available (without re-calculating) in the output PDB file, you can insert lines between the first and second lines of mypdblines like this:

HEADER    TRANSCRIPTION/DNA                       06-MAR-92   1D66             
@ Custom information in lines beginning "@ ".
TITLE     DNA RECOGNITION BY GAL4: STRUCTURE OF A PROTEIN/DNA COMPLEX

Jmol uses the first line of a PDB file to recognize PDB format, so it is important not to put your custom lines first. Not only Jmol, but PyMOL and Chimera and likely other popular molecular visualization apps (including FirstGlance) happily ignore lines in a PDB file that do not begin with a recognizable record name such as REMARK or ATOM. (FirstGlance recognizes lines beginning "!" as custom information from the ConSurf server.)
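
To make the placement rule concrete, here is a small sketch in Python (run outside Jmol) that inserts custom lines directly after the first line of a list of PDB lines. `insert_custom_lines` is a hypothetical helper written for this illustration, not a Jmol function, and the "@ " prefix simply follows the convention used on this page.

```python
def insert_custom_lines(pdb_lines, notes, prefix="@ "):
    """Return a new list with the notes inserted after the first PDB line.

    The first line is left in place because Jmol uses it to recognize
    the PDB format.
    """
    if not pdb_lines:
        raise ValueError("pdb_lines must contain at least the first record")
    custom = [prefix + note for note in notes]
    return pdb_lines[:1] + custom + pdb_lines[1:]


# Shortened header records for illustration.
header = [
    "HEADER    TRANSCRIPTION/DNA",
    "TITLE     DNA RECOGNITION BY GAL4",
]
with_notes = insert_custom_lines(header, ['Custom information in lines beginning "@ ".'])
for line in with_notes:
    print(line)
```

The original list is not modified, so the unedited lines remain available if the insertion needs to be redone.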

It is even possible to put Jmol scripts (perhaps to define a function, or specify custom variable values [variables are not saved in PDB nor in PNGJ files]) for later use. For example, this could be inserted into the header of mypdblines:

@ # Jmol script.
@ myvar = 12.6
@ function f1()
@ {
@   print _arguments
@ }
@ # End Jmol script.

After loading the saved PDB or PNGJ file with this in its header, you can drag and drop in a script file that (i) extracts the @ lines into a variable, (ii) removes the leading "@ " from each line, then (iii) executes the variable with "script inline @variable".
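
Steps (i) and (ii) are plain text processing, and can be sketched as follows in Python for checking outside Jmol. `extract_embedded_script` is a hypothetical name chosen for this illustration. Step (iii), running the result, still has to happen inside Jmol and is not shown here.

```python
def extract_embedded_script(pdb_lines, prefix="@ "):
    """Collect the lines that start with prefix and strip the prefix.

    Returns the recovered script as one newline-separated string.
    """
    script_lines = [line[len(prefix):] for line in pdb_lines if line.startswith(prefix)]
    return "\n".join(script_lines)


# Shortened header with the embedded script from the example above.
header = [
    "HEADER    TRANSCRIPTION/DNA",
    "@ # Jmol script.",
    "@ myvar = 12.6",
    "@ function f1()",
    "@ {",
    "@   print _arguments",
    "@ }",
    "@ # End Jmol script.",
    "TITLE     DNA RECOGNITION BY GAL4",
]
script_text = extract_embedded_script(header)
print(script_text)
```

Only the leading "@ " is removed, so any indentation inside the embedded script is preserved.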

If you prefer your files to be closer to PDB format standards (and so prevent potential problems if those files are read into other software), any extra custom lines should start with the PDB keyword REMARK. In fact, Jmol is prepared to read and apply any Jmol scripts embedded in the file, when a line starts with REMARK jmolscript: (as described in this page). However, there can only be one such line in a file and you must put your whole script of commands into that single line. Taking the example above, this would look like:

REMARK jmolscript: myvar = 12.6; function f1() { print _arguments }

(with spaces inside the script being optional)

Writing a PNGJ file containing the edited lines

PNGJ files contain a PNG (Portable Network Graphics) static image of the scene Jmol was displaying when the PNGJ file was written, and also the complete information to reproduce the scene in Jmol. When you drag a PNGJ file and drop it into Jmol's graphics window, the scene appears in interactive form that can be rotated, zoomed, and further modified with Jmol commands.

In addition to a PDB file, you can save a PNGJ file with customized PDB lines and a customized header, though it is slightly trickier. Unlike a PDB file, you can't save a PNGJ file from a variable. So here is one scripting method that works, using only three commands:

# Customize mypdblines as desired previous to this line.
zap # Deletes all atoms and defined atom sets. Preserves variables and functions.
load var mypdblines # loads the PDB file in the variable, including header, optionally modified.
# Render, color, center, orient and zoom as desired.
write filename.pngj

The PNGJ file will have all the customized PDB lines as well as the view at the time it was saved. It does not include variables or functions that were defined at the time it was saved. If any of these are needed, define them in custom header @ lines, and write a script to use them as described above.

A PDB file editing server?

For those who might be interested in writing a server to modify PDB files, JmolData.jar is a variant of Jmol that runs without a graphics window. It is perfect for these kinds of operations. User:Jaime Prilusky used it in Proteopedia.org to generate a series of image files after small rotations. These are then assembled into a multi-GIF movie using other free software (ImageMagick.org). See the link "Export Animated Image" under any JSmol in Proteopedia.Org. In collaboration with Prilusky, User:Eric Martz adapted these server routines to make such animations within FirstGlance.Jmol.Org (with a simplified user interface). There, under JSmol, click "Save Image or Animation for Powerpoint".

If you know of any PDB file editing servers, please link them here!

Proteopedia Page Contributors and Editors

Eric Martz, Angel Herraez, Jaime Prilusky