Getting Unremediated PDB Files: Difference between revisions

From Proteopedia
Wayne Decatur (talk | contribs)





Revision as of 00:15, 10 April 2009

Two rounds of remediation have now taken place between 2007 and 2009 to better standardize and enhance the PDB files deposited before December 2, 2008. The details of the two rounds can be found at the Worldwide Protein Data Bank's Documentation page. The most recent version was released March 17th, 2009 as described on the news page at the Protein Data Bank.


The unremediated PDB archive from before March 17, 2009 is available, as detailed below, because a time-stamped snapshot of the PDB archive from before the March 17th release exists in the directory 20090316.

The unremediated PDB archive from before August 1, 2007 is also available, as detailed below, in the directory 20070731.

If a PDB file was released after December 2, 2008, it is not available in unremediated form.


Why would you need an unremediated version of a PDB file?

A significant change made in the course of the first round (2007) of remediation was the distinction between ribonucleotides (A, C, G, I, T, U) and deoxyribonucleotides (DA, DC, DG, DI, DT, DU). The main reason for getting unremediated PDB files from before the 2007 remediation is that when the remediated PDB files contain DNA, CHIME-based Protein Explorer (and perhaps some other software) does not display the DNA properly. If the PDB file does not contain DNA (protein, RNA, solvent and ligands are OK), you probably don't need the unremediated file. If a PDB file was released after August 1, 2007, it will not be available in an unremediated form that suits CHIME-based (and perhaps other) software. The second round of remediation (2008 round; released March 17th, 2009) also mainly affected nucleic acid residues and atoms.
Proteopedia also needs the unremediated files (pre-March 17th, 2009). Proteopedia premiered between the two rounds of remediation and relies on atom serial numbers for saving scenes, yet it contacts the PDB to get the current PDB file. Thus, when the March 17th remediated version of the database was released with differing atom serial numbers, scenes involving nucleic acids often no longer looked correct with the newer files, because some atom serial numbers no longer match. A global fix for this is currently being worked on according to Eran Hodis and Jaime Prilusky (see the gray banner at the top of the page for updates).
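
As a rough way to tell whether a given file is affected, the residue-name distinction described above can be checked in a few lines of Python. This is only a sketch and not part of Protein Explorer or Proteopedia: the function name has_remediated_dna is hypothetical, and it assumes the standard fixed-column PDB layout in which the residue name occupies columns 18-20 of ATOM and HETATM records.

```python
# Remediated deoxyribonucleotide residue names, as listed in the text above.
REMEDIATED_DNA = {"DA", "DC", "DG", "DI", "DT", "DU"}


def has_remediated_dna(lines):
    """Return True if any ATOM/HETATM record uses a remediated DNA residue name.

    Assumption: fixed-column PDB records, with the residue name in
    columns 18-20 (Python slice 17:20).
    """
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            if line[17:20].strip() in REMEDIATED_DNA:
                return True
    return False
```

Passing an open file handle for an uncompressed PDB file (for example pdb1d66.ent) as lines should be enough. A True result suggests the file is one that CHIME-based software may not display properly.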

How to get the unremediated version?

> OK, both the pre-remediation snapshots are working. The trick is that the 2007 files are .Z compressed, while the 2009 files are .gz compressed. (True, this is stated in the ftp help files!) Thus to get 1lbg:


Use the simple interface at Eric Martz's UMASS site (http://www.umass.edu/microbio/chime/pe_beta/pe/protexpl/unremed.htm) to easily get the July 31, 2007 unremediated PDB files, which are served via ftp at the RCSB Protein Data Bank (ftp://snapshots.wwpdb.org/) from the directory 20070731. This is primarily for obtaining DNA from before the residues were renamed DC, DG, DT, DA, for use in Protein Explorer/Chime.
The March 16, 2009 unremediated versions are also available via ftp at the RCSB Protein Data Bank (ftp://snapshots.wwpdb.org/) in the directory 20090316.
These files come back in a compressed form. The 2007 files are .Z compressed, while the 2009 files are .gz compressed. Thus, to get the two forms of 1d66:

For prior to 2007 remediation: ftp://snapshots.wwpdb.org/20070731/pub/pdb/data/structures/all/pdb/pdb1d66.ent.Z
For prior to 2009 remediation: ftp://snapshots.wwpdb.org/20090316/pub/pdb/data/structures/all/pdb/pdb1d66.ent.gz
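
The two addresses above follow the same pattern, so the URL for another PDB ID can be assembled mechanically. The following Python sketch illustrates this. The names SNAPSHOTS and snapshot_url are hypothetical, the path layout is inferred only from the two 1d66 examples above, and nothing here checks that the ftp server is still reachable.

```python
SNAPSHOT_HOST = "ftp://snapshots.wwpdb.org"

# Snapshot directory -> compression suffix, as stated in the text above.
SNAPSHOTS = {"20070731": ".Z", "20090316": ".gz"}


def snapshot_url(pdb_id, snapshot):
    """Build the ftp URL of an unremediated entry in the given snapshot.

    Assumption: every entry follows the same path layout as the two
    1d66 examples given in the text.
    """
    suffix = SNAPSHOTS[snapshot]  # KeyError for an unknown snapshot directory
    return (
        f"{SNAPSHOT_HOST}/{snapshot}/pub/pdb/data/structures/all/pdb/"
        f"pdb{pdb_id.lower()}.ent{suffix}"
    )
```

For instance, snapshot_url("1d66", "20070731") reproduces the first address listed above.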
To uncompress the 2007 files, which are in .Z format: Eric Martz's site lists WinZip as an option on PCs, but that program is not free once its trial period expires. StuffIt Expander failed to uncompress such a file on a PC, although Eric lists it as useful on Macs. What worked for me was uploading the files to a Web hosting server I have access to and running 'zcat -d [FILE NAME]', which prints the uncompressed contents. With logging enabled in my Secure Shell client, I saved that output to a file on my own drive. Others have mentioned using a DOS version of uncompress on a PC, called uncomp.exe.
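
Where Python is available, the decompression step might also be scripted, as in the sketch below. The function name decompress_snapshot is hypothetical. The .gz branch uses Python's standard gzip module. The .Z branch is an assumption: Python's standard library has no reader for compress-format files, so it shells out to a gzip executable, which can decode .Z files but must already be installed and on the PATH.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def decompress_snapshot(path):
    """Write an uncompressed copy next to `path` and return its location."""
    src = Path(path)
    dest = src.with_suffix("")  # pdb1d66.ent.gz -> pdb1d66.ent
    if src.suffix == ".gz":
        # 2009 snapshot files: handled by the standard library.
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    elif src.suffix == ".Z":
        # 2007 snapshot files: assumes an external `gzip` program exists.
        with open(dest, "wb") as fout:
            subprocess.run(["gzip", "-dc", str(src)], stdout=fout, check=True)
    else:
        raise ValueError(f"unexpected suffix: {src.suffix}")
    return dest
```

Calling decompress_snapshot("pdb1d66.ent.gz") would leave pdb1d66.ent alongside the original. Only the .gz path is independent of external tools.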


See also Standard Residues and Non-Standard Residues

Proteopedia Page Contributors and Editors

Wayne Decatur, Angel Herraez, Eric Martz