Renumbering PDB files

Eric Martz (talk | contribs)
Chemical groups (residues) in [[atomic coordinate files]] ([[PDB files]]) are numbered. For polymers (protein, DNA, RNA), the amino acid and nucleotide groups are given ''sequence'' numbers. For non-polymer groups ([[hetero atoms|hetero]] groups in PDB terminology), the numbers are arbitrary, but ideally do not overlap with the polymer sequence numbers. The [[wwPDB]] allows arbitrary numbering of polymer sequences. See examples at [[Unusual sequence numbering]]. Discrepancies in numbering are confusing and frustrating when comparing structures of similar macromolecules.


One of many examples is comparison of the structures of a bacterial cytochrome, OmcS. [[6ef8]] and [[6nef]] are [[cryo-EM]] structures of the same cytochrome, mature length 407 amino acids (after removal of the N-terminal signal peptide, length 25 amino acids). 6ef8 is numbered 1-407, while the same residues in 6nef are numbered 26-432.
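
The relationship between the two numbering schemes is a constant offset equal to the signal peptide length. A minimal Python sketch of that arithmetic follows; the function names are hypothetical and chosen only for illustration.

```python
# The 25-residue signal peptide is counted in the 6nef numbering (26-432)
# but not in the 6ef8 numbering (1-407), so the two differ by a constant.
SIGNAL_PEPTIDE_LENGTH = 25


def mature_to_precursor(resnum: int, offset: int = SIGNAL_PEPTIDE_LENGTH) -> int:
    """Convert 6ef8-style numbering to 6nef-style numbering."""
    return resnum + offset


def precursor_to_mature(resnum: int, offset: int = SIGNAL_PEPTIDE_LENGTH) -> int:
    """Convert 6nef-style numbering back to 6ef8-style numbering."""
    return resnum - offset


# First and last residues of the mature protein in both schemes.
print(mature_to_precursor(1), mature_to_precursor(407))
```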


==PDBrenum==
[http://dunbrack3.fccc.edu/PDBrenum/ PDBrenum] is a server that renumbers entries in the Protein Data Bank to match the numberings in the corresponding [http://uniprot.org UniProt] entries. PDBrenum will process both [[PDB file format]] and [[atomic coordinate file|mmCIF file format]] atomic coordinate files. PDBrenum does NOT process arbitrary models, such as AlphaFold predictions. For these, see below.


In the example of [[6ef8]] vs. [[6nef]], after processing by PDBrenum, the cytochromes in both files have sequence numbers 26-432, which is very helpful. Unfortunately, the authors listed the hemes (HEC) in different orders in the text of the PDB files, so their numbers still don't match.
 
The scientific article describing PDBrenum<ref>PMID: 34228733</ref> shows that it can also be run as a Python script. To work through a demonstration of running it in a scripted manner, press the 'launch' badge [https://github.com/fomightez/PDBrenum here] to open an active Jupyter notebook served by the MyBinder.org system.


==PDB Tools Web==
*pdb_gap: "'''Detects gaps''' between consecutive residues in the sequence, both by a distance criterion or discontinuous residue numbering. Only applies for protein residues."
*pdb_delinsertion: "'''Deletes insertion codes''' in a PDB file, shifting the residue numbering of downstream residues. Allows for picking specific residues too." (For examples with insertion codes, see [[Unusual sequence numbering]].)
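
To illustrate what "discontinuous residue numbering" means, here is a minimal Python sketch of the numbering criterion only. It is not the pdb_gap implementation: the function name is hypothetical, the distance criterion is omitted, and insertion codes are ignored.

```python
def numbering_gaps(residue_numbers):
    """Return (previous, next) pairs where the numbering is not sequential.

    residue_numbers is the sequence of residue numbers in file order;
    repeated numbers (one per atom of the same residue) are tolerated.
    """
    gaps = []
    prev = None
    for num in residue_numbers:
        # A gap is any step other than "same residue" or "next residue".
        if prev is not None and num != prev and num != prev + 1:
            gaps.append((prev, num))
        prev = num
    return gaps


# Hypothetical numbering with residues 4-6 and 9-11 missing.
print(numbering_gaps([1, 1, 2, 3, 7, 8, 8, 12]))
```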
===Example===
[[6ef8]], a cytochrome polymer, has 7 chains of protein with hemes (HEC). Each protein chain is numbered 1-407. To re-number the protein 26-432 (as UniProt does), we need to add 25 to each number. Here is one set of steps:
#Specify 6ef8, and press the Fetch button.
#At the "Main" menu, select pdb_selchain. Press the + button to add this operation to the pipeline.
#Type '''A''' in the chain ID slot.
#At the "Main" menu, select pdb_shiftres. Press the + button to add this operation to the pipeline.
#Type '''25''' in the shift slot.
#At the bottom, check '''Tidy'''.
#At the bottom, click the green '''Run''' button.
The output is a PDB file containing only chain A, renumbered 26-432. The HEC groups are also renumbered. To avoid renumbering those, you would have to delete them and then cut/paste from the original PDB file using a [[Help:Plain text editors|plain text editor]].
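
The same shift can also be sketched offline in Python. This is a minimal sketch, not the PDB Tools Web implementation. It assumes the legacy fixed-column PDB format, in which the chain ID is in column 22 and the residue number is in columns 23-26. The function name and the two demo lines are hypothetical (the demo lines are not copied from 6ef8). Unlike the pipeline above, it keeps every line of the input and shifts only the ATOM and TER records of the chosen chain, so HETATM groups such as HEC keep their original numbers.

```python
def shift_residue_numbers(pdb_lines, shift, chain_id="A", records=("ATOM", "TER")):
    """Return a new list of lines with residue numbers shifted.

    Only lines whose record name is in `records` and whose chain ID
    matches `chain_id` are changed; all other lines pass through as-is.
    """
    out = []
    for line in pdb_lines:
        record = line[0:6].strip()           # columns 1-6: record name
        if record in records and len(line) >= 26 and line[21] == chain_id:
            field = line[22:26]               # columns 23-26: residue number
            if field.strip():
                new_num = int(field) + shift
                # Right-justify in 4 columns; numbers above 9999 would not fit.
                line = line[:22] + f"{new_num:>4d}" + line[26:]
        out.append(line)
    return out


# Hypothetical, truncated demo lines in fixed-column layout.
demo = [
    "ATOM      1  N   ALA A   1",
    "HETATM 3200 FE   HEC A 501",
]
for line in shift_residue_numbers(demo, 25):
    print(line)
```

Modified amino acids that are stored as HETATM records within the protein chain would not be shifted by this sketch and would need separate handling.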
==References==
<references />

Proteopedia Page Contributors and Editors

Eric Martz, Wayne Decatur