Renumbering PDB files: Difference between revisions
Eric Martz (talk | contribs) |
Chemical groups (residues) in [[atomic coordinate files]] ([[PDB files]]) are numbered. For polymers (protein, DNA, RNA), the amino acid and nucleotide groups are given ''sequence'' numbers. For non-polymer groups ([[hetero atoms|hetero]] groups in PDB terminology), the numbers are arbitrary, but ideally do not overlap with the polymer sequence numbers. The [[wwPDB]] allows arbitrary numbering of polymer sequences. See examples at [[Unusual sequence numbering]]. Discrepancies in numbering are confusing and frustrating when comparing structures of similar macromolecules.
One of many examples is comparison of the structures of a bacterial cytochrome, OmcS. [[6ef8]] and [[6nef]] are [[cryo-EM]] structures of the same cytochrome, mature length 407 amino acids (after removal of the N-terminal signal peptide, length 25 amino acids). 6ef8 is numbered 1-407, while the same residues in 6nef are numbered 26-432.
==PDBrenum==
[http://dunbrack3.fccc.edu/PDBrenum/ PDBrenum] is a server that renumbers entries in the Protein Data Bank to match the numberings in the corresponding [http://uniprot.org UniProt] entries. PDBrenum will process both [[PDB file format]] and [[atomic coordinate file|mmCIF file format]] atomic coordinate files. PDBrenum does NOT process arbitrary models, such as AlphaFold predictions. For these, see below.
In the example of [[6ef8]] vs. [[6nef]], after processing by PDBrenum, the cytochromes in both files have sequence numbers 26-432, which is very helpful. Unfortunately, the authors listed the hemes (HEC) in different orders in the text of the PDB files, so their numbers still don't match.
The scientific article describing PDBrenum<ref>PMID: 34228733</ref> shows that it can also be run as a Python script. To work through a scripted demonstration, press the 'launch' badge [https://github.com/fomightez/PDBrenum here] to open an active Jupyter notebook served by the MyBinder.org system.
==PDB Tools Web==
*pdb_gap: "'''Detects gaps''' between consecutive residues in the sequence, both by a distance criterion or discontinuous residue numbering. Only applies for protein residues."
*pdb_delinsertion: "'''Deletes insertion codes''' in a PDB file, shifting the residue numbering of downstream residues. Allows for picking specific residues too." (For examples with insertion codes, see [[Unusual sequence numbering]].)
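The numbering criterion quoted for pdb_gap can be illustrated with a minimal Python sketch. This is not the pdb-tools implementation: the function name <code>find_numbering_gaps</code> is hypothetical, the distance criterion is ignored, and the code assumes fixed-column [[PDB file format]] (chain ID in column 22, residue number in columns 23-26).

```python
def find_numbering_gaps(pdb_text):
    """Report jumps in residue numbering within each chain.

    Hypothetical helper for illustration only. Returns a list of
    (chain_id, previous_number, next_number) tuples wherever the residue
    number of consecutive ATOM records increases by more than 1.
    """
    gaps = []
    last_seen = {}  # chain ID -> most recent residue number in that chain
    for line in pdb_text.splitlines():
        # Only protein/nucleic acid ATOM records; HETATM records are skipped.
        if not line.startswith("ATOM  ") or len(line) < 26:
            continue
        chain_id = line[21]          # column 22
        number = int(line[22:26])    # columns 23-26
        previous = last_seen.get(chain_id)
        if previous is not None and number - previous > 1:
            gaps.append((chain_id, previous, number))
        last_seen[chain_id] = number
    return gaps
```

Residues that share a number and differ only by insertion code (for example 52 and 52A) are not reported, because their numbers do not jump.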
===Example===
[[6ef8]], a cytochrome polymer, has 7 chains of protein with hemes (HEC). Each protein chain is numbered 1-407. To re-number the protein 26-432 (as UniProt does), we need to add 25 to each number. Here is one set of steps:
#Specify 6ef8, and press the Fetch button.
#At the "Main" menu, select pdb_selchain. Press the + button to add this operation to the pipeline.
#Type '''A''' in the chain ID slot.
#At the "Main" menu, select pdb_shiftres. Press the + button to add this operation to the pipeline.
#Type '''25''' in the shift slot.
#At the bottom, check '''Tidy'''.
#At the bottom, click the green '''Run''' button.
The output is a PDB file containing only chain A, renumbered 26-432. The HEC groups are also renumbered. To avoid renumbering those, you would have to delete them and then cut/paste from the original PDB file using a [[Help:Plain text editors|plain text editor]].
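The arithmetic behind the shift can also be sketched in Python, as an alternative to the cut/paste step. This is a minimal illustration, not the code that PDB Tools Web runs: the function name <code>shift_residue_numbers</code> and its parameters are hypothetical. It assumes fixed-column [[PDB file format]] and that the hemes are stored as HETATM records, so shifting only ATOM records leaves the HEC numbers unchanged. Unlike the pipeline above, it does not remove other chains, and it does not update other records that carry residue numbers (such as TER, ANISOU, LINK, or SSBOND).

```python
def shift_residue_numbers(pdb_text, offset, chain_id=None, records=("ATOM",)):
    """Add `offset` to the residue number of selected records.

    Hypothetical helper for illustration only.
    pdb_text : contents of a PDB-format file as one string
    offset   : integer to add (25 in the 6ef8 example)
    chain_id : if given, only this chain is renumbered
    records  : record names to renumber; HETATM is excluded by default
    """
    shifted = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()                 # columns 1-6
        selected = (
            record in records
            and len(line) >= 26
            and (chain_id is None or line[21] == chain_id)  # column 22
        )
        if selected:
            new_number = int(line[22:26]) + offset          # columns 23-26
            if not -999 <= new_number <= 9999:
                raise ValueError(f"residue number {new_number} does not fit in 4 columns")
            line = line[:22] + f"{new_number:>4d}" + line[26:]
        shifted.append(line)
    return "\n".join(shifted)
```

For example, <code>shift_residue_numbers(text, 25, chain_id="A")</code> would renumber the ATOM records of chain A from 1-407 to 26-432 while leaving every other line as it was.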
==References==
<references />