Renumbering PDB files: Difference between revisions


Proteopedia Page Contributors and Editors

Eric Martz, Wayne Decatur

@@ Line 1: / Line 1: @@
-Chemical groups (residues) in [[atomic coordinate files]] ([[PDB files]]) are numbered. For polymers (protein, DNA, RNA), the amino acid and nucleotide groups are given ''sequence'' numbers. For non-polymer groups ([[hetero atoms|hetero]] groups in PDB terminology), the numbers are arbitrary. The [[wwPDB]] allows arbitrary numbering of polymer sequences. See examples at [[Unusual sequence numbering]]. Discrepancies in numbering are confusing and frustrating when comparing structures of similar macromolecules.
+Chemical groups (residues) in [[atomic coordinate files]] ([[PDB files]]) are numbered. For polymers (protein, DNA, RNA), the amino acid and nucleotide groups are given ''sequence'' numbers. For non-polymer groups ([[hetero atoms|hetero]] groups in PDB terminology), the numbers are arbitrary, but ideally do not overlap with the polymer sequence numbers. The [[wwPDB]] allows arbitrary numbering of polymer sequences. See examples at [[Unusual sequence numbering]]. Discrepancies in numbering are confusing and frustrating when comparing structures of similar macromolecules.
 One of many examples is comparison of the structures of a bacterial cytochrome, OmcS. [[6ef8]] and [[6nef]] are [[cryo-EM]] structures of the same cytochrome, mature length 407 amino acids (after removal of the N-terminal signal peptide, length 25 amino acids). 6ef8 is numbered 1-407, while the same residues in 6nef are numbered 26-432.
@@ Line 6: / Line 6: @@
 ==PDBrenum==
-[http://dunbrack3.fccc.edu/PDBrenum/ PDBrenum] is a server that renumbers atomic coordinate files to match the numberings in the corresponding [http://uniprot.org UniProt] entries. PDBrenum will process both [[PDB file format]] and [[atomic coordinate file|mmCIF file format]] atomic coordinate files.
+[http://dunbrack3.fccc.edu/PDBrenum/ PDBrenum] is a server that renumbers entries in the Protein Data Bank to match the numberings in the corresponding [http://uniprot.org UniProt] entries. PDBrenum will process both [[PDB file format]] and [[atomic coordinate file|mmCIF file format]] atomic coordinate files. PDBrenum does NOT process arbitrary models, such as AlphaFold predictions. For these, see below.
-In the example of [[6ef8]] vs. [[6nef]], after processing by PDBrenum, the cytochromes in both files have sequence numbers 26-432, which is very helpful. Unfortunately, the hemes (HEC) are listed in different orders in the text of the PDB files, so their numbers still don't match.
+In the example of [[6ef8]] vs. [[6nef]], after processing by PDBrenum, the cytochromes in both files have sequence numbers 26-432, which is very helpful. Unfortunately, the authors listed the hemes (HEC) in different orders in the text of the PDB files, so their numbers still don't match.
+The article describing PDBrenum<ref>PMID: 34228733</ref> shows that it can also be run as a Python script. To work through a scripted demonstration, press the 'launch' badge [https://github.com/fomightez/PDBrenum here] to start an active Jupyter notebook served through the MyBinder.org system.
 ==PDB Tools Web==
@@ Line 16: / Line 18: @@
 *pdb_gap: "'''Detects gaps''' between consecutive residues in the sequence, both by a distance criterion or discontinuous residue numbering. Only applies for protein residues."
 *pdb_delinsertion: "'''Deletes insertion codes''' in a PDB file, shifting the residue numbering of downstream residues. Allows for picking specific residues too." (For examples with insertion codes, see [[Unusual sequence numbering]].)
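The "discontinuous residue numbering" criterion mentioned for pdb_gap can be illustrated with a short Python sketch. This is not the pdb-tools implementation: the function name <code>find_numbering_gaps</code> is hypothetical, it reads only ATOM records using the fixed column positions of the PDB file format, and it ignores insertion codes and the distance criterion.

```python
def find_numbering_gaps(pdb_text, chain_id):
    """Return (previous, next) residue-number pairs where numbering jumps by more than 1.

    Hypothetical helper for illustration; only ATOM records of one chain are read.
    """
    numbers = []
    for line in pdb_text.splitlines():
        # Chain ID is column 22, residue number is columns 23-26 (1-based).
        if line[:6] == "ATOM  " and len(line) >= 26 and line[21] == chain_id:
            number = int(line[22:26])
            # Record each residue once, in file order.
            if not numbers or numbers[-1] != number:
                numbers.append(number)
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]
```

A chain whose residues are numbered consecutively yields an empty list; each reported pair marks the residues on either side of a break in the numbering.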
+===Example===
+[[6ef8]], a cytochrome polymer, has 7 protein chains with hemes (HEC). Each protein chain is numbered 1-407. To renumber the protein 26-432 (as UniProt does), we need to add 25 to each residue number. Here is one set of steps:
+#Specify 6ef8, and press the Fetch button.
+#At the "Main" menu, select pdb_selchain. Press the + button to add this operation to the pipeline.
+#Type '''A''' in the chain ID slot.
+#At the "Main" menu, select pdb_shiftres. Press the + button to add this operation to the pipeline.
+#Type '''25''' in the shift slot.
+#At the bottom, check '''Tidy'''.
+#At the bottom, click the green '''Run''' button.
+The output is a PDB file containing only chain A, renumbered 26-432. The HEC groups are also renumbered. To avoid renumbering them, you would have to delete the renumbered HEC groups and then paste in the original ones from the original PDB file using a [[Help:Plain text editors|plain text editor]].
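The chain selection and shift in the steps above can also be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how PDB Tools Web works internally: the function name <code>select_chain_and_shift</code> and its <code>shift_hetero</code> option are hypothetical, only ATOM and HETATM records are kept, and the sample records in the demo are made up.

```python
def select_chain_and_shift(pdb_text, chain_id="A", shift=25, shift_hetero=True):
    """Keep ATOM/HETATM records of one chain and add `shift` to their residue numbers.

    Hypothetical helper for illustration. With shift_hetero=False, HETATM
    records (e.g. HEC) are kept but their numbers are left unchanged.
    """
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6]
        if record not in ("ATOM  ", "HETATM"):
            continue  # other record types are dropped in this sketch
        # Chain ID is column 22, residue number is columns 23-26 (1-based).
        if len(line) < 26 or line[21] != chain_id:
            continue
        if record == "ATOM  " or shift_hetero:
            new_number = int(line[22:26]) + shift
            if not -999 <= new_number <= 9999:
                raise ValueError(f"residue number {new_number} does not fit in 4 columns")
            line = line[:22] + f"{new_number:4d}" + line[26:]
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    # Made-up records for demonstration only (not taken from 6ef8).
    sample = "\n".join([
        "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
        "HETATM 3175 FE   HEC A 501      10.000  10.000  10.000  1.00  0.00          FE",
        "ATOM   3200  N   ALA B   1      21.104   6.134  -6.504  1.00  0.00           N",
    ])
    print(select_chain_and_shift(sample, chain_id="A", shift=25, shift_hetero=False))
```

Passing <code>shift_hetero=False</code> leaves the hetero group numbers as they were, which avoids the cut/paste step at the cost of handling the file outside the web tool.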
+==References==
+<references />