Atomic coordinate file: Difference between revisions
Eric Martz (talk | contribs) |
(12 intermediate revisions by the same user not shown) | |||
Line 30: | Line 30: | ||
====Retirement of PDB Format==== | ====Retirement of PDB Format==== | ||
In February, 2019, the | In February, 2019, the [[wwPDB]] announced that new depositions must be in the mmCIF format beginning July 1, 2019<ref name="endOfPDBFormat">[https://lists.sdsc.edu/pipermail/pdb-l/2019-February/006209.html Mandatory PDBx/mmCIF format files submission for MX depositions]: posted on the PDB email list Feb 20, 2019 by Jasmine Young, Biocuration Team Lead, RCSB PDB. The wwPDB website also posted [http://www.wwpdb.org/news/news?year=2019#5c6ad3c5ea7d0653b99c8766 this document].</ref>. The PDB sometimes refers to the mmCIF format as "PDBx", which should not be confused with the original legacy PDB format. | ||
In December, 2023, the [[wwPDB]] announced that all 3-character ligand ID codes had been exhausted <ref name="nomore3s">[https://www.wwpdb.org/news/news?year=2023#656f4404d78e004e766a96c6 PDB Entries with Novel Ligands Now Distributed Only in PDBx/mmCIF and PDBML File Formats], wwPDB News, December 12, 2023.</ref>. Thereafter, new entries with novel ligands will be available only in mmCIF format, since the legacy PDB format cannot accommodate the new 5-character ligand IDs. Examples that use 5-character ligand IDs: [[8rox]] has [https://www.rcsb.org/ligand/A1H17 A1H17]; [[8vkz]] has [https://www.rcsb.org/ligand/A1ACE A1ACE]. | |||
In 2024, the [[wwPDB]] estimated that all 4-character [[PDB ID code]]s would be consumed by 2029<ref name="spring2024">[https://cdn.rcsb.org/rcsb-pdb/general_information/news_publications/newsletters/2024q2/deposit.html#two Resources for Supporting the Extended PDB ID Format (pdb_00001abc)], Spring 2024 Issue of the RCSB PDB Newsletter.</ref>. Thereafter, new entries will be available only in mmCIF format using [[PDB_identification_code#Future_Plans_for_Expanded_PDB_Codes|12-character ID codes]]. | |||
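The relationship between a 4-character ID and its 12-character form can be illustrated with a short Python sketch. It assumes, based only on the example <code>pdb_00001abc</code> cited above, that the extended form is the prefix <code>pdb_</code> followed by the existing code left-padded with zeros to 8 characters. The function name <code>to_extended_pdb_id</code> and its validation rule (a digit followed by three letters or digits) are hypothetical, for illustration only.

```python
import re

# Assumption: a legacy PDB ID is one digit followed by three
# alphanumeric characters (e.g. "1abc", "8rox").
_LEGACY_ID = re.compile(r"^[0-9][a-z0-9]{3}$")


def to_extended_pdb_id(legacy_id: str) -> str:
    """Return the 12-character form of a 4-character PDB ID.

    Assumed layout, inferred from the example "pdb_00001abc":
    the prefix "pdb_" plus the code zero-padded on the left to 8 characters.
    """
    code = legacy_id.strip().lower()
    if not _LEGACY_ID.match(code):
        raise ValueError(f"not a 4-character PDB ID: {legacy_id!r}")
    return "pdb_" + code.zfill(8)


if __name__ == "__main__":
    print(to_extended_pdb_id("1abc"))
    print(to_extended_pdb_id("8ROX"))
```

The extended code has 12 characters, which is more than the ID field of a legacy PDB file can hold; this is why entries that receive only an extended ID will be distributed in mmCIF format.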
===mmCIF Data Format=== | ===mmCIF Data Format=== | ||
In response to the inadequacies of the PDB data format, the International Union of Crystallography and the | In response to the inadequacies of the PDB data format, the International Union of Crystallography and the | ||
[[Protein Data Bank | World Wide Protein Data Bank]] have adopted the ''macromolecular crystallographic information format'' (mmCIF) as their primary data format for macromolecules. mmCIF is also sometimes referred to as PDBx (not to be confused with the PDB format). While the mmCIF/PDBx format has considerable merit from the perspective of computer scientists, it is unpopular with crystallographers, who prefer to work in the PDB data format. Therefore, the PDB has maintained the entire database in both formats. However, new depositions must be in the mmCIF format beginning July 1, 2019, and it is anticipated that the PDB format will be phased out, of necessity, around | [[Protein Data Bank | World Wide Protein Data Bank]] have adopted the ''macromolecular crystallographic information format'' (mmCIF) as their primary data format for macromolecules. mmCIF is also sometimes referred to as PDBx (not to be confused with the PDB format). While the mmCIF/PDBx format has considerable merit from the perspective of computer scientists, it is unpopular with crystallographers, who prefer to work in the PDB data format. Therefore, the PDB has maintained the entire database in both formats. However, new depositions must be in the mmCIF format beginning July 1, 2019, and it is anticipated that the PDB format will be phased out, of necessity, around 2026<ref name="endOfPDBFormat" /><ref>PMID: 30988261</ref>. | ||
*[http://mmcif.wwpdb.org/ World Wide Protein Data Bank's website on mmCIF] | *[http://mmcif.wwpdb.org/ World Wide Protein Data Bank's website on mmCIF] | ||
Models with >99,999 atoms, or >62 chains, do not fit in the PDB format (see [[Jmol/Visualizing large molecules]]). Such models are available only in mmCIF format, and not in the PDB format. However, in | ====Models Available Only in mmCIF Format==== | ||
In April, 2024, 2.3% of the entries in the [[wwPDB]] were available only in mmCIF format. | |||
Models with >99,999 atoms, or >62 chains, do not fit in the PDB format (see [[Jmol/Visualizing large molecules]]). Such models are available as complete entries only in mmCIF format. However, in 2024, such models are also available as subsets in PDB format. For example, at [https://www.rcsb.org/structure/5LEG 5LEG], look for "PDB format-like files" in the ''Download Files'' menu. | |||
Models containing ligands with 5-character ID codes (see above) also do not fit in PDB format, and are | |||
available only in mmCIF format. | |||
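The three limits described in this section (atom count, chain count, and ligand ID length) can be collected into one check. The following Python sketch is illustrative only: the function name <code>pdb_format_obstacles</code> and its parameters are hypothetical, and it tests only the limits stated above, not every restriction of the legacy PDB format.

```python
from typing import Iterable, List

# Limits taken from the text above.
MAX_PDB_ATOMS = 99_999        # models with >99,999 atoms do not fit
MAX_PDB_CHAINS = 62           # models with >62 chains do not fit
MAX_PDB_LIGAND_ID_LEN = 3     # 5-character ligand IDs do not fit


def pdb_format_obstacles(n_atoms: int, n_chains: int,
                         ligand_ids: Iterable[str]) -> List[str]:
    """List the reasons a model cannot be written as one legacy PDB file.

    An empty list means none of the three limits named above is exceeded.
    """
    problems = []
    if n_atoms > MAX_PDB_ATOMS:
        problems.append(f"{n_atoms} atoms exceeds {MAX_PDB_ATOMS}")
    if n_chains > MAX_PDB_CHAINS:
        problems.append(f"{n_chains} chains exceeds {MAX_PDB_CHAINS}")
    too_long = [lig for lig in ligand_ids if len(lig) > MAX_PDB_LIGAND_ID_LEN]
    if too_long:
        problems.append("ligand IDs too long: " + ", ".join(too_long))
    return problems


if __name__ == "__main__":
    # Hypothetical counts; only the ligand IDs come from the examples above.
    print(pdb_format_obstacles(5_000, 4, ["HEM"]))
    print(pdb_format_obstacles(5_000, 4, ["A1H17"]))
    print(pdb_format_obstacles(250_000, 80, ["A1ACE"]))
```

A model for which the returned list is non-empty falls into the group that is available as a complete entry only in mmCIF format.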
===ASN.1 Data Format=== | ===ASN.1 Data Format=== |