How to predict structures with AlphaFold
Eric Martz
In July 2021, DeepMind released [[Alphafold#AlphaFold_published_July_2021|AlphaFold as open source code]]. Subsequently, [[Alphafold#Free_AlphaFold-based_Servers|several Colabs became available]] offering '''free''' structure prediction for user-submitted protein sequences. These Google Colabs (collaboratories)<ref name="colabfaq">[https://research.google.com/colaboratory/faq.html Collaboratory FAQ] at Google.</ref> let users submit sequences through a web browser; the code runs in the Google cloud, in space private to each user, and returns predicted structures. In 2024, DeepMind made the [https://alphafoldserver.com AlphaFold3 server]<ref name="af3">PMID: 38718835</ref> available (see below).
Below are instructions for beginners who wish to predict structures.
==Is An Empirical Model Available?==
Empirical (experimentally determined) models are the most accurate, so you should look for those first. See [[How To Find A Structure]]. If there is no empirical model for your amino acid sequence, it may be useful to explore [[How_To_Find_A_Structure#Sequence-Related_Empirical_Models|empirical models for closely-related sequences]], if available. Even when empirical structures are available, most have missing residues or atoms, so it may be useful to compare them with the AlphaFold prediction: see [[Missing residues and incomplete sidechains]].
==Does AlphaFold Database Already Have Your Protein?==
In 2024, AlphaFold Database predictions are always [https://alphafoldserver.com/faq single protein chain structures without ligands]. If your protein is an assembly of multiple chains, you will likely want to compare the Database structure with predictions from the latest servers capable of multiple-chain + ligand predictions (see below).
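If you know the UniProt accession of your protein, Database files can also be fetched by script. The minimal Python sketch below assumes the Database file naming pattern ''AF-accession-F1-model_v4'' (the version number changes between Database releases) and a JSON prediction API at alphafold.ebi.ac.uk/api/prediction; both are assumptions to check against the Database's own documentation, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed AlphaFold Database locations; the "_v4" version suffix changes
# between Database releases, so check the current one on the website.
DB_FILES = "https://alphafold.ebi.ac.uk/files"
DB_API = "https://alphafold.ebi.ac.uk/api/prediction"


def model_url(accession: str, version: int = 4, fmt: str = "pdb") -> str:
    """Build the (assumed) download URL of fragment F1 of a Database model."""
    acc = accession.strip().upper()
    return f"{DB_FILES}/AF-{acc}-F1-model_v{version}.{fmt}"


def lookup(accession: str) -> list:
    """Query the (assumed) prediction API for one UniProt accession.

    Returns the decoded JSON list of entries. urllib raises HTTPError
    (for example 404) when the Database has no entry for the accession.
    """
    url = f"{DB_API}/{accession.strip().upper()}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

For example, ''model_url("P69905")'' builds the assumed URL of the model for human hemoglobin alpha. A 404 error from ''lookup'' suggests that the Database has no prediction for that accession, in which case continue with the servers below.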
==Prediction Servers==
You can submit one (or a set) of sequences to these servers, and they will return predicted structures, along with estimates of confidence in their predictions. This is not a comprehensive list. Please add other servers of interest to a broad range of users, including beginners.
* 2024<ref name="af3" />: [https://alphafoldserver.com AlphaFoldServer.Com]. Uses AlphaFold3 to predict homo- and hetero-multimers involving protein, DNA, RNA, ligands, and modified residues. Straightforward to use; a Guide and FAQ are provided. Predictions use templates, without user control (see FAQ). Free for non-commercial use; see [https://alphafoldserver.com/terms Terms of Service] and [https://alphafoldserver.com/output-terms Output Terms of Use]. From the DeepMind team<ref name="af3" />.
**If you get <font color="red"><b>Invalid character</b></font> after pasting in your sequence, try removing the line breaks.
**Predicted models are in [[Mmcif format|mmCIF format]] only. To convert to [[PDB format]] for use in [[FirstGlance in Jmol|FirstGlance]] (which colors by confidence/pLDDT automatically), see [[Converting AlphaFold3 CIF to PDB]].
**To easily obtain average [[pLDDT]] (predicted confidence) for a range of residues, see [[FirstGlance/How to get average pLDDT from AlphaFold models]].
**See also [[#Visualizing Predicted Structures]] and [[User:Eric Martz/AlphaFold3 case studies|AlphaFold3 case studies]].
* 2024<ref name="rfaa">PMID: 38452047</ref>: RoseTTAFold All-Atom (RFAA) predicts multimers of proteins and nucleic acids with ligands. From the Baker team<ref name="rfaa" />. [https://neurosnap.ai/service/RoseTTAFold%20All-Atom A free server limited to very small numbers of jobs is available from Neurosnap].
* 2024<ref name="combfold">PMID: 38326495</ref>: [https://neurosnap.ai/service/CombFold CombFold] predicts the structures of large protein complexes from subunit sequences using AlphaFold-Multimer paired with a combinatorial method to assemble subunits. From Shor and Schneidman-Duhovny<ref name="combfold" />.
* 2022<ref name="colabfold">PMID: 35637307</ref>: [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/AlphaFold2_advanced.ipynb ColabFold AlphaFold2_advanced]. Predicts homo- and hetero-multimers using methods that the Steinegger/Mirdita team<ref name="colabfold" /><ref name="colabfold2024">PMID: 39402428</ref> developed before AlphaFold-Multimer<ref name="afmultimer">[https://www.biorxiv.org/content/10.1101/2021.10.04.463034v1.full Protein complex prediction with AlphaFold-Multimer], Preprint, Evans et al. 2021.</ref> was available. Does NOT use templates. [https://www.ebi.ac.uk/training/online/courses/alphafold/accessing-and-predicting-protein-structures-with-alphafold/predicting-protein-structures-with-colabfold-and-alphafold-colab/ See Instructions from EMBL-EBI].
*2022<ref name="af2021">PMID: 34265844</ref><ref name="afmultimer" />: [https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb AlphaFold2/Multimer Colab], which can predict protein multimers. From the DeepMind team<ref name="af2021" /><ref name="afmultimer" />. [https://www.ebi.ac.uk/training/online/courses/alphafold/accessing-and-predicting-protein-structures-with-alphafold/predicting-protein-structures-with-colabfold-and-alphafold-colab/ See Instructions from EMBL-EBI].
*2022<ref name="alphafill">PMID: 36424442</ref>: [https://alphafill.eu/ AlphaFill] “transplants” missing ligands, cofactors and (metal) ions into AlphaFold models. From the Perrakis team<ref name="alphafill" />. Ligand positioning is approximate. See [[Alphafold#Ligands:_AlphaFill |CAUTION]] provided by the AlphaFill team:
<blockquote>
"AlphaFill models are not meant or suitable for precise quantification of interactions between the transferred ligand(s) and the protein (e.g. hydrogen bonds, π-π or cation-π interactions, van der Waals interactions, hydrophobic interactions, halogen bonds)."
</blockquote>
*2021<ref name="rosettafold">PMID: 34282049</ref>: [https://robetta.bakerlab.org/ RoseTTAFold at Robetta] is an independent design from the Baker team<ref name="rosettafold" />, influenced by the design of AlphaFold2. Predicts monomers and multimers. Comparing results of RoseTTAFold with results of AlphaFold2/3 is worthwhile. At [https://robetta.bakerlab.org/ Robetta], open the Structure Prediction menu at the top, and choose Submit. ''Be sure to check RoseTTAFold under Optional!''
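The ''Invalid character'' tip for AlphaFoldServer.Com can be automated. The minimal Python sketch below cleans a single pasted sequence (FASTA or plain text): it drops header lines starting with >, joins the lines, deletes spaces, tabs and position numbers, and converts to upper case. It assumes that only the 20 standard one-letter amino acid codes are acceptable, which may be stricter than a given server actually is; ''clean_sequence'' is a hypothetical helper name.

```python
import re

# Assumption: only the 20 standard one-letter amino acid codes are accepted.
# Real servers may also allow other letters (for example X); adjust if needed.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(text: str) -> str:
    """Turn one pasted sequence (FASTA or plain) into a single line of residues."""
    # Drop FASTA header lines, which start with ">".
    body = [line for line in text.splitlines() if not line.lstrip().startswith(">")]
    # Join the lines (removing the line breaks), then delete spaces, tabs and digits.
    seq = re.sub(r"[\s0-9]", "", "".join(body)).upper()
    leftovers = sorted(set(seq) - STANDARD_AA)
    if leftovers:
        raise ValueError("characters still not accepted: " + "".join(leftovers))
    return seq
```

Paste the returned single line into the server. If a ValueError is raised, the listed characters need attention before submission.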
===Cost?===
The above servers are free for limited, non-commercial use.
'''Colabs''': After multiple free jobs in a Colab, a new job may be refused. You may be informed that a GPU could not be assigned. In 2024, a subscription to [https://colab.research.google.com/signup/pricing Colab Pro] is US $10/month. Paying this will enable you to do many more jobs.
==Visualizing Predicted Structures==
[http://firstglance.jmol.org FirstGlance in Jmol] automatically colors its initial view of uploaded AlphaFold or RoseTTAFold models by estimated confidence [[pLDDT]] ('''{{Font color|blue|blue for high confidence}}, {{Font color|red|red for low confidence}}'''). After you go to other views or tools, you can always get back to this color scheme by clicking ''Reliability Estimates'' in the ''Views'' tab.
*[[iCn3D]] automatically colors AlphaFold2 Database models loaded from their UniProt IDs by confidence/pLDDT. For AlphaFold files opened from your computer, choose pLDDT from the Color pull-down menu.
*[[PyMOL]] and [[ChimeraX]] have no built-in confidence/pLDDT color scheme. Their rainbow/spectrum color schemes for temperature/B-factor do color by confidence/pLDDT, but with the AlphaFold color scheme inverted.
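One workaround for the inverted colors in PyMOL or ChimeraX is to write 100 minus pLDDT into the temperature/B-factor column of a copy of the model, so that high confidence lands at the low-value end of the color ramp (blue, in a rainbow that runs from blue for low values to red for high values). The Python sketch below illustrates that idea; it is not a feature of either program. It assumes a PDB-format file in which pLDDT occupies the standard temperature-factor columns 61-66, and ''invert_plddt'' is a hypothetical name.

```python
def invert_plddt(pdb_text: str) -> str:
    """Return a copy of PDB text with each atom's B-factor set to 100 - pLDDT.

    Assumes pLDDT sits in the PDB temperature-factor field (columns 61-66),
    as in AlphaFold PDB files. The result is meant for display only.
    """
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66:
            plddt = float(line[60:66])
            # Keep the field 6 characters wide with 2 decimals (PDB format).
            line = f"{line[:60]}{100.0 - plddt:6.2f}{line[66:]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Write the returned text to a new file and color that copy by B-factor (for example with PyMOL's ''spectrum b''). Keep the original file: the copy no longer contains pLDDT values.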
[http://firstglance.jmol.org/where.htm#uploading Upload] your predicted PDB file to [http://firstglance.jmol.org FirstGlance.Jmol.Org], which has many [http://firstglance.jmol.org/whatis.htm#unique unique conveniences and capabilities].
You can easily visualize:
* Estimated confidence/pLDDT by touching an atom
* '''Average confidence/pLDDT''' ("reliability") for the entire model, or for [[FirstGlance/How to get average pLDDT from AlphaFold models|a specified sequence range]].
* Secondary structure (Views tab)
* Distribution of hydrophobic vs. polar residues (Views tab: integral membrane proteins will have large hydrophobic surfaces while soluble proteins will have hydrophobic cores revealed by the ''Slab'' button)
* Distribution of charges (Views tab: nucleic acid binding sites will have clusters of positive charges)
* Disulfide bonds (Tools tab)
* Domain structure and positions of the ends of the polypeptide chain (Views tab: N -> C Rainbow)
* Locations of functional sites by evolutionary conservation (see instructions at [[How_to_see_conserved_regions]])
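The average confidence/pLDDT that FirstGlance reports for a model or a sequence range can be cross-checked with a short Python script. This minimal sketch assumes a PDB-format AlphaFold2 model in which every atom of a residue carries the same pLDDT, so it reads one value per residue from the alpha carbon (CA). AlphaFold3 reports per-atom values and writes mmCIF, so its files would need conversion and a different averaging choice. The function and parameter names are hypothetical.

```python
def average_plddt(pdb_text, chain=None, start=None, end=None):
    """Mean pLDDT over CA atoms, optionally for one chain and a residue range.

    Assumes one pLDDT per residue stored in the temperature-factor field
    (columns 61-66) of PDB ATOM records, as in AlphaFold2 PDB files.
    start and end are inclusive residue numbers; None means no limit.
    """
    values = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue  # use one atom per residue: the alpha carbon
        if chain is not None and line[21] != chain:
            continue
        resnum = int(line[22:26])
        if (start is not None and resnum < start) or (end is not None and resnum > end):
            continue
        values.append(float(line[60:66]))
    if not values:
        raise ValueError("no CA atoms matched the chain/range given")
    return sum(values) / len(values)
```

For example, ''average_plddt(text, chain="A", start=25, end=140)'' averages residues 25 through 140 of chain A, where ''text'' holds the contents of the downloaded PDB file.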
==Instructions for ColabFold 2022==
This procedure was written in 2022<ref>Some of the recommended options were gleaned from the [https://www.youtube.com/watch?v=Rfw7thgGTwI 1 hour 46 minute video of presentations by Sergey Ovchinnikov and Martin Steinegger] (August, 2021) for the Boston Protein Design and Modeling Club hosted by Chris Bahl.
</ref>. In 2024, ColabFold is not necessarily the best or only place to submit your job: see [[#Prediction Servers]].
Initially, AlphaFold and ColabFold performed best with '''single chains'''<ref name="afmultimer" />, which may include one or a few domains. The instructions below were written '''before ColabFold was adapted to prediction of multimers'''. If you are interested in complexes or alternate conformations, please see the ColabFold instructions in the 2023 protocol by Kim ''et al.''<ref name="kim2023">[https://protocolexchange.researchsquare.com/article/pex-2490/v1 Easy and accurate protein structure prediction using ColabFold], 2023, Kim ''et al.''</ref>
===Submitting A Sequence===
First, if your query is a single protein chain, check the [https://alphafold.ebi.ac.uk/ AlphaFold Database] for the protein of interest. If its structure has already been predicted there, download it and skip to [[#Interpreting Results|Interpreting Results]] below. Otherwise ...
Don't worry about any of the options not specifically mentioned below. Leave them at their default settings.
Don't worry about the "Warning". It is just Google's disclaimer that they did not write the code you are about to execute. Click ''Run anyway''.
==Downloading Results== | ===Downloading Results=== | ||
{{Font color|red|Do NOT close your AlphaFold2_advanced browser tab until the job is completed.}} It appears that you will lose your job if you close the browser tab. You will be warned if you inadvertently try. | {{Font color|red|Do NOT close your AlphaFold2_advanced browser tab until the job is completed.}} It appears that you will lose your job if you close the browser tab. You will be warned if you inadvertently try. | ||
Line 60: | Line 106: | ||
When the job is completed, a dialog to download a zip file will appear automatically. (Sometimes you will be asked for permission to enable download first.) | When the job is completed, a dialog to download a zip file will appear automatically. (Sometimes you will be asked for permission to enable download first.) | ||
==Interpreting Results== | ===Interpreting Results=== | ||
Static images of backbone renderings of predicted models will appear in your web browser at the bottom of the section ''run alphafold'' as each is completed. | Static images of backbone renderings of predicted models will appear in your web browser at the bottom of the section ''run alphafold'' as each is completed. | ||
===Estimated Reliability=== | ====Estimated Reliability==== | ||
Each predicted model has an average estimated reliability ([[AlphaFold pLDDT and expected distance error|pLDDT]], predicted local distance difference test). >90 is likely accurate; <70 is low confidence. For more about interpreting these values, please see the [https://alphafold.ebi.ac.uk/faq AlphaFold Database FAQ]. | Each predicted model has an average estimated reliability ([[AlphaFold pLDDT and expected distance error|pLDDT]], predicted local distance difference test). >90 is likely accurate; <70 is low confidence. For more about interpreting these values, please see the [https://alphafold.ebi.ac.uk/faq AlphaFold Database FAQ]. | ||
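The two cut-offs above can be written as a small helper for labeling per-residue or per-model values. This Python sketch uses only the thresholds stated on this page (above 90 likely accurate, below 70 low confidence) and calls everything in between ''intermediate''; that label is this sketch's own choice, and the AlphaFold Database FAQ subdivides these ranges further.

```python
from collections import Counter


def confidence_label(plddt: float) -> str:
    """Label one pLDDT value with the two cut-offs quoted on this page."""
    if not 0.0 <= plddt <= 100.0:
        raise ValueError(f"pLDDT must be between 0 and 100, got {plddt}")
    if plddt > 90.0:
        return "likely accurate"
    if plddt < 70.0:
        return "low confidence"
    return "intermediate"  # 70 to 90 inclusive


def summarize(plddts) -> Counter:
    """Count how many values fall under each label."""
    return Counter(confidence_label(p) for p in plddts)
```

Applied to per-residue values, ''summarize'' shows at a glance how much of a model falls below 70.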
Each residue has an estimated reliability of its position (0-100) in the PDB [[temperature]] column. BEWARE that high values mean high confidence, and low values mean low confidence. This is the INVERSE of [[temperature|crystallographic temperature values]], where low values are good and high values are bad. Uploading your PDB file to [http://firstglance.jmol.org FirstGlance in Jmol] will automatically color each residue by its estimated reliability.
====Intrinsic Disorder====
Some models have high confidence in a folded [[domain]], and low confidence in a segment that is not part of a compact domain. Low-confidence segments may be [[Intrinsically_Disordered_Protein|intrinsically disordered]]. It is useful to compare [[Intrinsically_Disordered_Protein#Protein_disorder_predictors|predictions of disorder]] with AlphaFold reliability estimates.
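To list candidate disordered segments for that comparison, you can look for runs of consecutive residues whose pLDDT is below 70. In the Python sketch below, the cut-off of 70 comes from this page, while the minimum run length of 5 residues is an arbitrary assumption; low confidence alone does not prove disorder, so treat the output as segments to check against disorder predictors. ''low_confidence_segments'' is a hypothetical name.

```python
def low_confidence_segments(residues, cutoff=70.0, min_length=5):
    """Return (first, last) residue numbers of runs with pLDDT below cutoff.

    residues: iterable of (residue_number, plddt) pairs in chain order.
    A gap in the residue numbering ends the current run. Runs shorter
    than min_length residues (an arbitrary default) are ignored.
    """
    segments = []
    run_start = prev = None
    for resnum, plddt in residues:
        low = plddt < cutoff
        if low and run_start is not None and resnum == prev + 1:
            prev = resnum  # extend the current low-confidence run
            continue
        # The current run (if any) has ended: keep it if it is long enough.
        if run_start is not None and prev - run_start + 1 >= min_length:
            segments.append((run_start, prev))
        run_start, prev = (resnum, resnum) if low else (None, None)
    if run_start is not None and prev - run_start + 1 >= min_length:
        segments.append((run_start, prev))
    return segments
```

The (residue number, pLDDT) pairs can be read from the CA atoms of a PDB file, as FirstGlance does when it colors by reliability.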
====Relative Positions of Domains====
If the predicted model has more than one [[domain]], each domain may have high confidence, yet the relative positions of the domains may not. The estimated reliability of relative domain positions is in graphs of '''predicted aligned error''' (PAE) which are included in the downloadable zip file of results. For an explanation, see ''How should I interpret the relative positions of domains?'' in the [https://alphafold.ebi.ac.uk/faq AlphaFold Database FAQ].
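A single number summarizing how confidently two domains are placed relative to each other is the mean PAE between them; lower values indicate a more confident relative placement. The Python sketch below computes that mean from a PAE matrix. The JSON layouts it reads are assumptions: a one-element list holding ''predicted_aligned_error'' (AlphaFold Database style) or a top-level ''pae'' key (ColabFold score-file style). Check the files in your own zip, since other tools and versions may differ. Function names are hypothetical.

```python
import json


def load_pae(path):
    """Read a PAE matrix (list of rows) from a JSON file.

    Assumed layouts: a one-element list whose entry holds
    "predicted_aligned_error" (AlphaFold Database style), or an object
    with a top-level "pae" key (ColabFold score-file style).
    """
    with open(path) as handle:
        data = json.load(handle)
    if isinstance(data, list):
        data = data[0]
    for key in ("predicted_aligned_error", "pae"):
        if key in data:
            return data[key]
    raise KeyError(f"no PAE matrix found in {path}")


def mean_interdomain_pae(pae, domain_a, domain_b):
    """Mean PAE (in angstroms) between two residue ranges.

    domain_a and domain_b are (first, last) residue numbers, 1-based and
    inclusive, counted along the modeled sequence. Both off-diagonal
    blocks are averaged, so it does not matter whether rows or columns
    index the residue used for alignment.
    """
    a = range(domain_a[0] - 1, domain_a[1])
    b = range(domain_b[0] - 1, domain_b[1])
    values = [pae[i][j] for i in a for j in b]
    values += [pae[j][i] for i in a for j in b]
    return sum(values) / len(values)
```

For example, with domains at residues 1-120 and 135-260, ''mean_interdomain_pae(load_pae("scores.json"), (1, 120), (135, 260))'' averages the expected errors between the two domains; the file name is hypothetical.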
====Recycles For Convergence====
You may be interested to note the number of recycles required for each model to converge to the specified tolerance. These numbers are not captured in the downloaded zip file.
Also notice that, in this case, all 3 models have low confidence (pLDDT < 70), and are of questionable value.
==See Also==
*[[AlphaFold/Index]], a list of pages in Proteopedia about AlphaFold.
*[[User:Eric Martz/AlphaFold3 case studies|AlphaFold3 case studies]] includes a case that AlphaFold3 cannot predict.
*[[How To Find A Structure]]
*[[Missing residues and incomplete sidechains]]
==References and Notes==
<references />