How to predict structures with AlphaFold
In 2020, the AlphaFold project of Google's DeepMind team demonstrated a major breakthrough in predicting protein structure from sequence. Their success in the blind CASP competition astonished many experts. For an overview, see Theoretical models.
In July 2021, DeepMind released AlphaFold as open source code. Subsequently, several Colabs became available offering free structure prediction for user-submitted protein sequences. These Google Colabs (Colaboratory notebooks)[1] let users submit sequences via a web browser. The code executes in the Google cloud, in space private to each user, and returns predicted structures.
Below are instructions for beginners who wish to predict structures. We recommend the "advanced" Colab by Sergey Ovchinnikov, Milot Mirdita and Martin Steinegger.
Submitting A Sequence
Don't worry about any of the options not specifically mentioned below. Leave them at their default settings.
1. Obtain the sequence of the protein of interest, e.g. at UniProt. Click on the FASTA button above the sequence in UniProt. Copy only the sequence, excluding the FASTA header line that begins with ">".
2. Log in with a Google account at AlphaFold2_advanced. You can register for a free Gmail account to use for login.
3. Paste in your sequence, making sure to completely replace the default sequence:
Although this input slot is only one line, it accepts sequences longer than 1,000 amino acids. However, sequences of ~1,000 amino acids or longer may cause the Colab to fail. Such sequences can still be predicted by submitting them as two overlapping halves.[2]
4. Enter a jobname in the slot below the sequence slot. The results.zip filename will begin with this jobname, although none of the files inside the zip include the jobname.
5. Scroll down to the subsection titled run alphafold, Sampling options:
- num_models, the number of models to be predicted, is 5 by default. You could reduce this to 3 if you are in a hurry.
- max_recycles: Set this to 48 (or at least 12). This is an upper limit: recycling stops earlier once the model has converged to the specified tolerance. The default of 3 is often not enough for an optimal result.
- tol (tolerance): Set this to 0.5 Å (or 1.0 Å to get a faster result). Recycling stops when a prediction differs from the previous recycle's prediction by less than this value (RMSD in Å between alpha carbons).
- num_samples (random seeds): Leave this at 1. Beware that if you increase this above 1, you will generate a number of models equal to the product of this value and num_models. This will proportionally increase the time to complete a result.
6. Open the Runtime menu at the very top of the page, and select Run all.
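The preparation steps above can be sketched in Python. This is a minimal illustration, not part of the Colab: the function names (strip_fasta_header, split_with_overlap, total_models) are hypothetical, and the default overlap of 350 residues is simply the value mentioned in note 2.

```python
from typing import Tuple


def strip_fasta_header(fasta_text: str) -> str:
    """Return the bare sequence, dropping any FASTA header line starting with '>'."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">")).upper()


def split_with_overlap(sequence: str, overlap: int = 350) -> Tuple[str, str]:
    """Split a long sequence into two halves that share `overlap` residues in the middle.

    Assumption: a centered overlap is a reasonable way to submit "two halves";
    the Colab itself does not provide this step.
    """
    if overlap >= len(sequence):
        raise ValueError("overlap must be shorter than the sequence")
    mid = len(sequence) // 2
    first_end = mid + overlap // 2
    second_start = first_end - overlap
    return sequence[:first_end], sequence[second_start:]


def total_models(num_models: int = 5, num_samples: int = 1) -> int:
    """Number of models generated: the product of num_models and num_samples."""
    return num_models * num_samples
```

For example, total_models(5, 2) gives 10, so raising num_samples to 2 roughly doubles the run time. Each string returned by split_with_overlap would be pasted into the sequence slot as a separate job.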
Results
Static images of backbone renderings of predicted models will appear at the bottom of the section run alphafold as they are completed.
You may be interested to note the number of recycles required for each model to converge to the specified tolerance. These numbers are not captured in the downloaded zip file.
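To make the interplay of max_recycles and tol concrete, here is a minimal Python sketch of an early-stopping loop. It is an assumption-laden illustration, not the Colab's actual code: predict_step is a hypothetical stand-in for one pass of the network, and the RMSD is computed directly on alpha carbon coordinates that are assumed to be already in the same frame (no superposition is performed).

```python
import math
from typing import Callable, List, Optional, Tuple

Coords = List[Tuple[float, float, float]]  # one (x, y, z) per alpha carbon


def ca_rmsd(a: Coords, b: Coords) -> float:
    """RMSD in Å between two equally long lists of alpha carbon coordinates."""
    if not a or len(a) != len(b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def run_recycles(
    predict_step: Callable[[Optional[Coords]], Coords],
    max_recycles: int = 48,
    tol: float = 0.5,
) -> Tuple[Coords, int, float]:
    """Recycle until the change drops below tol or max_recycles is reached.

    Returns the final coordinates, the number of recycles performed,
    and the last observed change (RMSD in Å).
    """
    previous = predict_step(None)  # initial prediction, before any recycling
    recycles_done = 0
    last_change = math.inf
    for _ in range(max_recycles):
        current = predict_step(previous)
        last_change = ca_rmsd(previous, current)
        previous = current
        recycles_done += 1
        if last_change < tol:
            break  # converged to the specified tolerance
    return previous, recycles_done, last_change
```

With this logic, a model that has not converged simply stops at max_recycles with a final change still above tol.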
The models will be ranked with number one having the highest estimated reliability (pLDDT). This is usually not in the order in which they were calculated. You might want to copy the ranking list, perhaps adding the number of recycles and final tolerance values:
model rank based on pLDDT               Recycles  Tolerance
rank_1_model_2_ptm_seed_0 pLDDT:62.46   10        0.33
rank_2_model_3_ptm_seed_0 pLDDT:59.59   9         0.47
rank_3_model_1_ptm_seed_0 pLDDT:55.63   12        0.52
Notice that the model predicted 2nd had the best estimated reliability (pLDDT), and that the model ranked 3rd did not quite achieve the specified tolerance of 0.5 Å RMSD after 12 recycles. (12 was specified as the maximum in this job.)
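If you copy the ranking list as text, a short Python sketch like the following could pull out the rank, model number, seed and pLDDT for your own records. The class and function names (RankedModel, parse_ranking) are hypothetical, and the pattern assumes the naming shown in the example above (rank_N_model_M_ptm_seed_S followed by pLDDT:value). Other Colab versions may name models differently.

```python
import re
from typing import List, NamedTuple


class RankedModel(NamedTuple):
    rank: int
    model: int
    seed: int
    plddt: float


# Assumed naming, taken from the example ranking list in this article.
_RANK_PATTERN = re.compile(r"rank_(\d+)_model_(\d+)_ptm_seed_(\d+)\s+pLDDT:(\d+(?:\.\d+)?)")


def parse_ranking(text: str) -> List[RankedModel]:
    """Extract (rank, model, seed, pLDDT) from copied ranking text."""
    return [
        RankedModel(int(rank), int(model), int(seed), float(plddt))
        for rank, model, seed, plddt in _RANK_PATTERN.findall(text)
    ]
```

Applied to the example list, the first entry has rank 1, model 2 and pLDDT 62.46, which confirms that the top-ranked model was not the first one calculated.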
Downloading Results
Do NOT close your AlphaFold2_advanced browser tab until the job is completed. It appears that you will lose your job if you close the browser tab. You will be warned if you inadvertently try.
When the job is completed, a dialog to download a zip file will appear automatically. (Sometimes you will be asked for permission to enable download first.)
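Once the zip file is downloaded, you may want to see which model files it contains before unpacking it. The following Python sketch uses only the standard zipfile module. It assumes the predicted models are stored as files ending in .pdb, which may not hold for every Colab version, and the function name list_pdb_files is hypothetical.

```python
import zipfile
from typing import List


def list_pdb_files(zip_path: str) -> List[str]:
    """Return the sorted names of all .pdb entries in a downloaded results zip."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".pdb")
        )
```

Call it with the path to your downloaded file, whose name begins with the jobname you entered.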
Paid Membership
Colab Pro
References and Notes
1. Colaboratory FAQ at Google.
2. I had one sequence of length ~1,300. After it failed, I submitted it as two halves with a substantial overlap (~350 residues). The middle ~200 residues of the overlap in the two predicted structures superposed very closely in DeepView. I trimmed off the ends that superposed poorly, and superposed the two halves via the mid-overlap. By inspection, I chose a pair of alpha carbons near the middle where the alpha carbon positions were nearly identical. I trimmed each half to this position, and "ligated" the two halves by combining the superposed half PDB files with a text editor. For further details, contact User:Eric_Martz.