AlphaFold2

This tool runs the AlphaFold2 pipeline as described in Jumper et al. 2021.

What does it do?

AlphaFold2 combines multiple sequence alignment (MSA) information with a pre-trained neural network to predict the structure of proteins. It does not model nucleic acids, small molecules, or post-translational modifications. On COSMIC2, the AlphaFold2 tool can run on individual proteins ("monomer") or multi-subunit protein complexes ("multimer").

Learn more:

Primary citations:
- Jumper et al. 2021 "Highly accurate protein structure prediction with AlphaFold."
- Evans et al. 2022 "Protein complex prediction with AlphaFold-Multimer."
How to interpret AlphaFold structures (EMBL)
Software repository

How is it different from other tools?

AlphaFold2 is the landmark structure prediction software released in 2021. It is distinct from other software, such as ColabFold and AlphaFold3. AlphaFold2 relies on 'traditional' multiple sequence alignment tools to query sequence databases for similar sequences. These sequence alignments are then combined with a pre-trained neural network to predict structures, and the output is recycled back through the network multiple times to refine the prediction.

AlphaFold2 can be run on individual proteins ("monomers") or on multi-subunit protein complexes ("multimer").

Running this tool on COSMIC2

Note: the AlphaFold2 tool can be run on monomers or multimers.

AlphaFold2 - single proteins ("monomer")

Tool name: AlphaFold2
Input: FASTA file with a single chain.
Database: full_dbs is the full database to use during the sequence alignment step.
Model: monomer_ptm runs on monomeric proteins and outputs pLDDT and PAE scores.

Example test job FASTA input: (download example as FASTA file here)

> test job
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK

Watch how to create a plain text file, upload, & run on COSMIC2 here.
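
Before uploading, it can help to confirm that the plain text file really contains a single chain. The following is a minimal Python sketch, not part of COSMIC2 or AlphaFold2: the function names parse_fasta and check_single_chain and the allowed residue alphabet (the 20 standard amino acids plus X) are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed alphabet: 20 standard amino acids plus X for an unknown residue.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) records."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence line found before any '>' header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_single_chain(text: str) -> str:
    """Return the sequence if the text holds exactly one valid chain."""
    records = parse_fasta(text)
    if len(records) != 1:
        raise ValueError(f"expected 1 chain, found {len(records)}")
    _, seq = records[0]
    unexpected = sorted(set(seq) - VALID_RESIDUES)
    if not seq or unexpected:
        raise ValueError(f"empty sequence or unexpected characters: {unexpected}")
    return seq


# The example test job from this page.
example = (
    "> test job\n"
    "PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK\n"
)
seq = check_single_chain(example)
print(f"1 chain, {len(seq)} residues")
```

A file with more than one '>' header fails this check, which is a hint to use the multimer model instead.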

Output

First, check the ranking of predicted models in ranking_debug.json to see which model has the highest score (as ranked by pLDDT). You can assess the quality of the prediction by looking at the pLDDT file and the predicted aligned error (PAE), which is available if you used the monomer_ptm model. Here is the output for the prediction with AlphaFold2.

The pLDDT score reflects per-residue (local) confidence, and the PAE reflects 3D confidence, i.e., confidence in the position of residues relative to one another. The C-terminus of this predicted structure has low confidence in both pLDDT and PAE. The PAE plot tells you that AlphaFold2 has low confidence in the 3D position of amino acids 58 and 59 relative to the rest of the molecule.
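
Reading the ranking file can also be scripted. The sketch below assumes that ranking_debug.json contains an "order" list of model names (best first) and one other key, such as "plddts", that maps each model name to its score. Check this layout against your own output. The function names and the scores in example_ranking are made up for illustration and are not real AlphaFold2 results.

```python
import json
from pathlib import Path
from typing import Dict, Optional, Tuple


def load_ranking(path: str) -> Dict:
    """Read a ranking_debug.json file into a dictionary."""
    return json.loads(Path(path).read_text())


def top_ranked_model(ranking: Dict) -> Tuple[str, Optional[float]]:
    """Return (model name, score) for the best-ranked prediction."""
    order = ranking["order"]  # assumed: model names sorted best first
    if not order:
        raise ValueError("ranking has an empty 'order' list")
    best = order[0]
    # Assumed: any key other than "order" holds a {model name: score} map.
    score_maps = [
        value for key, value in ranking.items()
        if key != "order" and isinstance(value, dict)
    ]
    score = score_maps[0].get(best) if score_maps else None
    return best, score


# Made-up values for illustration only.
example_ranking = {
    "plddts": {"model_1_pred_0": 81.2, "model_2_pred_0": 84.7},
    "order": ["model_2_pred_0", "model_1_pred_0"],
}
best_name, best_score = top_ranked_model(example_ranking)
print(best_name, best_score)
```

With a real job, call top_ranked_model(load_ranking("ranking_debug.json")) from the directory that holds the downloaded output.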

AlphaFold2 - multi-protein complexes ("multimer")

Tool name: AlphaFold2
Input: FASTA file with multiple chains.
Database: full_dbs is the full database to use during the sequence alignment step.
Model: multimer runs on multimeric protein complexes.

Example test job multimer FASTA input: (download example as FASTA file here)

> chain a
XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
> chain b
XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER

Watch how to create a plain text file, upload, & run on COSMIC2 here.

Output

First, check the ranking of predicted models in ranking_debug.json to see which model has the highest score. Note that for multimer runs AlphaFold2 ranks models by a weighted combination of ipTM and pTM rather than by pLDDT. You can assess per-residue quality of the prediction by looking at the pLDDT file.

Shown here is the top-scoring model (left) and the associated pLDDT plot (right). The atomic model is colored from N- to C-terminus (blue to red) and shown as a homodimer. In the pLDDT plot, the chains are concatenated: the first subunit corresponds to residue numbers 1 to 34 and the second subunit to residue numbers 35 to 68. You can see that the N- and C-termini have the lowest scores.
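
Because the pLDDT plot numbers residues across all chains in one run, it can be handy to convert a position on the plot back to a chain and a residue within that chain. This is a small sketch of that arithmetic. The function name locate_residue is hypothetical, and the chain lengths come from the 34-residue example chains above.

```python
from typing import List, Tuple


def locate_residue(index: int, chain_lengths: List[int]) -> Tuple[int, int]:
    """Map a 1-based position on the concatenated pLDDT axis to
    (chain number, residue number within that chain), both 1-based."""
    if index < 1:
        raise ValueError("index must be 1 or greater")
    offset = 0  # number of residues in the chains already passed
    for chain_number, length in enumerate(chain_lengths, start=1):
        if index <= offset + length:
            return chain_number, index - offset
        offset += length
    raise ValueError(f"index {index} is past the last residue ({offset})")


# Two identical chains of 34 residues, as in the example homodimer.
lengths = [34, 34]
for position in (1, 34, 35, 68):
    chain, residue = locate_residue(position, lengths)
    print(f"plot position {position} -> chain {chain}, residue {residue}")
```

The same function works for complexes with chains of different lengths, as long as the lengths are listed in the order the chains appear in the FASTA file.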

Full description of AlphaFold2 parameters

Database: We provide users with the option to choose which database to use for prediction. The default is the full database (“full_dbs”). Reduced databases (“reduced_dbs”) are provided for speed, but may result in a loss of accuracy.

Model: Pre-trained neural network model to use during prediction. You have the following choices:

- monomer: single-chain prediction, no 3D confidence score (PAE)
- monomer_casp14: the model used during the CASP14 competition
- monomer_ptm: single-chain prediction that also outputs the 3D confidence score (PAE). From DeepMind: “Slightly less accurate than monomer”
- multimer: multi-chain prediction

Number of predictions per model: Indicate how many predictions you would like generated for each pre-trained model.

Latest date (YYYY-mm-dd) to use for template search (if using templates): If using monomer_ptm (i.e., using templates), indicate the latest date to allow when searching the PDB for template structures (starting 3D models).
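
The date must be written as a four-digit year, two-digit month, and two-digit day. As a quick sanity check before submitting, a value can be tested with the Python standard library. The function name parse_template_date and the example date are hypothetical.

```python
from datetime import date, datetime


def parse_template_date(text: str) -> date:
    """Return the date if text is a valid, zero-padded YYYY-mm-dd string."""
    parsed = datetime.strptime(text, "%Y-%m-%d").date()
    # strptime tolerates missing zero padding (e.g. 2021-1-5), so compare
    # against the canonical form to require exactly YYYY-mm-dd.
    if parsed.isoformat() != text:
        raise ValueError(f"expected YYYY-mm-dd, got {text!r}")
    return parsed


print(parse_template_date("2021-11-01"))  # example value, not a recommendation
```

Strings such as 11/01/2021 or 2021-1-5 raise a ValueError here.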

Models to relax: When indicated, AlphaFold2 will use AMBER relaxation to refine models using a very short molecular dynamics simulation. This is only needed when accurate side-chain positions matter (e.g., phasing X-ray diffraction datasets). Most users do not need this step.