Protein folding with artificial intelligence in YASARA
The rise of artificial intelligence has changed the way protein structure prediction is done today. After AlphaFold's breakthrough at the CASP contest (Critical Assessment of Structure Prediction) in 2018, additional promising methods have been developed and, thankfully, made publicly available. This allowed us to include them in YASARA and make them easily accessible on your local workstation or notebook at the touch of a button, as part of YASARA's comprehensive molecular modeling platform. No external cloud servers are used, so all your confidential sequences stay on your local machine. You can either run AI folding methods directly, or use them implicitly as part of a protein modeling experiment, where YASARA combines them with traditional, template-based homology modeling to obtain the best possible result (it makes little sense to fold a protein from scratch if the structure of a close homologue is already known).

The following AI folding methods are currently accessible in YASARA:
- AlphaFold: The first and most famous AI approach from Google/Alphabet owes its high accuracy to the smart inclusion of additional experimental data, i.e. the sequences of related proteins combined in a multiple sequence alignment (MSA). Since the creation of these MSAs is computationally intensive and prediction results for about 200 million sequences have already been collected at the EBI[1], YASARA takes a shortcut here: only the 200 million sequences themselves are stored locally. YASARA searches them for the one most similar to your query sequence, downloads the corresponding prediction, and uses it as a template for homology modeling to arrive at your model.
- ESMFold: Developed by Meta, this was the first AI large language model applied to protein folding. It does not rely on experimental data collected in MSAs, and can thus be seen as the first attempt to train an AI to really understand the protein folding problem[2]. It can outperform AlphaFold on orphan proteins (isolated sequences without close homologues) and it has a better chance of detecting whether a point mutation disrupts the entire structure. ESMFold is included completely in YASARA Structure.
- OmegaFold: Developed by the Chinese company Helixon, it works like ESMFold, but is faster and more memory efficient, while the prediction accuracy is claimed to be comparable to AlphaFold for shorter sequences[3]. OmegaFold is also included completely in YASARA Structure.
While AI has been a playground for high-end GPUs, all three methods also work if you do not have a fast GPU (e.g. when running on a notebook or a server), since a slower CPU-only fallback is included.
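The AlphaFold shortcut described above (search a local sequence collection, then fetch only the best-matching prediction) can be illustrated with a minimal Python sketch. This is not YASARA's implementation: the three-entry index, the accession names, the use of `difflib` as a similarity measure, and the download URL pattern are all assumptions made for illustration, and a real search over 200 million sequences would need a dedicated alignment tool.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for the locally stored sequence collection
# (accession -> amino acid sequence). Names and sequences are made up.
LOCAL_INDEX = {
    "HYPO1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "HYPO2": "MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERS",
    "HYPO3": "MKTAYIAKQRQISFVKAHFSRQLEDRLGLIEVQ",
}


def similarity(a, b):
    """Crude similarity score in [0, 1]; a placeholder for a real alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def find_closest_prediction(query, index):
    """Return (accession, score) of the stored sequence most similar to the query."""
    best = max(index, key=lambda acc: similarity(query, index[acc]))
    return best, similarity(query, index[best])


def prediction_url(accession):
    """Build a download URL for the matching prediction (assumed URL pattern)."""
    return f"https://alphafold.ebi.ac.uk/files/AF-{accession}-F1-model_v4.pdb"


# The query differs from HYPO1 by a single residue near the C-terminus.
query = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIDVQ"
accession, score = find_closest_prediction(query, LOCAL_INDEX)
url = prediction_url(accession)
print(accession, round(score, 3), url)
```

Only the lookup touches the query sequence, and it runs entirely on local data; the single network request would carry the accession of the matched prediction rather than the query itself. The downloaded structure would then serve as the template for homology modeling.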
REFERENCES
[1] AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models
Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, Yuan D, Stroe O, Wood G, Laydon A, Zidek A, Green T, Tunyasuvunakool K, Petersen S, Jumper J, Clancy E, Green R, Vora A, Lutfi M, Figurnov M, Cowie A, Hobbs N, Kohli P, Kleywegt G, Birney E, Hassabis D, Velankar S (2022) Nucleic Acids Research 50, D439-D444
[2] Evolutionary-scale prediction of atomic-level protein structure with a language model
Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, Smetanin N, Verkuil R, Kabeli O, Shmueli Y, Dos Santos Costa A, Fazel-Zarandi M, Sercu T, Candido S, Rives A (2023) Science 379, 1123-1130
[3] High-resolution de novo structure prediction from primary sequence
Wu R, Ding F, Wang R, Shen R, Zhang X, Luo S, Su C, Wu Z, Xie Q, Berger B, Ma J, Peng J (2022) bioRxiv 07.21.500999