
Homology modeling in YASARA


Homology modeling steps

YASARA Structure features a complete homology modeling module that automatically performs every step from an amino acid sequence to a refined high-resolution model, using a CASP-approved protocol[1]. YASARA also writes a detailed scientific report about the individual modeling steps. User-supplied hints (template structures, alignments) can be included if available. The individual modeling steps can be summarized as follows:

  • The target sequence is PSI-BLASTed[2] against UniProt to build a position-specific scoring matrix (PSSM) from related sequences, and this profile is then used to search the PDB for potential modeling templates. Common protein purification tags are excluded to avoid false positives. If the homology is too remote to be detected by PSI-BLAST, the target is considered difficult and templates have to be provided manually (for example, by using one of the many fold recognition servers on the web).
  • The templates are ranked based on the alignment score and the structural quality according to WHAT_CHECK[3] obtained from the PDBFinder2 database[4]. Usually models built using high-resolution X-ray templates are more accurate than those created from lower resolution X-ray or NMR templates, even if the latter share a higher percentage sequence identity. Models are built for the top scoring templates.
  • If structure factors have been deposited at the PDB, re-refined template structures are also included, provided that they are already part of the PDB-redo database.
  • Gene fusion events are detected, where the target sequence spans more than one template molecule. These are automatically fused in the correct order.
  • For each available template, the alignment with the target sequence is obtained using large amounts of additional information: sequence-based profiles of target and template are calculated from related UniProt sequences, optionally augmented with structure-based profiles from related template structures. The alignment also considers structural information contained in the template (avoiding gaps in secondary structure elements, keeping polar residues exposed, etc.), as well as the predicted target secondary structure[5]. This structure-based alignment correction is partly based on SSALN scoring matrices[6]. Alternatively, manual alignments can also be provided.
  • If the alignment is not certain, alternative high-scoring alignments are created using a stochastic approach[7], and models are built for all of them.
  • If templates exist in oligomeric states (according to the PQS database), models may be built in the same state, so that interactions between side-chains across the interface can be considered. This includes all kinds of hetero-oligomers, e.g. a homo-dimer of two hetero-dimers.
  • In case of insertions and deletions, an indexed version of the PDB is used to determine the optimal loop anchor points and collect possible loop conformations.
  • If templates contain ligands, these molecules are parameterized and fully considered in the homology modeling procedure, including hydrogen bonding and other interactions with the peptide chain.
  • A graph of the side-chain rotamer network is built, and dead-end elimination is used to find an initial rotamer solution in the context of a simple repulsive energy function[8].
  • The loops are optimized by trying hundreds of different conformations and re-optimizing the side-chains for all of them.
  • Side-chain rotamers are fine-tuned considering electrostatic and knowledge-based packing interactions as well as solvation effects.
  • The model's hydrogen bonding network is optimized, including pH-dependence and ligands.
  • An unrestrained high-resolution refinement with explicit solvent molecules is run, using the latest knowledge-based force fields. The result is validated to ensure that the refinement did not move the model in the wrong direction.
  • The tasks above are performed for all combinations of templates and alignments, and per-residue quality indicators are determined for the resulting models.
  • A hybrid model is built: bad regions in the top scoring model are iteratively replaced with corresponding fragments from the other models.
  • A scientific report with details about all the steps above is written automatically, which can serve as the basis for a subsequent publication. Ray-traced figures and per-residue quality plots are included, as well as an overall judgment of the model quality, ranging from 'Optimal' to 'Terrible'.
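
As an illustration of the profile-building step in the list above, the following minimal Python sketch derives a position-specific scoring matrix from a small set of already aligned sequences. It is not YASARA's or PSI-BLAST's implementation: the function names `build_pssm` and `score`, the uniform background frequencies and the fixed pseudocount are assumptions made for this example, and PSI-BLAST itself additionally uses sequence weighting and data-dependent pseudocounts.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log-odds score.

    Assumptions for this sketch: equal-length gapped sequences, a uniform
    background distribution and a constant pseudocount per residue type.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count only standard residues; gaps and unknowns are ignored.
        counts = Counter(s[col] for s in aligned_seqs if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score(pssm, seq):
    """Sum the column scores of an ungapped sequence of the same length."""
    return sum(column[aa] for column, aa in zip(pssm, seq))
```

A sequence resembling the alignment scores higher than an unrelated one, for example `score(build_pssm(["ACD", "ACE", "ACD"]), "ACD")` compared with the score of `"WWW"`.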
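
The dead-end elimination step mentioned in the list above can be sketched on a toy problem. The Python example below applies the Goldstein elimination criterion, a common DEE variant: a rotamer r is removed if some alternative t at the same residue has lower energy no matter which rotamers the other residues adopt. Which criterion YASARA actually uses is not stated here, so the choice of Goldstein's form, the data layout (`self_e`, `pair_e`) and all function names are assumptions made for illustration only.

```python
from itertools import product


def dee(self_e, pair_e):
    """Prune rotamers with the Goldstein dead-end elimination criterion.

    self_e[i][r]         : self energy of rotamer r at residue i
    pair_e[(i, j)][r][s] : pair energy for i < j (hypothetical layout)
    Returns a list of sets with the surviving rotamer indices per residue.
    """
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]

    def pair(i, r, j, s):
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    def gap(i, r, t):
        # Best case for r relative to t over all surviving neighbor rotamers.
        g = self_e[i][r] - self_e[i][t]
        for j in range(n):
            if j != i:
                g += min(pair(i, r, j, s) - pair(i, t, j, s) for s in alive[j])
        return g

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                if any(gap(i, r, t) > 0 for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive


def total_energy(assign, self_e, pair_e):
    energy = sum(self_e[i][r] for i, r in enumerate(assign))
    for (i, j), table in pair_e.items():
        energy += table[assign[i]][assign[j]]
    return energy


def best_assignment(self_e, pair_e, alive=None):
    """Enumerate the (remaining) rotamer combinations and return the best."""
    choices = alive or [range(len(e)) for e in self_e]
    return min(product(*[sorted(c) for c in choices]),
               key=lambda a: total_energy(a, self_e, pair_e))


# Toy system: three residues with two rotamers each, repulsive pair terms.
self_e = [[0.0, 2.0], [1.0, 0.0], [0.5, 0.2]]
pair_e = {
    (0, 1): [[0.0, 1.0], [3.0, 0.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 2.0], [0.5, 0.0]],
}
alive = dee(self_e, pair_e)
solution = best_assignment(self_e, pair_e, alive)
```

Because the criterion only removes rotamers that cannot be part of the lowest-energy combination, the search over the pruned set finds the same solution as the search over all combinations, but with fewer candidates to enumerate.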

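The hybrid-model step in the list above can be illustrated with a short Python sketch that works purely on per-residue quality scores. It is a simplified assumption of how such a replacement could be organized, not YASARA's procedure: the names `find_bad_regions` and `build_hybrid`, the convention that higher scores are better and the threshold of 0.0 are all hypothetical. A real implementation would also have to superpose the fragments and re-refine the junctions.

```python
def find_bad_regions(scores, threshold):
    """Return (start, end) index pairs of contiguous runs below threshold."""
    regions, start = [], None
    for i, value in enumerate(scores):
        if value < threshold and start is None:
            start = i
        elif value >= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions


def mean(values):
    return sum(values) / len(values)


def build_hybrid(model_scores, threshold=0.0):
    """Pick the best model overall, then patch its bad regions.

    model_scores[m][i] is the quality of residue i in model m (higher is
    better in this sketch). Returns the donor model index per residue and
    the resulting per-residue scores.
    """
    models = range(len(model_scores))
    best = max(models, key=lambda m: mean(model_scores[m]))
    source = [best] * len(model_scores[best])
    hybrid = list(model_scores[best])
    for start, end in find_bad_regions(hybrid, threshold):
        donor = max(models, key=lambda m: mean(model_scores[m][start:end]))
        # Replace the fragment only if the donor is actually better there.
        if mean(model_scores[donor][start:end]) > mean(hybrid[start:end]):
            source[start:end] = [donor] * (end - start)
            hybrid[start:end] = model_scores[donor][start:end]
    return source, hybrid


model_scores = [
    [0.8, 0.6, -1.0, -0.8, 0.7],    # top scoring model with one bad region
    [-0.3, -0.2, 0.2, 0.1, -0.5],   # weaker model that is better in that region
]
source, hybrid = build_hybrid(model_scores)
```

In this toy input, residues 2 and 3 of the top scoring model fall below the threshold and are taken from the second model, while the rest of the chain stays unchanged.
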
References

[1] Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: Four approaches that performed well in CASP8
Krieger E, Joo K, Lee J, Lee J, Raman S, Thompson J, Tyka M, Baker D, Karplus K (2009) Proteins 77 Suppl. 9, 114-122
[2] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Altschul SF, Madden TL, Schaeffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Nucleic Acids Res. 25, 3389-3402
[3] Errors in protein structures
Hooft RWW, Vriend G, Sander C, Abola EE (1996) Nature 381, 272
[4] The PDBFINDER database: A summary of PDB, DSSP and HSSP information with added value
Hooft RWW, Sander C, Vriend G (1996) CABIOS/Bioinformatics 12, 525-529
[5] Identification and application of the concepts important for accurate and reliable protein secondary structure prediction
King RD, Sternberg MJE (1996) Protein Sci. 5, 2298-2310
[6] SSALN: An alignment algorithm using structure-dependent substitution matrices and gap penalties learned from structurally aligned protein pairs
Qiu J, Elber R (2006) Proteins 62, 881-891
[7] Stochastic pairwise alignments
Mueckstein U, Hofacker IL, Stadler PF (2002) Bioinformatics 18 Suppl. 2, 153-160
[8] A graph-theory algorithm for rapid protein side-chain prediction
Canutescu AA, Shelenkov AA, Dunbrack RL Jr. (2003) Protein Sci. 12, 2001-2014