Homology modeling in YASARA
YASARA Structure features a complete homology modeling module that
automatically performs every step from an amino acid sequence to a refined
high-resolution model, using a CASP-approved protocol[1]. Additionally,
YASARA writes a detailed scientific report about the individual modeling
steps. If available, user-supplied hints (template structures, alignments)
can be included. The individual modeling steps can be summarized as follows:
- The target sequence is PSI-BLASTed[2] against UniProt to build a
position-specific scoring matrix (PSSM) from related sequences; this
profile is then used to search the PDB for potential modeling templates.
Common protein purification tags are excluded to avoid false positives. If
the homology is too remote to be detected by PSI-BLAST, the target is
considered difficult and templates have to be provided manually (for
example using one of the many fold recognition servers on the web).
- The templates are ranked based on the alignment score and the structural
quality according to WHAT_CHECK[3], obtained from the PDBFinder2
database[4]. Models built from high-resolution X-ray templates are usually
more accurate than those created from lower-resolution X-ray or NMR
templates, even if the latter share a higher percentage sequence identity.
Models are built for the top-scoring templates.
- If structure factors have been deposited at the PDB, re-refined template
structures are also included, provided that they are already part of the
PDB_REDO database.
- Gene fusion events are detected, where the
target sequence spans more than one template molecule. These are
automatically fused in the correct order.
- For each available template, the alignment with the target sequence is
obtained using a large amount of additional information: sequence-based
profiles of target and template are calculated from related UniProt
sequences, optionally augmented with structure-based profiles from related
template structures. The alignment also considers structural information
contained in the template (avoiding gaps in secondary structure elements,
keeping polar residues exposed, etc.), as well as the predicted target
secondary structure[5]. This structure-based alignment correction is partly
based on SSALN scoring matrices[6]. Alternatively, manual alignments can
also be provided.
- If the alignment is not certain, alternative high-scoring alignments are
created using a stochastic approach[7], and models are built for all of
them.
- If templates exist in oligomeric states (according to the PQS database),
models may be built in the same state, so that interactions between
side-chains across the interface can be considered. This includes all kinds
of hetero-oligomers, e.g. a homo-dimer of two hetero-dimers.
- In case of insertions and deletions, an indexed version of the PDB is
used to determine the optimal loop anchor points and collect possible loop
conformations.
- If templates contain ligands, these molecules are parameterized and fully
considered in the homology modeling procedure, including hydrogen bonding
and other interactions with the peptide chain.
- A graph of the side-chain rotamer network is built, and dead-end
elimination is used to find an initial rotamer solution in the context of a
simple repulsive energy function[8].
- The loops are optimized by trying hundreds of different conformations and
re-optimizing the side-chains for all of them.
- Side-chain rotamers are fine-tuned considering electrostatic and
knowledge-based packing interactions as well as solvation effects.
- The model's hydrogen bonding network is optimized, including
pH-dependence and ligands.
- An unrestrained high-resolution refinement with explicit solvent
molecules is run, using the latest knowledge-based force fields. The result
is validated to ensure that the refinement did not move the model in the
wrong direction.
- The tasks above are performed for all combinations of templates and
alignments, and per-residue quality indicators for the resulting models are
determined.
- A hybrid model is built: bad regions in the top-scoring model are
iteratively replaced with corresponding fragments from the other models.
- A scientific report with details about all the steps above is written
automatically, which can serve as the basis for a subsequent publication.
Ray-traced figures and per-residue quality plots are included, as well as
an overall judgment of the model quality, ranging from 'Optimal' to
'Terrible'.
- CASP evaluation results are available here.
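
The dead-end elimination step in the list above can be illustrated with a minimal Python sketch. It is not YASARA's implementation: the text only states that dead-end elimination is applied with a simple repulsive energy function, so the Goldstein elimination criterion, the function names (`dead_end_elimination`, `brute_force_minimum`) and the toy energy tables below are all assumptions made for illustration.

```python
from itertools import product


def pair_energy(pair_e, i, r, j, s):
    """Pair energy between rotamer r at position i and rotamer s at position j."""
    if i < j:
        return pair_e[(i, j)][r][s]
    return pair_e[(j, i)][s][r]


def dead_end_elimination(self_e, pair_e):
    """Prune rotamers that cannot be part of the global energy minimum.

    self_e[i][r] is the self energy of rotamer r at position i and
    pair_e[(i, j)][r][s] (with i < j) is the pairwise energy.
    Returns one set of surviving rotamer indices per position.
    """
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # Goldstein criterion (assumed variant): r is a dead end
                    # if switching to t lowers the energy no matter which
                    # rotamers the other positions adopt.
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(
                                pair_energy(pair_e, i, r, j, s)
                                - pair_energy(pair_e, i, t, j, s)
                                for s in alive[j]
                            )
                    if gap > 0:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def total_energy(self_e, pair_e, choice):
    """Energy of one complete rotamer assignment (one index per position)."""
    energy = sum(self_e[i][r] for i, r in enumerate(choice))
    for (i, j), table in pair_e.items():
        energy += table[choice[i]][choice[j]]
    return energy


def brute_force_minimum(self_e, pair_e):
    """Reference solution by exhaustive enumeration (tiny systems only)."""
    return min(
        product(*(range(len(e)) for e in self_e)),
        key=lambda c: total_energy(self_e, pair_e, c),
    )


# Hypothetical toy system: three positions with 2, 2 and 3 rotamers.
# The numbers are made up and only mimic repulsive (non-negative) clashes.
SELF_E = [[0.0, 5.0], [1.0, 0.0], [0.0, 0.5, 4.0]]
PAIR_E = {
    (0, 1): [[0.0, 2.0], [0.0, 0.0]],
    (0, 2): [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    (1, 2): [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
}

if __name__ == "__main__":
    print(dead_end_elimination(SELF_E, PAIR_E))  # [{0}, {0}, {0}]
    print(brute_force_minimum(SELF_E, PAIR_E))   # (0, 0, 0)
```

In this toy case the pruning leaves a single rotamer per position, which matches the exhaustive search. On real proteins the elimination usually leaves several candidates per residue, and a further search over the remaining combinations is still needed.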
REFERENCES
[1] Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: Four approaches that performed well in CASP8
Krieger E, Joo K, Lee J, Lee J, Raman S, Thompson J, Tyka M, Baker D, Karplus K (2009) Proteins 77 Suppl 9, 114-122
[2] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Altschul SF, Madden TL, Schaeffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ (1997) Nucleic Acids Res. 25, 3389-3402
[3] Errors in protein structures
Hooft RWW, Vriend G, Sander C, Abola EE (1996) Nature 381, 272
[4] The PDBFINDER database: A summary of PDB, DSSP and HSSP information with added value
Hooft RWW, Sander C and Vriend G (1996) CABIOS/Bioinformatics 12, 525-529
[5] Identification and application of the concepts important for accurate and reliable protein secondary structure prediction
King RD and Sternberg MJE (1996) Protein Sci. 5, 2298-2310
[6] SSALN: An alignment algorithm using structure-dependent substitution matrices and gap penalties learned from structurally aligned protein pairs
Qiu J and Elber R (2006) Proteins 62, 881-891
[7] Stochastic pairwise alignments
Mueckstein U, Hofacker IL and Stadler PF (2002) Bioinformatics 18 Suppl. 2, 153-160
[8] A graph-theory algorithm for rapid protein side-chain prediction
Canutescu AA, Shelenkov AA and Dunbrack RL Jr. (2003) Protein Sci. 12, 2001-2014