Structure validation in YASARA
When working with experimental protein structures or predicted models, the first question usually concerns their quality. Is the structure roughly correct? If so, are there doubtful regions that must be treated with care, especially when making predictions to guide further experimental research?
The usual approach is therefore to validate the structure. The most convincing results are obtained when the validation is based on a comparison with a gold standard of trusted reference structures. Naturally, these reference structures may share little or no sequence similarity with the structure being validated, so how can they be compared? The solution is to base the validation on general aspects of protein structure, encoded in knowledge-based potentials.
One final problem remains: the knowledge-based energies depend on the size and shape of the protein, and also on its amino acid composition, so a given energy cannot simply be labeled 'good' or 'bad'. The obvious fix is to normalize the energies to remove these dependencies, and to estimate the expected average energy and its standard deviation from the gold standard reference structures. When validating a structure, one can then easily calculate how many standard deviations it lies from the average, thereby obtaining a 'Z-score'. For example, a structure with a Z-score of -4 is four standard deviations below average and can be considered bad. Z-scores form the basis of most structure validation tools, including WHAT_CHECK, one of the first such tools and today the most extensive one, which is also included with YASARA in the Twinset.
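The normalization and Z-score idea described above can be illustrated with a minimal sketch. It assumes, for simplicity, that the energy depends linearly on the number of residues only (the dependencies on shape and amino acid composition are ignored), and it negates the energy into a score where higher is better, so that Z = (s - mu) / sigma is negative for a worse-than-average structure. The helper names `fit_size_trend`, `build_reference` and `z_score` and the toy reference numbers are hypothetical; this is not YASARA's or WHAT_CHECK's actual implementation.

```python
import statistics


def fit_size_trend(sizes, scores):
    """Ordinary least-squares fit of score = a + b * size (hypothetical helper)."""
    mean_x = statistics.fmean(sizes)
    mean_y = statistics.fmean(scores)
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def build_reference(sizes, energies):
    """Derive normalization parameters from the reference structures (hypothetical helper)."""
    # Negate energies so that higher = better; a negative Z-score then means worse than average
    scores = [-e for e in energies]
    a, b = fit_size_trend(sizes, scores)
    # Residuals remove the (assumed linear) size dependence
    residuals = [s - (a + b * n) for n, s in zip(sizes, scores)]
    return a, b, statistics.fmean(residuals), statistics.stdev(residuals)


def z_score(size, energy, ref):
    """Number of standard deviations the normalized score lies from the reference average."""
    a, b, mean, sd = ref
    residual = -energy - (a + b * size)
    return (residual - mean) / sd


# Toy reference set: synthetic numbers for illustration only, not real structures
ref_sizes = [100, 150, 200, 250, 300]
ref_energies = [-500.0, -760.0, -990.0, -1260.0, -1490.0]
ref = build_reference(ref_sizes, ref_energies)

# A hypothetical 200-residue model whose energy is higher (worse) than expected for its size
z = z_score(200, -950.0, ref)
print(f"Z-score: {z:.2f}")
```

Fitting the size trend first and only then computing the mean and standard deviation of the residuals is what makes energies of differently sized proteins comparable; a real tool would also have to correct for shape and composition.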
In YASARA Structure, validation takes a twist: it is based entirely on Z-scores calculated from molecular dynamics force field energies. As it turns out, this approach has only advantages: