
Multi CPU support in YASARA

Back in 2004, the exponential increase in desktop computing power came to a sudden end. After more than two decades of strict adherence to Moore's law, the laws of physics had their revenge and prevented CPU manufacturers from raising the clock frequency beyond 3-4 GHz. Even 20 years later, common clock frequencies had increased to only 4-5 GHz. The solution was borrowed from the supercomputing world: combining multiple CPUs, initially on the same mainboard and later also on the same silicon die. Starting with dual-core CPUs in 2005, we arrived at quad-core CPUs in 2006, 16 cores in 2011 and 32 cores in 2018. Since each core can execute two threads, this means that 64 CPU threads need to be kept busy.

Parallel MD
Figure 1: Distributing segments of the simulation cell over CPU cores to parallelize the MD simulation.

Even though compilers provide growing support for multiple CPU cores, in practice the application itself must divide and conquer, i.e. break the task into multiple subtasks ("threads") and distribute them among the available CPU cores ("multi-threading").
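As a minimal sketch of this divide-and-conquer pattern (not YASARA's own code), the Python example below splits a list of work items into contiguous chunks and hands one chunk to each worker thread. The function names `split_into_chunks`, `subtask` and `parallel_sum_of_squares` are hypothetical, and the sum of squares merely stands in for a real computation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous pieces whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # the first 'extra' chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def subtask(chunk):
    """Hypothetical unit of work: the sum of squares over one chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_threads=4):
    """Distribute the chunks among worker threads and combine the partial results."""
    chunks = split_into_chunks(items, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(subtask, chunks))  # map returns results in chunk order
    return sum(partials)


print(parallel_sum_of_squares(list(range(10)), n_threads=4))
```

In standard CPython builds the global interpreter lock prevents threads from running pure-Python arithmetic simultaneously, so this only illustrates how the work is decomposed and recombined, not the speedup a natively multi-threaded application can reach.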

YASARA provides advanced multi-threading functions that avoid slow-downs caused by the operating system's process scheduler and memory allocation interface, which are too general and thus sub-optimal for this task. As an example, Figure 1 shows the helical SNARE protein complex, kindly provided by Dr. Marc Baaden at the Institut de Biologie Physico-Chimique, CNRS Paris, who is studying SNARE's involvement in membrane fusion. During a parallel simulation, this large system of 360000 atoms is dynamically split into segments, which are assigned to the available CPU cores. Figure 2 shows the speedups obtained for this simulation when using one to four processors on a workstation with two dual-core Opteron 265 CPUs.

Parallel MD
Figure 2: Number of simulation steps (2 fs) completed per second as a function of available CPUs. AMBER99 force field, periodic boundaries, 8Å cutoff, 360000 protein, membrane and water atoms.
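The splitting of the simulation cell into segments can be illustrated with a small sketch. The source does not describe how YASARA chooses its segments, so the code below is only an assumption: it cuts the cell into slabs along the x axis so that each slab holds a similar number of atoms. The names `slab_boundaries` and `assign_segments` are hypothetical.

```python
from bisect import bisect_right


def slab_boundaries(x_coords, n_segments):
    """Return n_segments - 1 cut positions along x that balance the atom count per slab."""
    xs = sorted(x_coords)
    n = len(xs)
    # Place each cut at the x coordinate of the atom that starts the next slab.
    return [xs[k * n // n_segments] for k in range(1, n_segments)]


def assign_segments(x_coords, cuts):
    """Map each atom to the index of the slab (and thus CPU core) that owns it."""
    return [bisect_right(cuts, x) for x in x_coords]


# Twelve atoms with distinct x coordinates, distributed over three cores.
x_coords = [5.5, 0.0, 3.0, 1.5, 4.0, 0.5, 2.5, 5.0, 1.0, 3.5, 2.0, 4.5]
cuts = slab_boundaries(x_coords, 3)
owners = assign_segments(x_coords, cuts)
print(cuts)
print([owners.count(core) for core in range(3)])
```

Because atoms move during the simulation, a scheme of this kind would have to recompute the cut positions from time to time to keep the load balanced, which is one possible reading of "dynamically split".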

An inherent disadvantage of multi-threading is the loss of reproducibility: if multiple CPUs work in parallel, certain instructions (e.g. additions) will inevitably be executed in a different order. While this does not make a difference from the mathematician's point of view, it makes a tiny difference from the CPU's point of view, where floating-point additions are rounded after each step and are therefore not associative: A+(B+C) may yield a slightly different result than (A+B)+C. The practical consequence is that molecular dynamics simulations become non-reproducible, i.e. running the same simulation a second time will result in a different trajectory. YASARA contains special functionality to avoid this problem and ensure that two simulations run with the same parameters on the same number of CPUs always yield identical trajectories, a feature which is crucial for some important applications of molecular dynamics simulations.
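The non-associativity described above is easy to observe with ordinary double-precision numbers. The sketch below first compares the two groupings directly. It then shows one common way to keep a parallel sum reproducible, assumed here for illustration and not taken from YASARA: each chunk is summed in a fixed order, and the partial sums are combined in a fixed order, regardless of which thread finishes first. The names `ordered_sum` and `chunked_sum` are hypothetical.

```python
def ordered_sum(values):
    """Add values strictly left to right, so the sequence of roundings is fixed."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum values as per-chunk partial sums that are combined in chunk order."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division, at least 1
    partials = [ordered_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return ordered_sum(partials)


left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

values = [0.1 * i for i in range(1000)]
# The same chunk count always reproduces the same bits.
print(chunked_sum(values, 4) == chunked_sum(values, 4))  # True
```

With a different chunk count the grouping of the additions changes, so the result is not guaranteed to match bit for bit. This is consistent with the guarantee above being stated for the same number of CPUs.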