Performance enhancement with ScaLAPACK

Calculations for large-scale systems with the eigenvalue solvers 'Cluster' and 'Band', specified by the keyword 'scf.EigenvalueSolver', can be accelerated by using the ScaLAPACK version of OpenMX. Table 3 shows the elapsed time for a series of benchmark calculations, runtest, runtestL, and runtestL2, calculated using the non-ScaLAPACK and ScaLAPACK versions of OpenMX. The two versions take nearly the same time for runtest and runtestL, whereas the ScaLAPACK version is clearly faster for runtestL2 (19946 sec. versus 30138 sec.). This suggests that the ScaLAPACK version is effective for large-scale systems containing more than a few hundred atoms. It should also be noted that the ScaLAPACK version uses less memory than the non-ScaLAPACK version. Please refer to the section 'Installation' for how to install the ScaLAPACK version.


Table 3: Elapsed time (sec.) for a series of benchmark calculations, runtest, runtestL, and runtestL2, calculated using the non-ScaLAPACK and ScaLAPACK versions of OpenMX. The calculations were performed on a CRAY-XC30 using 8 MPI processes and 2 OpenMP threads for runtest, 132 MPI processes and 2 OpenMP threads for runtestL, and 264 MPI processes and 2 OpenMP threads for runtestL2.

             non-ScaLAPACK   ScaLAPACK
             (sec.)          (sec.)
runtest      135             137
runtestL     1684            1599
runtestL2    30138           19946
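
As a rough illustration of the figures in Table 3, the following Python sketch computes the ratio of the non-ScaLAPACK elapsed time to the ScaLAPACK elapsed time for each benchmark. The function name 'speedup' and the dictionary layout are illustrative choices and are not part of OpenMX; only the timings are taken from the table.

```python
# Elapsed times in seconds, copied from Table 3:
# (non-ScaLAPACK, ScaLAPACK) for each benchmark.
elapsed = {
    "runtest": (135, 137),
    "runtestL": (1684, 1599),
    "runtestL2": (30138, 19946),
}


def speedup(non_scalapack, scalapack):
    """Return the time ratio; a value above 1 means ScaLAPACK was faster."""
    return non_scalapack / scalapack


for name, (t_plain, t_scalapack) in elapsed.items():
    print(f"{name:10s} speedup = {speedup(t_plain, t_scalapack):.2f}")
```

With the values in Table 3, the ratio is about 0.99 for runtest, 1.05 for runtestL, and 1.51 for runtestL2, so a sizeable gain appears only for the largest benchmark.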

2016-04-03