In the current implementation, the NEGF calculation is parallelized by MPI. In addition, if you link MKL, the matrix multiplications and matrix inversions required to evaluate the Green function are also parallelized by OpenMP. In this case, you can perform a hybrid MPI/OpenMP parallelization, which may shorten the computational time. The way to run the parallel calculation is exactly the same as before.
In Fig. 45 we show the speedup ratio in the elapsed time for the evaluation of the density matrix of an 8-zigzag graphene nanoribbon (ZGNR) under a finite bias voltage of 0.5 eV. A total of 197 energy points (101 and 96 for the equilibrium and nonequilibrium terms, respectively) are used for the evaluation of the density matrix. Only the Γ point is employed for the k-point sampling, and a spin-polarized calculation is performed. Thus, 394 combinations of the three indices (energy point, k-point, and spin) are parallelized by MPI. The speedup ratio of the flat MPI parallelization, corresponding to 1 thread, scales reasonably up to 64 processes. Furthermore, the hybrid parallelization, corresponding to 2 and 4 threads, substantially improves the speedup ratio. By fully using 64 quad-core processors, corresponding to 64 processes and 4 threads, the speedup ratio is about 140, demonstrating the good scalability of the NEGF method. For details, see also Ref. [73]. It should also be noted that the number of processes in the MPI parallelization can exceed the number of atoms in OpenMX Ver. 3.9.
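The count of 394 MPI tasks quoted above follows from simple arithmetic, which the short Python sketch below reproduces. It also estimates how many tasks the busiest process receives when the tasks are spread as evenly as possible. The function names are hypothetical, and the even distribution is an assumption made only for illustration; the text does not state how OpenMX actually assigns the combinations to processes.

```python
from math import ceil

def count_tasks(n_energy, n_kpoints, n_spin):
    # Number of (energy point, k-point, spin) combinations to distribute.
    return n_energy * n_kpoints * n_spin

def max_tasks_per_process(n_tasks, n_procs):
    # Assumption: tasks are spread as evenly as possible, so the busiest
    # process holds ceil(n_tasks / n_procs) of them.
    return ceil(n_tasks / n_procs)

def ideal_mpi_speedup(n_tasks, n_procs):
    # Upper bound on the flat MPI speedup if every task cost the same
    # and communication were free (both are simplifying assumptions).
    return n_tasks / max_tasks_per_process(n_tasks, n_procs)

# Values taken from the text: 101 + 96 energy points, one k-point, two spins.
n_tasks = count_tasks(101 + 96, 1, 2)
print(n_tasks)  # 394

for n_procs in (1, 16, 64):
    print(n_procs,
          max_tasks_per_process(n_tasks, n_procs),
          round(ideal_mpi_speedup(n_tasks, n_procs), 1))
```

Under these assumptions, 64 processes leave at most 7 tasks on any one process, so the index-level MPI parallelization alone cannot exceed a speedup of roughly 56. This is consistent with the larger speedup of about 140 being reached only when OpenMP threads are added on top of MPI.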
