Convergence and time-consuming issues of large-scale calculations

Date: 2018/03/23 12:49
Name: xmzhang <xmzhang@theory.issp.ac.cn>

Dear OpenMX developers and users:

I would like to ask you about some convergence and computational-time problems in large-scale calculations.
First, for a system of 300 atoms with a 3x3x1 k-point mesh on 48 CPUs, one SCF iteration takes 45 s.
However, for a system of 700 atoms with a 1x2x1 k-point mesh on 48 CPUs, the process is killed. This is the error:
yhrun: error: cn603: task 14: Segmentation fault (core dumped)
yhrun: First task exited 60s ago
yhrun: tasks 11-12,15: running
yhrun: tasks 0-10,13-14,16-23: exited abnormally
yhrun: Terminating job step 627108.0
slurmd[cn602]: *** STEP 627108.0 KILLED AT 2018-03-22T09:39:06 WITH SIGNAL 9 ***
yhrun: Job step aborted: Waiting up to 2 seconds for job step to finish.
slurmd[cn602]: *** STEP 627108.0 KILLED AT 2018-03-22T09:39:06 WITH SIGNAL 9 ***
yhrun: error: cn602: task 11: Killed
With 144 CPUs, one SCF iteration takes 13440 s (700 atoms, 1x2x1 k-points), and the SCF convergence is also poor.
Why does the computational efficiency differ so much when the number of atoms increases?
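As a rough sanity check on the first question, here is a back-of-the-envelope sketch. It assumes, as a simplification, that the cost of one SCF iteration is dominated by dense diagonalization and scales as n_k × N^3 (n_k = number of k-points, N = number of atoms), and that total CPU time is simply wall time multiplied by the CPU count. Symmetry reduction of k-points, the algorithms OpenMX actually uses, and communication costs are all ignored, so the numbers are indicative only.

```python
# Back-of-the-envelope comparison of the two runs described above.
# Assumption (simplified model, not OpenMX's actual cost model):
# time per SCF iteration ~ n_kpoints * n_atoms**3, spread evenly over CPUs.

def expected_cost_ratio(atoms_a, kpts_a, atoms_b, kpts_b):
    """Predicted ratio cost_b / cost_a under the n_k * N^3 model."""
    return (kpts_b / kpts_a) * (atoms_b / atoms_a) ** 3

def observed_cpu_time_ratio(secs_a, cpus_a, secs_b, cpus_b):
    """Ratio of total CPU-seconds (wall time x CPU count) per SCF iteration."""
    return (secs_b * cpus_b) / (secs_a * cpus_a)

# Run A: 300 atoms, 3x3x1 mesh (9 k-points), 45 s per SCF on 48 CPUs.
# Run B: 700 atoms, 1x2x1 mesh (2 k-points), 13440 s per SCF on 144 CPUs.
expected = expected_cost_ratio(300, 3 * 3 * 1, 700, 1 * 2 * 1)
observed = observed_cpu_time_ratio(45, 48, 13440, 144)

print(f"expected ratio ~ {expected:.1f}")
print(f"observed ratio ~ {observed:.0f}")
```

Under this crude model the 700-atom run should need only about 3 times more CPU time per SCF iteration, while the reported timings correspond to roughly 900 times more. A gap of that size suggests that something other than the N^3 growth (memory pressure, for example) was limiting the run, although this is an inference from the numbers, not a diagnosis.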
Second, how should I set the parameters to reduce the calculation time and the virtual memory usage?
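For the memory question, a minimal sketch of how dense-matrix storage grows with system size. It assumes a hypothetical 13 basis functions per atom (for example an s2p2d1 basis: 2·1 + 2·3 + 1·5), complex double-precision matrix elements (16 bytes each), and ignores every other array; how many such matrices OpenMX actually keeps per k-point is not modeled.

```python
# Rough memory estimate for ONE dense complex n x n matrix,
# where n = number of atoms * basis functions per atom.
# Assumptions (illustrative only): 13 basis functions per atom
# (e.g. s2p2d1: 2*1 + 2*3 + 1*5), 16 bytes per complex double element.

BYTES_PER_COMPLEX = 16
BASIS_PER_ATOM = 13  # hypothetical; depends on the chosen PAO basis

def dense_matrix_gib(n_atoms, basis_per_atom=BASIS_PER_ATOM):
    """Size in GiB of one full n x n complex matrix."""
    n = n_atoms * basis_per_atom
    return n * n * BYTES_PER_COMPLEX / 2**30

def per_process_gib(n_atoms, n_procs):
    """Same matrix split evenly over n_procs processes
    (an idealized distributed layout, as ScaLAPACK-style codes aim for)."""
    return dense_matrix_gib(n_atoms) / n_procs

for atoms in (300, 700):
    full = dense_matrix_gib(atoms)
    print(f"{atoms} atoms: {full:.2f} GiB per full matrix, "
          f"{per_process_gib(atoms, 28):.3f} GiB per process on 28 processes")
```

The point of the sketch is the scaling: going from 300 to 700 atoms multiplies each full matrix by (7/3)^2 ≈ 5.4. If every process held complete copies of several such matrices, the per-node total would grow quickly, whereas distributing them over the processes divides that cost.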
Last, how should I set the parameters so that large-scale systems converge as quickly as possible?
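On the convergence question, the following toy sketch (not OpenMX's mixing scheme) illustrates why the mixing weight in an SCF loop matters. It applies simple linear mixing, x_new = (1 − α)·x + α·F(x), to a hypothetical one-dimensional map F(x) = 3 − 2x whose fixed point is x = 1. A weight that is too large makes the iteration oscillate and diverge; a smaller one converges.

```python
# Toy illustration of simple linear mixing in a self-consistent loop.
# F is a hypothetical "output density" map with fixed point x* = 1;
# real SCF maps are high-dimensional and nonlinear.

def F(x):
    return 3.0 - 2.0 * x

def mix_until_converged(alpha, x0=0.0, tol=1e-8, max_iter=200):
    """Iterate x <- (1 - alpha) * x + alpha * F(x).
    Returns (converged, iterations, final x)."""
    x = x0
    for it in range(1, max_iter + 1):
        residual = F(x) - x          # difference between output and input
        if abs(residual) < tol:
            return True, it, x
        x = x + alpha * residual     # same as (1 - alpha) * x + alpha * F(x)
        if abs(x) > 1e6:             # clearly diverging; stop early
            return False, it, x
    return False, max_iter, x

for alpha in (0.8, 0.3):
    ok, n, x = mix_until_converged(alpha)
    print(f"alpha={alpha}: converged={ok} after {n} iterations, x={x:.6g}")
```

For this linear map the error is multiplied by (1 − 3α) at every step, so the toy converges only for 0 < α < 2/3. OpenMX provides more elaborate schemes, selected through its scf.Mixing.Type keyword, but the same trade-off between stability and speed applies.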

Thank you!

Re: Convergence and time-consuming issues of large-scale calculations ( No.1 )

Date: 2018/03/23 21:44
Name: T. Ozaki

Hi,

I guess that the segmentation fault was caused by a memory shortage.
To reduce the memory requirement, the ScaLAPACK version is effective.
Please take a look at
http://www.openmx-square.org/openmx_man3.8/node9.html
http://www.openmx-square.org/openmx_man3.8/node88.html

Regards,

TO

Re: Convergence and time-consuming issues of large-scale calculations ( No.2 )

Date: 2018/03/26 11:37
Name: xmzhang <xmzhang@theory.issp.ac.cn>

Dear T. Ozaki:
Following your suggestion, I have installed the ScaLAPACK version of OpenMX. This is my makefile:
CC = mpicc -O3 -Dscalapack -ffast-math -fopenmp -I/opt/intel/composer_xe_2015.1.133/mkl/include/fftw -I/opt/intel/composer_xe_2015.1.133/mkl/include/
FC = mpif90 -O3 -ffast-math -fopenmp -I/opt/intel/composer_xe_2015.1.133/mkl/include/
LIB= -L/home/ISSP2/xmzhang/software/fftw-3.3.4/lib -lfftw3 -L/opt/intel/composer_xe_2015.1.133/compiler/lib/intel64 -liomp5 -L/opt/intel/composer_xe_2015.1.133/mkl/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lgfortran
Then I tested a system of 700 atoms. With 28 CPUs, it ran 63 SCF iterations and was then killed with this error: APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9).
With the non-ScaLAPACK version, the process was killed immediately, before completing even one SCF iteration (700 atoms, 28 CPUs). The error was the same as with the ScaLAPACK version.
I think that if memory were insufficient, the process should be killed immediately, before the first SCF iteration, rather than after 63 SCF iterations. What could cause this error, and how can I solve it?

Thank you!

Page: [1]