memory increasing as MD steps increase and the version issue on it
Date: 2021/07/14 21:32
Name: Kunihiro Yananose   <ykunihiro@snu.ac.kr>

Dear Developers and Users,

Hi. Recently, I tried to relax the atomic structure of a large system, but I ran into a memory problem: the memory occupation increased at every MD step, and after several steps the calculation was terminated by the system for exceeding the memory limit.

I found the following threads about similar situations on this forum:
http://www.openmx-square.org/forum/patio.cgi?mode=view&no=1248
http://www.openmx-square.org/forum/patio.cgi?mode=view&no=2619
http://www.openmx-square.org/forum/patio.cgi?mode=view&no=2843
In principle, I can finish the relaxation by restarting the calculation several times. However, after running some tests, I feel the behavior should be reported in detail.

From the first thread, I learned that the memory occupation increases as the MD run proceeds when the BFGS, RF, or EF method is used.
If I understand correctly,
1. Even in such a case, once the MD step reaches (MD.Opt.StartDIIS)+(MD.Opt.DIIS.History), the memory usage should saturate.
2. If I use the steepest descent method (MD.Type Opt), memory usage should not increase significantly compared to the first MD step.

However, what I found from tests with a smaller system is:
1. When I use the RF method, the memory occupation increases at every MD step even when the MD step far exceeds (MD.Opt.StartDIIS)+(MD.Opt.DIIS.History) (= 5 + 4 = 9 with the input below).
2. When I use the SD method (Opt), the memory occupation increases by a similar amount to the RF case.
3. I used version 3.9 for the above two cases. However, when I used version 3.8 for the same system, the memory usage saturated at a moderate level compared to the first MD step.

In detail, when I checked the memory usage of the calculations running on 20 cores × 2 nodes (40 cores in total) with 252 GB of memory in total, the change between the 1st step and roughly the 50th step was as follows (a sketch of one way to sample such numbers is given after the list).
(1) ver 3.9 with RF : 13.5 % to 46.85 %
(2) ver 3.9 with SD : 13.5 % to 43.6 %
(3) ver 3.8 with RF : 14.75 % to 21.0 % (this value had almost saturated by the 13th step)
(4) ver 3.8 with SD : 17.65 % to 22.8 %
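
(For anyone who wants to track this themselves, the total resident memory of the openmx processes on a node can be sampled with a small loop like the one below. This is only an illustrative sketch, not necessarily how the numbers above were taken.)

# sum the resident set size (RSS) of all openmx processes on this node, once per minute
while true; do
  rss_kb=$(ps -C openmx -o rss= | awk '{s+=$1} END {print s+0}')
  echo "$(date +%T)  $((rss_kb / 1024)) MB"
  sleep 60
done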

So I guess that a memory leak newly appears in version 3.9. I tried the memory-leak check by adding the “memory.leak on” option, but it did not work.
From the automatic memory-leak test in the work directory using the -mltest option, I couldn't find any problem in the 'mltest.result' file.
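
(For reference, that automatic test can be invoked from the work directory roughly as follows; the process count is just an example, and the exact command may differ on your system.)

mpirun -np 40 ./openmx -mltest
# results are summarized in the mltest.result file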

I'm not sure whether this is due to a bug in the code or to my compilation. Could someone please check this issue?

I will attach my input file for the test here.

Sincerely,
Kunihiro Yananose

===========================================
System.CurrrentDirectory ./
System.Name CsPbI3_222rel
level.of.stdout 1
level.of.fileout 0

memory.leak on

Species.Number 3
<Definition.of.Atomic.Species
Cs Cs12.0-s2p2d2f1 Cs_PBE19
Pb Pb8.0-s2p2d2f1 Pb_PBE19
I I7.0-s2p2d2f1 I_PBE19
Definition.of.Atomic.Species>

Atoms.Number 40
Atoms.SpeciesAndCoordinates.Unit Ang
<Atoms.SpeciesAndCoordinates
1 Cs -0.109350302481 0.017661825817 0.123717736208 4.5 4.5
2 Cs -0.124823342223 0.147765296525 6.380680590298 4.5 4.5
3 Cs -0.007008736483 6.410857820077 -0.074383741132 4.5 4.5
4 Cs -0.118783732689 6.555175721101 6.422955307289 4.5 4.5
5 Cs 6.464431501880 -0.123579728882 -0.070189557646 4.5 4.5
6 Cs 6.427261600317 0.064232887890 6.348300039800 4.5 4.5
7 Cs 6.431834491679 6.339778169950 -0.141025479534 4.5 4.5
8 Cs 6.443892616710 6.477404425606 6.549260030321 4.5 4.5
9 Pb 3.272791985076 3.198242404407 3.199496275314 7.0 7.0
10 Pb 3.101028713465 3.276815140899 9.738897527700 7.0 7.0
11 Pb 3.224216836450 9.675880163716 3.212617710133 7.0 7.0
12 Pb 3.115689670255 9.652795003628 9.466297669410 7.0 7.0
13 Pb 9.552866474663 3.131487577824 3.099067705741 7.0 7.0
14 Pb 9.682485584765 3.300933809636 9.589271571266 7.0 7.0
15 Pb 9.635313739043 9.529292406954 3.202637961559 7.0 7.0
16 Pb 9.736884237717 9.680878345311 9.595758252134 7.0 7.0
17 I -0.040470869233 3.331136487231 3.201244902193 3.5 3.5
18 I -0.129839528095 3.333964598607 9.685554599797 3.5 3.5
19 I 0.090021458574 9.591833976545 3.343405331190 3.5 3.5
20 I 0.093392408372 9.532349466972 9.761073697706 3.5 3.5
21 I 6.545890826776 3.072905920953 3.223745556771 3.5 3.5
22 I 6.383904299641 3.136338045401 9.738173316161 3.5 3.5
23 I 6.402500603032 9.502674533975 3.061946226745 3.5 3.5
24 I 6.359066892740 9.486870196897 9.585903242210 3.5 3.5
25 I 3.159863033433 0.082467653794 3.290511543652 3.5 3.5
26 I 3.091457017761 0.133947541598 9.647750421211 3.5 3.5
27 I 3.167633380446 6.475779158721 3.271996769516 3.5 3.5
28 I 3.082218155842 6.464731433346 9.598311680722 3.5 3.5
29 I 9.730941445839 -0.045670803688 3.246576249420 3.5 3.5
30 I 9.613230046510 -0.114531203152 9.540865849085 3.5 3.5
31 I 9.685824118632 6.394104141214 3.154133422327 3.5 3.5
32 I 9.558215936012 6.296042817188 9.508648494660 3.5 3.5
33 I 3.193330822127 3.233235651656 0.146946342923 3.5 3.5
34 I 3.059034707844 3.242768045783 6.420996399233 3.5 3.5
35 I 3.138091341784 9.605332554478 -0.040018785521 3.5 3.5
36 I 3.154600024318 9.637835775037 6.551585445171 3.5 3.5
37 I 9.737581048537 3.153552960117 0.028731563163 3.5 3.5
38 I 9.518078835362 3.093365127634 6.458017392987 3.5 3.5
39 I 9.734247407858 9.547443621117 0.061449670843 3.5 3.5
40 I 9.605756045341 9.736854155951 6.531283916087 3.5 3.5
Atoms.SpeciesAndCoordinates>
Atoms.UnitVectors.Unit Ang
<Atoms.UnitVectors
12.820340000000 0.000000000000 0.000000000000
0.000000000000 12.820340000000 0.000000000000
0.000000000000 0.000000000000 12.820340000000
Atoms.UnitVectors>

scf.XcType GGA-PBE
scf.SpinPolarization off
scf.SpinOrbit.Coupling off
scf.ElectronicTemperature 300.0
scf.energycutoff 250.0
scf.maxIter 300
scf.EigenvalueSolver band
scf.Kgrid 2 2 2
scf.Mixing.Type rmm-diisk
scf.Init.Mixing.Weight 0.30
scf.Min.Mixing.Weight 0.001
scf.Max.Mixing.Weight 0.400
scf.Mixing.History 7
scf.Mixing.StartPulay 10
scf.criterion 1.0e-7

MD.Type RF

MD.Opt.DIIS.History 4
MD.Opt.StartDIIS 5
MD.maxIter 200
MD.TimeStep 0.5
MD.Opt.criterion 1.0e-4


Re: memory increasing as MD steps increase and the version issue on it ( No.1 )
Date: 2021/07/20 11:02
Name: T. Ozaki

Hi,

Thank you for your detailed report.
I will try to see what's happening, and post my report once the problem is figured out.

Regards,

TO
Re: memory increasing as MD steps increase and the version issue on it ( No.2 )
Date: 2021/07/24 17:51
Name: T. Ozaki

Hi,

I have checked the memory usage of OpenMX in both v3.8.5 and v3.9.2 with the input file you provided
by monitoring RSS as shown at
https://t-ozaki.issp.u-tokyo.ac.jp/calcs/monitoring_memory.PNG

Both calculations were performed using 20 MPI processes, and the sum of the memory usage of the 20 processes is shown in the figure. As we can see, the memory usage of v3.9.2 seems to be comparable to that of v3.8.5.
So, from this comparison it seems difficult to say that any specific memory leak exists in v3.9.2.

However, I have noticed from a series of benchmarks that an unexpected memory leak seems to occur when pdgemm or pzgemm is called in some environments.
Note that only v3.9.2 calls pdgemm or pzgemm in the eigenvalue solver.
I am not sure whether what I found is related to the issue mentioned at
https://software.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/managing-performance-and-memory/using-memory-functions/avoiding-memory-leaks-in-onemkl.html
If so, the issue may depend on the version of MKL.

I will keep the issue in mind.
Anyway, thank you very much for your detailed report.

Regards,

TO
Re: memory increasing as MD steps increase and the version issue on it ( No.3 )
Date: 2021/07/27 15:11
Name: Kunihiro Yananose  <ykunihiro@snu.ac.kr>

Dear Prof. Ozaki,

Thank you for the test and your kind reply.
I compiled OpenMX with the Intel oneAPI MKL, so the MKL version might indeed be related to this issue, as you remarked.

I tried to apply the solutions in the link and did some tests. However, I couldn't see any visible improvement.

Specifically,

(1) setting the environment variable in the job submission bash script:
export MKL_DISABLE_FAST_MM=1

(2) recompiling with a modification of openmx.c, by adding

at line 73,
#include <mkl.h>
void mkl_free_buffers(void);
void mkl_thread_free_buffers(void);


and at line 677,
mkl_free_buffers();
mkl_thread_free_buffers();

I don't know whether this is the correct way to do it.

Because this is not an urgent issue for me, I will also set it aside for the present.

Thank you so much,
K. Yananose
Re: memory increasing as MD steps increase and the version issue on it ( No.4 )
Date: 2021/12/10 18:28
Name: Kunihiro Yananose  <ykunihiro@snu.ac.kr>

Dear all,

Recently, I revisited this issue and made some progress, so I'd like to share what I observed. I found that this is not an issue in the OpenMX code itself; rather, it is related to the linked packages and the MPI wrapper.
I repeated the same memory-usage tests with different compilations. In the new tests, I used only version 3.9.9.

First, as a reference from my first post, take ver 3.9.2 with RF as case (1).
(1) Intel MPI + Intel MKL : 13.5 % to 46.85 %
- In this case, both MPI and MKL are from Intel oneAPI 2021.

The following two cases are results of new compilations.

(2) Intel MPI + Netlib ScaLAPACK compiled with the Intel MPI wrapper : 13.1 % to 26.85 %
- The memory usage was reduced, but it still increased at every MD step. The computation speed is almost the same as in (1).

(3) GCC compiler + OpenMPI + Netlib ScaLAPACK : 10.4 % to 12.25 %
- Much less memory was used, and the usage saturated at around the 14th MD step. But the speed is slow: the openmx runtest takes almost twice as long as with Intel MPI.

These results strongly suggest that Intel MPI causes the memory leak.
In addition, I found that the SIESTA community has recently discussed a similar issue (https://gitlab.com/siesta-project/siesta/-/issues/29).
They suspect that Intel MPI 2019 and later versions cause memory leaks, but I couldn't check different Intel MPI versions with OpenMX.

For me, using (2) for usual cases and (3) for memory-heavy cases will be a workable solution. I guess the same holds for SCF-only calculations.
If you encounter a similar issue, I suggest trying to compile OpenMX with OpenMPI or with the Intel MPI 2018 version.
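
(If you want to check which MPI and math libraries an existing openmx binary is actually linked against, something like the following works on Linux; this is just a generic check, not specific to OpenMX.)

ldd ./openmx | grep -Ei 'mpi|mkl|scalapack|lapack|blas'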

Sincerely,
K. Yananose

----------------------------------------------------------------------------------
Appendix

- my compile options for (2)
FFTW = -I/(my path to lib)/fftw/include
CC = mpiicc -qopenmp -DscaLAPACK -O3 -xHOST -ip -no-prec-div $(FFTW)
FC = mpiifort -qopenmp -DscaLAPACK -O3 -xHOST -ip -no-prec-div $(FFTW)
LIB= -L/(my path to lib)/fftw/lib -lfftw3 \
-L/(my path to lib)/scalapack-2.1.0 -lscalapack \
-L/(my path to lib)/lapack-3.10.0 -llapack -lrefblas \
-L/(my path to intel oneAPI)/mpi/latest/lib -lmpi -lifcore -liomp5 -lpthread -lm -ldl \

- my compile options for (3)
FFTW = -I/(my path to gcc_lib)/fftw/include
CC = mpicc -DscaLAPACK -O3 -ffast-math -fopenmp $(FFTW)
FC = mpifort -DscaLAPACK -O3 -ffast-math -fopenmp $(FFTW)
LIB= -L/(my path to gcc_lib)/fftw/lib -lfftw3 \
-L/(my path to gcc_lib)/scalapack-2.1.0 -lscalapack \
-L/(my path to gcc_lib)/lapack-3.10.0 -llapack -lrefblas -lgfortran \
-L/(my path to gcc_lib)/openmpi/lib -lmpi -lmpi_mpifh -lgomp -lpthread -lm -ldl \

Re: memory increasing as MD steps increase and the version issue on it ( No.5 )
Date: 2022/01/11 16:19
Name: Kunihiro Yananose  <ykunihiro@snu.ac.kr>

additional information:

I tested a new case:
(4) icc compiler (from oneAPI 2021) + OpenMPI + Netlib ScaLAPACK : 9.15 % to 13.25 %
- The final memory consumption is a little larger than in (3), but the difference is almost negligible. It kept increasing even after the 40th MD step, though not at every MD step.
The speed is much better than (3): the openmx runtest took only a few seconds longer than with the Intel MPI & MKL case.
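
(For reference, the timing comparison above refers to the standard automatic run test, invoked along the lines of the following; the process and thread counts are only an example.)

mpirun -np 40 ./openmx -runtest -nt 1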

compile options are:
FFTW = -I/(my path to icc_lib)/fftw/include
CC = mpicc -qopenmp -DscaLAPACK -O3 -xHOST -ip -no-prec-div $(FFTW)
FC = mpifort -qopenmp -DscaLAPACK -O3 -xHOST -ip -no-prec-div $(FFTW)
LIB= -L/(my path to icc_lib)/fftw/lib -lfftw3 \
-L/(my path to icc_lib)/scalapack-2.1.0 -lscalapack \
-L/(my path to icc_lib)/lapack-3.10.0 -llapack -lrefblas \
-L/(my path to icc_lib)/openmpi/lib -lmpi -lmpi_mpifh -lmpi_usempif08 -lifcore -liomp5 -lpthread -lm -ldl
