Crys-MnO example from work
Date: 2020/07/08 20:02
Name: Sergey   <slisenk@gmail.com>

Hello,

I have a question about the Crys-MnO.dat example from the "work" directory, which is part of the "runtest" calculation.

When running OpenMX-3.9.2 with the "-runtest" option, I noticed a small difference between my output and the reference. Where the other "runtest.result" outputs show about 2 nonzero trailing digits, mine shows 5 or 6. I understand that this is still small.

What surprises me is that the reference file shows convergence in 45 SCF steps, while in my case 80 steps were not enough. My computational environment is a Cray XC40. I also could not get close to the reference numbers in terms of speed.

For example, "runtest.result_xc40" shows:

OpenMX Ver.3.9
icc version 17.0.7, compiler option -Dxt3 -O3 -axCOMMON-AVX512,CORE-AVX512,CORE-AVX2,CORE-AVX-I,AVX,SSE4.2,SSE4.1,SSE3,SSSE3,SSE2 -qopenmp

Cray-XC40 (Intel Xeon E5-2695v4 2.1GHz)
18 processes (MPI) x 2 thread (OpenMP)

1 input_example/Benzene.dat Elapsed time(s)= 4.23 diff Utot= 0.000000000040 diff Force= 0.000000000002
2 input_example/C60.dat Elapsed time(s)= 12.40 diff Utot= 0.000000000001 diff Force= 0.000000000001
3 input_example/CO.dat Elapsed time(s)= 9.09 diff Utot= 0.000000000150 diff Force= 0.000000009551
4 input_example/Cr2.dat Elapsed time(s)= 8.56 diff Utot= 0.000000000462 diff Force= 0.000000000004
5 input_example/Crys-MnO.dat Elapsed time(s)= 20.81 diff Utot= 0.000000000001 diff Force= 0.000000000014
6 input_example/GaAs.dat Elapsed time(s)= 31.99 diff Utot= 0.000000000001 diff Force= 0.000000000001
7 input_example/Glycine.dat Elapsed time(s)= 4.71 diff Utot= 0.000000000001 diff Force= 0.000000000002
8 input_example/Graphite4.dat Elapsed time(s)= 4.89 diff Utot= 0.000000000032 diff Force= 0.000000000004
9 input_example/H2O-EF.dat Elapsed time(s)= 4.03 diff Utot= 0.000000000001 diff Force= 0.000000000002
10 input_example/H2O.dat Elapsed time(s)= 3.83 diff Utot= 0.000000000001 diff Force= 0.000000001042
11 input_example/HMn.dat Elapsed time(s)= 12.73 diff Utot= 0.000000000064 diff Force= 0.000000000029
12 input_example/Methane.dat Elapsed time(s)= 3.24 diff Utot= 0.000000000004 diff Force= 0.000000000001
13 input_example/Mol_MnO.dat Elapsed time(s)= 8.32 diff Utot= 0.000000000576 diff Force= 0.000000000032
14 input_example/Ndia2.dat Elapsed time(s)= 6.12 diff Utot= 0.000000000000 diff Force= 0.000000000001


Total elapsed time (s) 134.96


In my case (32 cores/node), with 16 MPI processes x 2 OpenMP threads:

1 input_example/Benzene.dat Elapsed time(s)= 14.09 diff Utot= 0.000000000045 diff Force= 0.000000000007
2 input_example/C60.dat Elapsed time(s)= 30.48 diff Utot= 0.000000000016 diff Force= 0.000000000006
3 input_example/CO.dat Elapsed time(s)= 59.03 diff Utot= 0.000000000132 diff Force= 0.000000000827
4 input_example/Cr2.dat Elapsed time(s)= 38.19 diff Utot= 0.000000000410 diff Force= 0.000000000051
5 input_example/Crys-MnO.dat Elapsed time(s)= 131.87 diff Utot= 0.000000017210 diff Force= 0.000000088999
6 input_example/GaAs.dat Elapsed time(s)= 116.48 diff Utot= 0.000000002764 diff Force= 0.000000000016
7 input_example/Glycine.dat Elapsed time(s)= 12.06 diff Utot= 0.000000000001 diff Force= 0.000000000001
8 input_example/Graphite4.dat Elapsed time(s)= 19.55 diff Utot= 0.000000000018 diff Force= 0.000000000061
9 input_example/H2O-EF.dat Elapsed time(s)= 12.08 diff Utot= 0.000000000001 diff Force= 0.000000000002
10 input_example/H2O.dat Elapsed time(s)= 10.45 diff Utot= 0.000000000000 diff Force= 0.000000000020
11 input_example/HMn.dat Elapsed time(s)= 25.32 diff Utot= 0.000000000190 diff Force= 0.000000000000
12 input_example/Methane.dat Elapsed time(s)= 9.42 diff Utot= 0.000000000001 diff Force= 0.000000000001
13 input_example/Mol_MnO.dat Elapsed time(s)= 20.89 diff Utot= 0.000000000389 diff Force= 0.000000000237
14 input_example/Ndia2.dat Elapsed time(s)= 14.11 diff Utot= 0.000000000088 diff Force= 0.000000000068


Total elapsed time (s) 514.02
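
The two listings above can be compared example by example rather than by total time only. The following is a minimal Python sketch, assuming the line layout shown in this thread; the helper names (`parse_runtest`, `slowdown`) are illustrative and not part of OpenMX, and only two lines from each listing are embedded as sample data.

```python
import re

# One line of a runtest.result listing, in the layout quoted in this thread.
LINE_RE = re.compile(
    r"^\s*\d+\s+(\S+)\s+Elapsed time\(s\)=\s*([\d.]+)"
    r"\s+diff Utot=\s*([\d.]+)\s+diff Force=\s*([\d.]+)"
)

def parse_runtest(text):
    """Return {input file: (elapsed seconds, diff Utot, diff Force)}."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            name, elapsed, dutot, dforce = m.groups()
            results[name] = (float(elapsed), float(dutot), float(dforce))
    return results

def slowdown(reference, mine):
    """Return {input file: my elapsed time / reference elapsed time}."""
    return {k: mine[k][0] / reference[k][0] for k in reference if k in mine}

# Sample data: two lines copied from each listing above.
REFERENCE = """
1 input_example/Benzene.dat Elapsed time(s)= 4.23 diff Utot= 0.000000000040 diff Force= 0.000000000002
5 input_example/Crys-MnO.dat Elapsed time(s)= 20.81 diff Utot= 0.000000000001 diff Force= 0.000000000014
"""
MINE = """
1 input_example/Benzene.dat Elapsed time(s)= 14.09 diff Utot= 0.000000000045 diff Force= 0.000000000007
5 input_example/Crys-MnO.dat Elapsed time(s)= 131.87 diff Utot= 0.000000017210 diff Force= 0.000000088999
"""

ratios = slowdown(parse_runtest(REFERENCE), parse_runtest(MINE))
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the reference time")
```

In practice the two strings would be replaced by the contents of the two runtest.result files.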

We have different Intel compilers (16, 17, 18) and MKL or LibSci available, but the execution time is still quite different.

Any ideas what the cause could be?

Thanks,
Sergey




Re: Crys-MnO example from work ( No.1 )
Date: 2020/07/09 23:00
Name: T. Ozaki

Hi,

Thank you for the detailed report.

I had also noticed that the SCF convergence for Crys-MnO.dat seems to be sensitive to the computational environment.

With the following keywords:

scf.Mixing.Type rmm-diisv # Simple|Rmm-Diis|Gr-Pulay|Kerker|Rmm-Diisk
scf.Init.Mixing.Weight 0.010 # default=0.30
scf.Min.Mixing.Weight 0.001 # default=0.001
scf.Max.Mixing.Weight 0.150 # default=0.40
scf.Mixing.History 35 # default=5
scf.Mixing.StartPulay 18 # default=6
scf.criterion 1.0e-9 # default=1.0e-6 (Hartree)

the number of SCF iterations required for convergence was 42.

As for the computational time, could you take a look at the "Computational Time (second)" section
at the end of the out file and check the breakdown of the computational time? Which part of the calculation takes more time than in
the reference case stored in "input_example"?

I think that "Total elapsed time (s) 514.02" is too slow.
As you can see, the other runtest.result files in "input_example" show less than 200 sec. when a recent Intel CPU is used.

Regards,

TO
Re: Crys-MnO example from work ( No.2 )
Date: 2020/07/10 05:38
Name: Sergey  <slisenk@gmail.com>

Hi Dr. Ozaki,

Thanks for the reply.

Here are my timing statistics for the Crys-MnO.dat example:

                           Min_ID  Min_Time  Max_ID  Max_Time
Total Computational Time =      6   124.012       0   124.150
readfile                 =      7     4.285       0     4.285
truncation               =      7     0.164       0     0.243
MD_pac                   =      0     0.057       6     0.217
OutData                  =      4     3.005       0     3.142
DFT                      =      0   116.261       7   116.338

*** In DFT ***

Set_OLP_Kin              =      6     0.109       0     0.975
Set_Nonlocal             =      0     1.746       6     2.612
Set_ProExpn_VNA          =      5     2.018       0    12.488
Set_Hamiltonian          =     10     8.585      14     8.664
Poisson                  =     15     0.053       2     0.072
Diagonalization          =     14    37.230       4    37.280
Mixing_DM                =      4     2.459       2     2.541
Force                    =      2    43.914       6    43.914
Total_Energy             =      6     1.612       0     1.614
Set_Aden_Grid            =      0     0.011       5    10.476
Set_Orbitals_Grid        =      4     0.000       3     0.236
Set_Density_Grid         =      6     6.118       0     6.339
RestartFileDFT           =     12     0.081       0     0.090
Mulliken_Charge          =     13     0.039      11     0.041
FFT(2D)_Density          =      0     0.000       0     0.000
Others                   =      3     0.505       9     0.895



Here are the values from the reference file (Cray XC40):

                           Min_ID  Min_Time  Max_ID  Max_Time
Total Computational Time =      1    20.390       0    20.400
readfile                 =      7     1.996       0     1.996
truncation               =      9     0.164       0     0.207
MD_pac                   =      0     0.000      15     0.000
OutData                  =      1     0.078       0     0.089
DFT                      =      0    18.109       9    18.152

*** In DFT ***

Set_OLP_Kin              =      6     0.038       2     0.685
Set_Nonlocal             =      2     1.167       6     1.814
Set_ProExpn_VNA          =     10     1.105       0     2.657
Set_Hamiltonian          =     17     2.635       0     2.635
Poisson                  =     10     0.011      17     0.011
Diagonalization          =      0     7.253      12     7.254
Mixing_DM                =     17     0.494      10     0.494
Force                    =      9     1.167       0     1.167
Total_Energy             =      4     0.903       0     0.903
Set_Aden_Grid            =      0     0.006      10     1.557
Set_Orbitals_Grid        =      5     0.000       2     0.049
Set_Density_Grid         =      6     1.060       0     1.065
RestartFileDFT           =     19     0.002       0     0.003
Mulliken_Charge          =     11     0.005      18     0.005
FFT(2D)_Density          =      0     0.000       0     0.000
Others                   =      2     0.009       6     0.106
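
To see which part of the calculation differs most, the Max_Time columns of the two tables can be compared row by row. Below is a minimal Python sketch, assuming the "name = Min_ID Min_Time Max_ID Max_Time" row layout shown above; the function names are illustrative and not part of OpenMX, and only four rows from each table are embedded as sample data. Rows whose reference time is zero are skipped to avoid dividing by zero.

```python
import re

# One row of the timing table: name = Min_ID Min_Time Max_ID Max_Time
ROW_RE = re.compile(r"^\s*(\S.*?)\s*=\s*(\d+)\s+([\d.]+)\s+(\d+)\s+([\d.]+)\s*$")

def parse_timing(text):
    """Return {component name: Max_Time in seconds}."""
    times = {}
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            times[m.group(1)] = float(m.group(5))
    return times

def rank_slowdown(reference, mine):
    """Return (name, ratio, extra seconds) tuples, largest ratio first."""
    rows = []
    for name, ref_time in reference.items():
        if name in mine and ref_time > 0.0:
            rows.append((name, mine[name] / ref_time, mine[name] - ref_time))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Sample data: four rows copied from each table above.
MINE = """
Set_ProExpn_VNA = 5 2.018 0 12.488
Diagonalization = 14 37.230 4 37.280
Force = 2 43.914 6 43.914
FFT(2D)_Density = 0 0.000 0 0.000
"""
REFERENCE = """
Set_ProExpn_VNA = 10 1.105 0 2.657
Diagonalization = 0 7.253 12 7.254
Force = 9 1.167 0 1.167
FFT(2D)_Density = 0 0.000 0 0.000
"""

for name, ratio, extra in rank_slowdown(parse_timing(REFERENCE), parse_timing(MINE)):
    print(f"{name:<20s} {ratio:6.1f}x  (+{extra:.3f} s)")
```

The ratio shows which component scales worst, while the extra seconds show which one contributes most to the total difference; the two rankings need not agree.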

I don't know how old your Cray XC40 is, but ours was deployed in 2015 and has Intel Xeon E5-2698 v3 processors (Haswell), 2.3 GHz.

I also found that the Intel 16 compiler generates a slightly faster executable than the Intel 18 compiler.

Sergey
Re: Crys-MnO example from work ( No.3 )
Date: 2020/07/10 09:07
Name: T. Ozaki

Hi,

The elapsed time should be determined by the performance of a single CPU and/or the memory, since the benchmark
calculations use only a small number of cores.

The E5-2698 v3 (yours) should not differ much from mine (E5-2695v4).
Also, runtest.result_hster (E5-2640) and runtest.result_sekirei (E5-2680v3) show much better results than yours.

I suspect there must be some reason for the degradation of performance.

Regards,

TO
Re: Crys-MnO example from work ( No.4 )
Date: 2020/07/11 09:22
Name: Sergey  <slisenk@gmail.com>

Hi,

Thanks for the reply. I'm not sure what was going on. I experimented a lot with different compiler options, and the timing was still not great.

I also noticed that I had "ulimit -s unlimited" in my PBS script. After I removed it, the timing improved.
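
Since the stack limit turned out to matter here, it can help to record the limit that a job actually inherits. Below is a minimal sketch, assuming a Unix-like system and Python's standard `resource` module; the helper names are illustrative. It only reports the limit (in kB, the unit that `ulimit -s` uses) and does not explain why the setting affects the timing.

```python
import resource

def describe_limit(value):
    """Format a limit given in bytes the way 'ulimit -s' does (kB or 'unlimited')."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} kB"

def stack_limit():
    """Return the (soft, hard) stack size limits of the current process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return describe_limit(soft), describe_limit(hard)

soft, hard = stack_limit()
print(f"stack soft limit: {soft}, hard limit: {hard}")
```

Running this inside the batch script just before launching the program shows whether the job runs with an unlimited stack or with the system default.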