The mysterious performance deterioration
Date: 2017/03/21 01:28
Name: Kylin

Dear all,

I am currently encountering a strange problem with OpenMX 3.8.3: the same calculation has become roughly 10x slower.

Initially I submitted an NOMD test for a 28-atom NiO slab to the cluster, using Intel MPI + MKL + ICC for OpenMX 3.8.3. Although the SCF loop did not converge within 100 steps, the run still finished within 180 s. Unfortunately, in my further tests I found that the performance of OpenMX 3.8.3 with the same setup dropped significantly, taking 2000 s to finish just 10 SCF loops. So I don't know what is wrong with my system. Did OpenMX do the wrong calculation the first time, or is there some problem with my remote cluster? (I know that cluster sometimes has issues.)


First Time Calculation with 100 SCF loops
***********************************************************
***********************************************************
Computational Time (second)
***********************************************************
***********************************************************

Elapsed.Time. 173.827

Min_ID Min_Time Max_ID Max_Time
Total Computational Time = 18 173.491 0 173.827
readfile = 20 1.547 0 1.550
truncation = 0 0.000 0 0.000
MD_pac = 0 0.292 20 0.297
OutData = 18 1.366 0 1.700
DFT = 16 169.467 0 169.535

*** In DFT ***

Set_OLP_Kin = 6 1.136 16 5.407
Set_Nonlocal = 16 5.561 6 9.832
Set_ProExpn_VNA = 18 9.432 4 10.500
Set_Hamiltonian = 4 26.845 11 26.847
Poisson = 15 0.315 0 0.317
Diagonalization = 0 99.482 12 99.499
Mixing_DM = 22 1.824 0 1.826
Force = 5 5.561 21 5.561
Total_Energy = 12 1.545 0 1.551
Set_Aden_Grid = 4 0.038 20 1.110
Set_Orbitals_Grid = 3 0.105 10 0.285
Set_Density_Grid = 6 10.782 20 10.889
RestartFileDFT = 22 0.076 12 0.262
Mulliken_Charge = 17 0.126 15 0.129
FFT(2D)_Density = 0 0.304 11 0.306
Others = 10 0.721 7 0.944

Second Time Calculation with 10 SCF loops
***********************************************************
***********************************************************
Computational Time (second)
***********************************************************
***********************************************************

Elapsed.Time. 1044.377

Min_ID Min_Time Max_ID Max_Time
Total Computational Time = 4 1043.990 0 1044.377
readfile = 21 14.650 3 14.738
truncation = 0 0.000 0 0.000
MD_pac = 12 0.000 20 0.060
OutData = 21 11.621 0 11.967
DFT = 16 1000.361 1 1002.835

*** In DFT ***

Set_OLP_Kin = 6 3.827 16 115.899
Set_Nonlocal = 16 114.861 6 227.012
Set_ProExpn_VNA = 11 186.810 16 223.600
Set_Hamiltonian = 20 49.311 15 49.580
Poisson = 6 0.581 0 0.699
Diagonalization = 10 319.211 0 319.457
Mixing_DM = 16 2.460 22 2.620
Force = 16 106.718 11 106.741
Total_Energy = 10 33.580 0 33.611
Set_Aden_Grid = 16 0.510 11 37.220
Set_Orbitals_Grid = 5 0.105 22 5.429
Set_Density_Grid = 14 21.912 16 23.561
RestartFileDFT = 7 0.210 9 0.361
Mulliken_Charge = 0 0.480 14 2.031
FFT(2D)_Density = 12 0.800 19 0.980
Others = 22 2.707 5 10.779

The above is output from the nio_28.out file. I found that the time spent on I/O in the second run is about 10 times larger than in the first. So I guess it may be attributable to a disk-I/O bottleneck; OpenMX, like other DFT codes, presumably needs a lot of memory and disk traffic. Could something be wrong with the I/O on my remote cluster?

Does anyone have any insight into this problem? Is disk swap crucial for an OpenMX DFT calculation?
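Since disk I/O is the suspicion here, one quick sanity check is to measure raw write throughput in the job's working directory. This is only a generic sketch (the file name and size are arbitrary, not anything OpenMX-specific):

```shell
# Write 256 MB into the directory where OpenMX stores its output and
# restart files, forcing the data to disk (conv=fdatasync) so the page
# cache doesn't hide a slow filesystem; dd prints the throughput on
# its last line.
TESTFILE=./io_probe.bin
dd if=/dev/zero of="$TESTFILE" bs=1M count=256 conv=fdatasync
rm -f "$TESTFILE"
```

A healthy local disk typically reports on the order of 100 MB/s or more; a shared filesystem under heavy load can drop to a few MB/s, which would be consistent with the jump in readfile and OutData times between the two runs above.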

Cheers
Kylin




Re: The mysterious performance deterioration ( No.1 )
Date: 2017/03/21 01:53
Name: Kylin

I also tested my system on my MPB with MPICH + gcc-v6 + OpenBLAS.

Despite the long diagonalization time, the performance on the MPB with 1 core is better than on the cluster with 24 cores in one node. So I think the cluster has some problem, but could someone give instructions on how to identify it?

MPB Calculation with 5 SCF loops
***********************************************************

Elapsed.Time. 913.708

Min_ID Min_Time Max_ID Max_Time
Total Computational Time = 0 913.708 0 913.708
readfile = 0 8.961 0 8.961
truncation = 0 0.000 0 0.000
MD_pac = 0 0.002 0 0.002
OutData = 0 0.641 0 0.641
DFT = 0 899.731 0 899.731

*** In DFT ***

Set_OLP_Kin = 0 75.318 0 75.318
Set_Nonlocal = 0 70.700 0 70.700
Set_ProExpn_VNA = 0 124.070 0 124.070
Set_Hamiltonian = 0 26.882 0 26.882
Poisson = 0 0.281 0 0.281
Diagonalization = 0 437.818 0 437.818
Mixing_DM = 0 1.037 0 1.037
Force = 0 123.659 0 123.659
Total_Energy = 0 19.421 0 19.421
Set_Aden_Grid = 0 0.296 0 0.296
Set_Orbitals_Grid = 0 0.518 0 0.518
Set_Density_Grid = 0 19.241 0 19.241
RestartFileDFT = 0 0.027 0 0.027
Mulliken_Charge = 0 0.040 0 0.040
FFT(2D)_Density = 0 0.314 0 0.314
Others = 0 0.110 0 0.110
Re: The mysterious performance deterioration ( No.2 )
Date: 2017/03/21 09:27
Name: T. Ozaki

Hi,

The most likely cause is the disk I/O of your system.
This can be checked by controlling the amount of output files
with the following keyword:

level.of.fileout 0

It would also be helpful to check whether the binary mode
alleviates the degradation. See below:

http://www.openmx-square.org/openmx_man3.8/node172.html
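For reference, the relevant input-file lines could look like the sketch below. `level.of.fileout` is the keyword quoted above; the binary-mode keyword name should be taken from the linked manual page (`OutData.bin.flag` is my reading of it, so please verify against your manual):

```
# Reduce the number of output files written each run
level.of.fileout    0

# Write volumetric data in binary rather than text
# (keyword name per the linked manual page -- please verify)
OutData.bin.flag    on
```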

In your case the second trial became quite slow. Thus, overwriting
files with the same file names might be causing a problem.
To check this, delete the files stored in the '*_rst' directory and the
other output files generated by the first trial, and start fully
from scratch.
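Concretely, clearing the first run's artifacts before rerunning might look like this (a sketch only; "nio_28" is taken from the output file named earlier in the thread, and the exact list of output files depends on your input settings):

```shell
# Remove the restart directory and first-run outputs so the second
# run starts from a truly clean state ("nio_28" stands in for your
# System.Name; adjust the file list to what the first run produced).
rm -rf nio_28_rst
rm -f nio_28.out nio_28.md nio_28.xyz
```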

Regards,

TO