Re: Another possible erratic problem in the Forcetest for F2_GGA.dat and GaAs_LDA.dat ( No.1 ) 
 Date: 2017/01/03 22:38
 Name: Artem Pulkin
 Just for my curiosity, do you use OpenMPI, the Intel one or something else?

Re: Another possible erratic problem in the Forcetest for F2_GGA.dat and GaAs_LDA.dat ( No.2 ) 
 Date: 2017/01/04 22:21
 Name: Kylin
 To Artem Pulkin
On my MPB, MPICH2 was employed for the test with the GNU compiler, while on the cluster and the workstation different versions of Intel MPI with the Intel compiler were employed. The problem is the same in every case: inconsistent results between runs.

Re: Another possible erratic problem in the Forcetest for F2_GGA.dat and GaAs_LDA.dat ( No.3 ) 
 Date: 2017/01/04 23:18
 Name: Artem Pulkin
 Can you try compiling and running with OpenMPI as well?

Re: Another possible erratic problem in the Forcetest for F2_GGA.dat and GaAs_LDA.dat ( No.4 ) 
 Date: 2017/01/09 18:34
 Name: Kylin
On my MPB with OpenMPI + gcc6 + OpenBLAS, the Forcetest also failed.
Cheers Kylin

Re: Another possible erratic problem in the Forcetest for F2_GGA.dat and GaAs_LDA.dat ( No.5 ) 
 Date: 2017/03/02 09:23
 Name: T. Ozaki
 Hi,
I think that GaAs_LDA.dat has no problem; a difference of 1e-8~9 would be considered small enough. In parallel calculations it is not guaranteed that the sequence of summation is always the same, even when using the same number of cores. This means that the round-off error varies from trial to trial. In addition, some arrays in OpenMX are allocated in single precision to reduce memory consumption, which makes the round-off error depend even more strongly on the trial.
Did you check the case of F2_GGA carefully? I guess that the variation comes from differences in the SCF convergence. It is already known that the history of the SCF convergence depends on the number of MPI processes and also varies between runs. If you obtained sufficient SCF convergence for F2_GGA and still see a large difference between the analytic and numerical forces, this might be attributed to the way of compilation, or to a more serious program bug.
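The order dependence of the round-off error described above can be illustrated with a short snippet (this is just an illustration of floating-point non-associativity, not OpenMX code):

```python
# Floating-point addition is not associative, so a parallel reduction that
# groups the same numbers differently between runs can give a slightly
# different total even though the inputs are identical.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one summation order
right = a + (b + c)  # another summation order

print(left == right)      # False: the grouping changes the last bits
print(abs(left - right))  # a round-off difference on the order of 1e-16
```

In single precision the same effect appears at a much larger relative magnitude, which is why the single-precision arrays mentioned above make the run-to-run variation more visible.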
Regards,
TO

Re: Another possible erratic problem in the Forcetest for F2_GGA.dat and GaAs_LDA.dat ( No.6 ) 
 Date: 2017/03/09 19:03
 Name: Kylin
 Thanks for your help TO.
For F2_GGA, I think the problem can be attributed to the older version of the Intel compiler on the workstation. It is really out of date, and I cannot update it because the machine belongs to someone else.
BTW, in my view repeatability is really important. Ideally, on the same machine with the same code, you should always obtain the same result. At least in my experience with the LAMMPS code, if we set the same random seed, the output is always identical. But it seems that the round-off error cannot be fixed and varies from case to case? Thus we cannot obtain a constant result?
Cheers Kylin

Re: Another possible erratic problem in the Forcetest for F2_GGA.dat and GaAs_LDA.dat ( No.7 ) 
 Date: 2017/03/10 00:15
 Name: T. Ozaki
 Hi,
> BTW in my point of view, the repeatability would be really important.
> Ideally in the same machine with the same code, you should always obtained the same result.
> As least for my experience with lammps code, if we set the same random seed, the output would be always constant.
> But it seems that the roundoff error cannot be fixed that varies from case to case? Thus we cannot obtain the constant result?
Yes, I agree with you that repeatability is really important. However, it is known that the result can vary from run to run due to the non-associativity of floating-point addition, depending on the implementation of the MPI package, while the result should be identical run-to-run if we can control the sequence of summation in MPI, as discussed in https://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler
This is also discussed for another MD code, Gromacs, as shown in http://www.gromacs.org/Documentation/Terminology/Reproducibility
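As a rough sketch of why the reduction order matters (an assumed scenario, not actual MPI code): below, the same global array is reduced with two different partitionings, mimicking how the grouping of partial sums in an MPI reduction may differ between runs or process counts.

```python
import random

# Each "rank" sums its contiguous chunk of the data, then the partial sums
# are combined -- the same pattern as an MPI reduction over distributed data.
def reduce_in_chunks(xs, nranks):
    chunk = (len(xs) + nranks - 1) // nranks
    partials = [sum(xs[i:i + chunk]) for i in range(0, len(xs), chunk)]
    return sum(partials)

random.seed(42)
data = [random.uniform(-1.0, 1.0) for _ in range(100000)]

total_8 = reduce_in_chunks(data, 8)  # grouping as with 8 ranks
total_7 = reduce_in_chunks(data, 7)  # grouping as with 7 ranks

print(abs(total_8 - total_7))  # tiny, but generally not exactly zero
```

Both totals agree to roughly double-precision accuracy, yet the last bits can differ because the partial sums are grouped differently.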
I did twice 'runtest' on the same machine with 8 MPI processes and 2 OMP threads, and got the following runtest.result files:
* the first run:
 1 input_example/Benzene.dat   Elapsed time(s)=  5.24  diff Utot= 0.000000000003  diff Force= 0.000000000002
 2 input_example/C60.dat       Elapsed time(s)= 15.11  diff Utot= 0.000000000003  diff Force= 0.000000000004
 3 input_example/CO.dat        Elapsed time(s)=  8.13  diff Utot= 0.000000000000  diff Force= 0.000000000003
 4 input_example/Cr2.dat       Elapsed time(s)=  8.80  diff Utot= 0.000000002143  diff Force= 0.000000000029
 5 input_example/CrysMnO.dat   Elapsed time(s)= 19.95  diff Utot= 0.000000000006  diff Force= 0.000000000000
 6 input_example/GaAs.dat      Elapsed time(s)= 24.58  diff Utot= 0.000000000010  diff Force= 0.000000000000
 7 input_example/Glycine.dat   Elapsed time(s)=  5.04  diff Utot= 0.000000000054  diff Force= 0.000000000003
 8 input_example/Graphite4.dat Elapsed time(s)=  4.41  diff Utot= 0.000000000000  diff Force= 0.000000000001
 9 input_example/H2OEF.dat     Elapsed time(s)=  4.00  diff Utot= 0.000000000000  diff Force= 0.000000000001
10 input_example/H2O.dat       Elapsed time(s)=  3.81  diff Utot= 0.000000000000  diff Force= 0.000000000001
11 input_example/HMn.dat       Elapsed time(s)= 13.57  diff Utot= 0.000000000000  diff Force= 0.000000000000
12 input_example/Methane.dat   Elapsed time(s)=  3.37  diff Utot= 0.000000000000  diff Force= 0.000000000000
13 input_example/Mol_MnO.dat   Elapsed time(s)=  8.85  diff Utot= 0.000000000539  diff Force= 0.000000000201
14 input_example/Ndia2.dat     Elapsed time(s)=  5.29  diff Utot= 0.000000000000  diff Force= 0.000000000001
* the second run:
 1 input_example/Benzene.dat   Elapsed time(s)=  5.05  diff Utot= 0.000000000003  diff Force= 0.000000000000
 2 input_example/C60.dat       Elapsed time(s)= 15.04  diff Utot= 0.000000000003  diff Force= 0.000000000003
 3 input_example/CO.dat        Elapsed time(s)=  8.09  diff Utot= 0.000000000000  diff Force= 0.000000000003
 4 input_example/Cr2.dat       Elapsed time(s)=  9.16  diff Utot= 0.000000002143  diff Force= 0.000000000029
 5 input_example/CrysMnO.dat   Elapsed time(s)= 19.96  diff Utot= 0.000000000005  diff Force= 0.000000000001
 6 input_example/GaAs.dat      Elapsed time(s)= 24.56  diff Utot= 0.000000000010  diff Force= 0.000000000000
 7 input_example/Glycine.dat   Elapsed time(s)=  4.92  diff Utot= 0.000000000054  diff Force= 0.000000000003
 8 input_example/Graphite4.dat Elapsed time(s)=  4.57  diff Utot= 0.000000000000  diff Force= 0.000000000001
 9 input_example/H2OEF.dat     Elapsed time(s)=  3.98  diff Utot= 0.000000000000  diff Force= 0.000000000001
10 input_example/H2O.dat       Elapsed time(s)=  3.78  diff Utot= 0.000000000000  diff Force= 0.000000000001
11 input_example/HMn.dat       Elapsed time(s)= 14.67  diff Utot= 0.000000000000  diff Force= 0.000000000000
12 input_example/Methane.dat   Elapsed time(s)=  3.35  diff Utot= 0.000000000000  diff Force= 0.000000000000
13 input_example/Mol_MnO.dat   Elapsed time(s)=  9.62  diff Utot= 0.000000000539  diff Force= 0.000000000201
14 input_example/Ndia2.dat     Elapsed time(s)=  5.27  diff Utot= 0.000000000000  diff Force= 0.000000000001
A close look shows that the last digit differs in some cases. This order of difference actually occurs even for systems of this size. If we treat large-scale systems and perform more molecular dynamics steps, the run-to-run differences become magnified.
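Comparing two runtest.result lines with a small tolerance, rather than exact string equality, treats such last-digit differences as agreement. The parsing below follows the field layout shown above, but it is only a sketch (not OpenMX's own checker), and the tolerance value is an assumption:

```python
# Parse one runtest.result line into (input file, diff Utot, diff Force).
def parse_line(line):
    parts = line.split()
    utot = float(parts[parts.index("Utot=") + 1])
    force = float(parts[parts.index("Force=") + 1])
    return parts[1], utot, force

# Two lines "agree" if they refer to the same input and their Utot/Force
# differences match within a round-off-sized tolerance.
def same_within(line_a, line_b, tol=1e-11):
    name_a, u_a, f_a = parse_line(line_a)
    name_b, u_b, f_b = parse_line(line_b)
    return name_a == name_b and abs(u_a - u_b) <= tol and abs(f_a - f_b) <= tol

a = "1 input_example/Benzene.dat Elapsed time(s)= 5.24 diff Utot= 0.000000000003 diff Force= 0.000000000002"
b = "1 input_example/Benzene.dat Elapsed time(s)= 5.05 diff Utot= 0.000000000003 diff Force= 0.000000000000"
print(same_within(a, b))  # True: the differences are within round-off tolerance
```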
Of course, the statement above does not exclude the possibility that OpenMX has program bugs; we have been continuously trying to enhance the reliability of OpenMX.
Regards,
TO

Re: Another possible erratic problem in the Forcetest for F2_GGA.dat and GaAs_LDA.dat ( No.8 ) 
 Date: 2017/03/10 16:11
 Name: T. Ozaki
 Hi,
I also performed the forcetest for F2_GGA.dat and GaAs_LDA.dat with 8 MPI processes repeatedly in my computational environment, and obtained exactly the same result within double precision.
Based on my trial, I would suspect the math library you used (I guess MKL). Could you try ACML instead of MKL and report what happens?
Regards,
TO
