cluster & band calculations
Date: 2017/06/10 03:26
Name: tata

Hi,

I would like to report a problem related to calculations using the "cluster" eigenvalue solver.

First of all, the system specification:

HP ProLiant DL580 G7, CentOS 6.9, GNU compilers, MKL 2017

CC=mpicc -O3 -fopenmp -I/home/dft/opt/fftw-3.3.6/include -I/$MKLROOT/include
FC=mpif90 -O3 -fopenmp -I/$MKLROOT/include
LIB= -L/home/dft/opt/openmpi-2.1.1/lib -lmpi_mpifh -L/home/dft/opt/fftw-3.3.6/lib -lfftw3 -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -ldl -lgfortran

mpiexec -np 32 ... -nt 2
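For completeness, the elided launch line has the usual hybrid MPI/OpenMP form; a minimal sketch, where the executable name ./openmx and the output redirection are my assumptions (only "-np 32" and "-nt 2" appear in the original):

```shell
# 32 MPI processes, 2 OpenMP threads per process ("-nt" is OpenMX's
# thread-count flag); executable path and output file are illustrative.
mpiexec -np 32 ./openmx C16_BFGS.dat -nt 2 > C16_BFGS.std
```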

I tried the standard example C16_BFGS.dat with some modifications:
------------------------------------------
System.CurrrentDirectory ./ # default=./
System.Name C16_BFGS
level.of.stdout 1 # default=1 (1-3)
level.of.fileout 0 # default=1 (0-2)

Species.Number 1
<Definition.of.Atomic.Species
C C5.0-s2p2d1 C_PBE13
Definition.of.Atomic.Species>

Atoms.Number 16
Atoms.SpeciesAndCoordinates.Unit Ang # Ang|AU
<Atoms.SpeciesAndCoordinates
1 C -0.4491250 -0.4491250 6.7177785 2.00000 2.00000
2 C -1.3473750 0.4491250 5.8978544 2.00000 2.00000
3 C 1.3473750 -0.4491250 4.9613564 2.00000 2.00000
4 C 0.4491250 0.4491250 4.0676584 2.00000 2.00000
5 C -0.4491250 -0.4491250 3.1621674 2.00000 2.00000
6 C -1.3473750 0.4491250 2.2581139 2.00000 2.00000
7 C 1.3473750 -0.4491250 1.3550285 2.00000 2.00000
8 C 0.4491250 0.4491250 0.4526505 2.00000 2.00000
9 C -0.4491250 -0.4491250 -0.4526938 2.00000 2.00000
10 C -1.3473750 0.4491250 -1.3550841 2.00000 2.00000
11 C 1.3473750 -0.4491250 -2.2581877 2.00000 2.00000
12 C 0.4491250 0.4491250 -3.1622633 2.00000 2.00000
13 C -0.4491250 -0.4491250 -4.0677786 2.00000 2.00000
14 C -1.3473750 0.4491250 -4.9614883 2.00000 2.00000
15 C 1.3473750 -0.4491250 -5.8980559 2.00000 2.00000
16 C 0.4491250 0.4491250 -6.7179699 2.00000 2.00000
Atoms.SpeciesAndCoordinates>
Atoms.UnitVectors.Unit Ang # Ang|AU
<Atoms.UnitVectors
1.796500 1.796500 0.000000
1.796500 -1.796500 0.000000
0.000000 0.000000 23.473750
Atoms.UnitVectors>

scf.XcType GGA-PBE # LDA|LSDA-CA|LSDA-PW|GGA-PBE
scf.SpinPolarization off # On|Off|NC
scf.ElectronicTemperature 800.0 # default=300 (K)
scf.energycutoff 100.0 # default=150 (Ry)
scf.maxIter 100 # default=40
scf.EigenvalueSolver band # DC|GDC|Cluster|Band
scf.Kgrid 5 5 1 # means n1 x n2 x n3
scf.Mixing.Type Rmm-Diisk # Simple|Rmm-Diis|Gr-Pulay|Kerker|Rmm-Diisk
scf.Init.Mixing.Weight 0.10 # default=0.30
scf.Min.Mixing.Weight 0.01 # default=0.001
scf.Max.Mixing.Weight 0.200 # default=0.40
scf.Mixing.History 14 # default=5
scf.Mixing.StartPulay 7 # default=6
scf.criterion 1.0e-6 # default=1.0e-6 (Hartree)

MD.Type OptC5 # Nomd|Opt|NVE|NVT_VS|NVT_NH
MD.Opt.DIIS.History 3 # default=3
MD.Opt.StartDIIS 7 # default=5
MD.maxIter 200 # default=1
MD.Opt.criterion 1.0e-3 # default=1.0e-4 (Hartree/bohr)

<MD.Fixed.Cell.Vectors
0 0 1
0 0 1
1 1 0
MD.Fixed.Cell.Vectors>
-------------------------------------

Using "scf.EigenvalueSolver band" always runs to completion without errors. However, the cluster eigenvalue solver randomly produces errors when forces or stresses are evaluated:

....

******************* MD=47 SCF=21 *******************
<Poisson> Poisson's equation using FFT...
<Set_Hamiltonian> Hamiltonian matrix for VNA+dVH+Vxc...
<Cluster> Solving the eigenvalue problem...
1 C MulP 2.0727 2.0727 sum 4.1453
2 C MulP 1.9644 1.9644 sum 3.9288
3 C MulP 1.9699 1.9699 sum 3.9399
4 C MulP 1.9933 1.9933 sum 3.9866
5 C MulP 2.0001 2.0001 sum 4.0002
6 C MulP 2.0007 2.0007 sum 4.0014
7 C MulP 1.9988 1.9988 sum 3.9976
8 C MulP 2.0002 2.0002 sum 4.0004
9 C MulP 2.0002 2.0002 sum 4.0004
10 C MulP 1.9988 1.9988 sum 3.9976
11 C MulP 2.0007 2.0007 sum 4.0014
12 C MulP 2.0001 2.0001 sum 4.0002
13 C MulP 1.9933 1.9933 sum 3.9866
14 C MulP 1.9699 1.9699 sum 3.9399
15 C MulP 1.9644 1.9644 sum 3.9288
16 C MulP 2.0725 2.0725 sum 4.1450
Sum of MulP: up = 32.00000 down = 32.00000
total= 64.00000 ideal(neutral)= 64.00000
<DFT> Total Spin Moment (muB) = 0.000000000000
<DFT> Mixing_weight= 0.020000000000
<DFT> Uele = -43.156499924661 dUele = 0.000000304256
<DFT> NormRD = 0.000324828290 Criterion = 0.000001000000
<MD=47> Force calculation
Force calculation #1
[localhost.localdomain:26291] Read -1, expected 18446744073709548224, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709546432, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709546432, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709546432, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709547200, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709547200, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709547200, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709546432, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709546432, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709546432, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709546432, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709547200, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709547200, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709547200, errno = 22
[localhost.localdomain:26291] Read -1, expected 18446744073709547200, errno = 22

Best regards






Re: cluster & band calculations ( No.1 )
Date: 2017/06/11 09:30
Name: Kylin

Dear tata

Before you start your calculation, did you validate the compiled OpenMX with the "-runtest" and "-forcetest" arguments in the work directory? From my perspective, I would not recommend a mixed compilation of GCC and Intel MKL. Maybe you can try icc+MKL, gcc+OpenBLAS, or gcc+ACML; the latter has been shown to be more stable for OpenMX.
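The validation run suggested above can be sketched like this (the mpirun wrapper and process count are my assumptions; "-runtest" reruns the shipped test inputs and compares against the reference results, and "-forcetest" works analogously per the OpenMX manual):

```shell
# Run the standard regression tests from the work directory and diff
# against the shipped reference outputs; a summary file is produced.
mpirun -np 4 ./openmx -runtest -nt 1
```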

I just quickly reviewed the code of DFT.c and Force.c, and found that there should be no difference in the force and stress calculation between the Band and Cluster solvers; the only special treatment is the linear solver for DC and Krylov in Force.c. Thus I think you should first check the validity of your executable, and then the structure of your system after minimization. Sometimes an impractical configuration may induce instability in OpenMX.

Cheers
Kylin
Re: cluster & band calculations ( No.2 )
Date: 2017/06/12 17:59
Name: tata

Dear Kylin,

Thank you for the quick response. After some additional tests performed in the GNU+MKL environment, I have come to the conclusion that the evaluation of forces and stresses works well for both the band and cluster solvers when only one OpenMP thread is used.
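A sketch of the single-thread workaround described above (executable and input names are assumed, not taken from the post):

```shell
# Pin OpenMX to one OpenMP thread per MPI process; with this setting
# the force/stress evaluation completed without errors.
export OMP_NUM_THREADS=1
mpiexec -np 32 ./openmx C16_BFGS.dat -nt 1
```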

Best regards, tata
Re: cluster & band calculations ( No.3 )
Date: 2017/06/18 18:41
Name: tata

Hi,

The errors appear when contribution #1 to the forces and stresses is calculated:

<MD= 5> Force calculation
Force calculation #1
[s3:03456] *** Process received signal ***
[s3:03456] Signal: Segmentation fault (11)
[s3:03456] Signal code: Address not mapped (1)
[s3:03456] Failing at address: 0x19ceecd8
[s3:03450] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7fa0bd567890]
[s3:03450] [ 1] /home/dft/opt/openmpi-2.1.1/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_request_progress_frag+0x3e)[0x7fa0af5e5bee]
[s3:03450] [ 2] /home/dft/opt/openmpi-2.1.1/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x71)[0x7fa0afbf7b51]
[s3:03450] [ 3] /home/dft/opt/openmpi-2.1.1/lib/openmpi/mca_btl_vader.so(+0x3f78)[0x7fa0afbf7f78]
[s3:03450] [ 4] /home/dft/opt/openmpi-2.1.1/lib/libopen-pal.so.20(opal_progress+0x3c[s3:03456] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f585bda7890]
[s3:03456] [ 1] /home/dft/opt/openmpi-2.1.1/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_request_progress_frag+0x3e)[0x7f584dd94bee]
[s3:03456] [ 2] /home/dft/opt/openmpi-2.1.1/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x71)[0x7f584e3a6b51]
[s3:03456] [ 3] /home/dft/opt/openmpi-2.1.1/lib/openmpi/mca_btl_vader.so(+0x3f78)[0x7f584e3a6f78]
[s3:03456] [ 4] /home/dft/opt/openmpi-2.1.1/lib/libopen-pal.so.20(opal_progress+0x3c)[0x7f585b4a3e2c]

Finally I came to the conclusion that it is somehow related to the use of openmpi-2.1.1, and not to the compiling environment and/or math libraries. When I switched to mpich-3.2 the above errors "miraculously" disappeared, and I am very content that OpenMX (hybrid parallelization) works perfectly when cell optimization is enabled.
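The switch can be sketched as follows (the MPICH install prefix and rebuild steps are my assumptions, not taken from the original post):

```shell
# Put the MPICH 3.2 compiler wrappers first in PATH, then rebuild
# OpenMX from a clean tree and relaunch with the same hybrid settings.
export PATH=/home/dft/opt/mpich-3.2/bin:$PATH
make clean && make
mpiexec -np 32 ./openmx C16_BFGS.dat -nt 2
```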

Best regards, tata