MPI version: unequal load balance over multiple CPUs....
Date: 2007/07/25 11:32
Name: Rob


Hello,

I have OpenMX compiled with OpenMPI-1.2.3 and GCC-4.1.2.

When I start a job on several CPUs, the load is never
properly balanced across them.

For example, with 4 CPUs, I have 2 CPUs actively running
(nearly 100% busy) and the other 2 CPUs mostly in sleeping
state (<1% busy).

With 8 CPUs, I have 3 CPUs actively running and 5 CPUs
mostly in the sleeping state.

----
For comparison:
I also use the VASP software with exactly the same
supporting software (OpenMPI & GCC), and there the load
is distributed almost equally over all CPUs.
----

Hence, I wonder whether there is a problem in the MPI part
of the OpenMX code that prevents the work from being
distributed properly over the available resources.

Best regards,
Rob.

Re: MPI version: unequal load balance over multiple CPUs.... ( No.1 )
Date: 2007/07/30 23:41
Name: T.Ozaki

Hi,

The load balance can depend on the number of atoms and the method
used for diagonalization. In general, the load balance becomes better
as the number of atoms increases.
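
As a rough illustration (this is only a sketch assuming a simple block
distribution of atoms over MPI ranks, not the actual OpenMX partitioning),
a small system can leave some ranks with nothing to do:

#include <stdio.h>

int main(void)
{
  int natoms = 3;   /* a small system           */
  int nprocs = 8;   /* MPI processes requested  */

  for (int rank = 0; rank < nprocs; rank++) {
    int start = (rank * natoms) / nprocs;        /* first atom owned by this rank */
    int end   = ((rank + 1) * natoms) / nprocs;  /* one past the last atom        */
    printf("rank %d: %d atom(s)\n", rank, end - start);
    /* ranks that receive 0 atoms stay idle; here only 3 of the 8 ranks get one */
  }
  return 0;
}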

Best regards,

TO
Re: MPI version: unequal load balance over multiple CPUs.... ( No.2 )
Date: 2007/07/31 03:40
Name: Vasilii Artyukhov

Dear Dr. Ozaki,

I have a question about the parallelization in OpenMX. The manual page at

http://www.openmx-square.org/openmx_man/node51.html

shows that for the 'band' scheme, the scaling is rather good at first, but then it suddenly leaps to what seems to be at least twice the 'ideal' value. I doubt that this makes sense, so could you please fix the figure?

I'd also like to come back to the issue of k-points in OpenMX. The fact that the code doesn't make any use of the basic symmetries (spatial reflections & inversion) already makes the computational effort several times larger than it could be, and the fact that it uses Gamma-point-centered grids has two further consequences.

First, you can't possibly reach the BZ boundaries with this scheme, and second, to get the Gamma point on the mesh, you have to use odd numbers of points (like the 3x3x3 grid you used for the benchmark test). The latter fact inevitably increases the overhead, since the highest overall efficiency on MPI systems is achieved for even numbers of processors.
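
To make the difference concrete, here is a minimal stand-alone sketch (just an illustration, not the OpenMX source) of the fractional k-points along one axis for a Gamma-centered grid and for one shifted by half a grid spacing:

#include <stdio.h>

/* k_j = (j + shift)/n, folded into (-0.5, 0.5] */
static void print_grid(int n, double shift)
{
  for (int j = 0; j < n; j++) {
    double k = (j + shift) / n;
    if (k > 0.5) k -= 1.0;
    printf("% .4f ", k);
  }
  printf("\n");
}

int main(void)
{
  print_grid(3, 0.0);  /* Gamma-centered, odd n: contains k = 0        */
  print_grid(4, 0.5);  /* shifted by half a spacing: no k = 0 required */
  return 0;
}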

So, getting to the point: could you please modify the code so that the grid is (or at least can be) shifted by 0.5x0.5x0.5 grid spacings, and so that the most basic spatial and time-reversal symmetry relations, which always hold, are taken into account?

While the most important aim of the code is its O(N) capability, this very simple and general change can give the user an almost 8x speedup compared to what we have now. And having the possibility to calculate bulk systems efficiently is really important, for instance when you have to compare the results of O(N) calculations of a nanostructure to those for the corresponding bulk material.

Best regards,
Vasilii
Re: MPI version: unequal load balance over multiple CPUs.... ( No.3 )
Date: 2007/08/01 00:29
Name: T.Ozaki

Dear Dr. Artyukhov,

> I have a question about the parallelization in OpenMX. The manual page at
> http://www.openmx-square.org/openmx_man/node51.html
> shows that for the 'band' scheme, the scaling is rather good at first, but then it
> suddenly leaps to what seems to be at least twice the 'ideal' value. I doubt that this
> makes sense, so could you please fix the figure

The benchmark result is correct. However, it should be mentioned that the algorithm
changes at the point of the jump. When the number of processors exceeds the number
of k-points in the half Brillouin zone (see the manual for the precise condition),
the eigenvectors calculated by the first diagonalization are stored, since the memory
requirement per processor is reduced. Otherwise, the first diagonalization stores only
the eigenvalues used for finding the Fermi level; after the Fermi level is found, the
eigenvectors are recalculated by a second diagonalization to evaluate the density matrix
on the fly, without storing the eigenvectors, because of the memory issue.

For metallic systems, it is impossible to calculate the density matrix before the
Fermi level is found, so we would need to store the eigenvectors. But storing all the
eigenvectors for many k-points requires a huge amount of memory. So, instead of storing
the eigenvectors, the second diagonalization is performed when the number of processors
is smaller. When the number of processors is larger, it is possible to skip the
second diagonalization, since the memory requirement per processor is reduced.

This is the reason why such a sudden jump is found in the figure.
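
Schematically, the choice can be sketched as follows (placeholder numbers and
messages only, not the actual OpenMX routines):

#include <stdio.h>

int main(void)
{
  int nprocs     = 8;   /* MPI processes                        */
  int nk_half_bz = 14;  /* k-points in the half Brillouin zone  */

  if (nprocs > nk_half_bz) {
    /* few k-points per process: store the eigenvectors in the first
       diagonalization, find the Fermi level, build the density matrix
       directly, and skip the second diagonalization */
    printf("store eigenvectors; one diagonalization per SCF step\n");
  } else {
    /* many k-points per process: keep only the eigenvalues; after the
       Fermi level is found, a second diagonalization recalculates the
       eigenvectors on the fly to accumulate the density matrix */
    printf("eigenvalues only; second diagonalization required\n");
  }
  return 0;
}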

> I'd also like to come back to the issue of k-points in OpenMX. While the fact that the
> code doesn't make any use of the basic symmetries (spatial reflections & the
> inversion) already makes the computational effort several times larger than it could
> be, the fact that the code uses Gamma-point centered grids leads to another two
> consequences.
> So, getting to the point, could you please modify the code so that the grid is
> (or at least, can be) shifted by 0.5x0.5x0.5 grid spacings, and also, that the most
> basic spatial and time symmetry relations that hold always are taken into account?

Thank you for your suggestion.
I will consider adding the shift of k-points in the next release.
However, the current OpenMX already uses the inversion symmetry of the Brillouin zone,
so even if one specifies a k-grid like 4x4x4 = 64, only 32 of the 64 k-points are
used in the actual calculation (in the collinear case).
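
To see the halving explicitly, here is a small stand-alone count (the
half-spacing-shifted mesh is an assumption for illustration, not necessarily
the exact OpenMX mesh) of how many points survive when each k is paired with -k:

#include <stdio.h>

int main(void)
{
  int n = 4;            /* 4x4x4 = 64 mesh points                  */
  double shift = 0.5;   /* half-spacing shift along each direction */
  int kept = 0;

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      for (int l = 0; l < n; l++) {
        double k[3] = { (i + shift) / n, (j + shift) / n, (l + shift) / n };
        for (int d = 0; d < 3; d++)
          if (k[d] > 0.5) k[d] -= 1.0;   /* fold into (-0.5, 0.5] */
        /* keep one member of each {k, -k} pair (points with k == -k,
           possible on an unshifted mesh, are not special-cased here) */
        if (k[0] > 0.0 ||
           (k[0] == 0.0 && k[1] > 0.0) ||
           (k[0] == 0.0 && k[1] == 0.0 && k[2] >= 0.0))
          kept++;
      }

  printf("%d of %d k-points kept under inversion\n", kept, n * n * n);
  return 0;
}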

Best regards,

TO


MPI: SGI altix ERROR in FFT3RC: this version does not support the required half grid mode ( No.4 )
Date: 2008/01/19 18:52
Name: sutapa  <sutapa@physics.unipune.ernet.in>

I am trying to run the parallel version of VASP for a band-structure calculation on an SGI Altix machine. Compiling with the makefile produces no errors, but while running I receive the following error:

MPI: altix: 0x348000046503775: ERROR in FFT3RC: this version does not support the required half grid mode
MPI: altix: 0x348000046503775: 64 48 200
MPI: altix: 0x348000046503775: 64 48 101
MPI: could not run executable (case #4)

Please help me out.

Sutapa