Re: MPI version: unequal load balance over multiple CPUs.... ( No.1 ) 
 Date: 2007/07/30 23:41
 Name: T.Ozaki
 Hi,
The load balance depends on the number of atoms and on the method used for diagonalization. In general, it can be confirmed that the load balance becomes better as the number of atoms increases.
Best regards,
TO

Re: MPI version: unequal load balance over multiple CPUs.... ( No.2 ) 
 Date: 2007/07/31 03:40
 Name: Vasilii Artyukhov
 Dear Dr. Ozaki,
I have a question about the parallelization in OpenMX. The manual page at
http://www.openmxsquare.org/openmx_man/node51.html
shows that for the 'band' scheme, the scaling is rather good at first, but then it suddenly leaps to what seems to be at least twice the 'ideal' value. I doubt that this makes sense, so could you please fix the figure?
I'd also like to come back to the issue of k-points in OpenMX. The fact that the code doesn't make use of the basic symmetries (spatial reflections and inversion) already makes the computational effort several times larger than it needs to be, and the fact that the code uses Gamma-point-centered grids has two further consequences.
First, you can't possibly reach the BZ boundaries with this scheme, and second, to get the Gamma point on the mesh you have to use odd numbers of points (like the 3x3x3 grid you used for the benchmark test). The latter inevitably increases the overhead, since the highest overall efficiency on MPI systems is achieved with even numbers of processors.
So, getting to the point: could you please modify the code so that the grid is (or at least can be) shifted by 0.5x0.5x0.5 grid spacings, and so that the most basic spatial and time-reversal symmetry relations, which always hold, are taken into account?
While the most important aim of the code is its O(N) capability, this very simple and general change could give the user an almost 8x speedup compared to what we have now. And the ability to calculate bulk systems efficiently is really important, for instance when you have to compare the results of O(N) calculations of a nanostructure with those for the corresponding bulk material.
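Just to make the request concrete, here is a small Python sketch (not OpenMX code, and the function names are my own) of what a half-spacing shift combined with inversion/time-reversal folding would do. For a 4x4x4 grid shifted by half a grid spacing, no point coincides with its own inverse, so exactly half the points survive:

```python
from fractions import Fraction

def shifted_kgrid(n1, n2, n3):
    """An n1 x n2 x n3 k-point grid shifted by half a grid spacing
    in each direction, in fractional (reduced) coordinates."""
    return [(Fraction(2*i + 1, 2*n1),
             Fraction(2*j + 1, 2*n2),
             Fraction(2*k + 1, 2*n3))
            for i in range(n1) for j in range(n2) for k in range(n3)]

def reduce_by_inversion(kpoints):
    """Fold the grid by inversion/time-reversal symmetry (k ~ -k mod G),
    returning irreducible k-points with integer weights."""
    weights = {}
    for k in kpoints:
        minus_k = tuple((-c) % 1 for c in k)
        rep = min(k, minus_k)  # canonical representative of the pair {k, -k}
        weights[rep] = weights.get(rep, 0) + 1
    return weights

full = shifted_kgrid(4, 4, 4)
irr = reduce_by_inversion(full)
print(len(full), len(irr))  # 64 full points -> 32 irreducible points
```

Adding the point-group reflections on top of this would shrink the set further for high-symmetry cells, which is where the larger speedup would come from.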
Best regards, Vasilii

Re: MPI version: unequal load balance over multiple CPUs.... ( No.3 ) 
 Date: 2007/08/01 00:29
 Name: T.Ozaki
 Dear Dr. Artyukhov,
> I have a question about the parallelization in OpenMX. The manual page at
> http://www.openmxsquare.org/openmx_man/node51.html
> shows that for the 'band' scheme, the scaling is rather good at first, but then it
> suddenly leaps to what seems to be at least twice the 'ideal' value. I doubt that this
> makes sense, so could you please fix the figure
The benchmark result is correct. However, it should be mentioned that the algorithm changes at the leaping point. When the number of processors exceeds the number of k-points in the half Brillouin zone (see the manual for more precise details), the eigenvectors calculated by the first diagonalization are stored, since the memory requirement per processor is reduced. Otherwise, the first diagonalization stores only the eigenvalues, which are used to find the Fermi level; after the Fermi level is found, the eigenvectors are recalculated by a second diagonalization and used on the fly to evaluate the density matrix, without ever being stored, because of the memory issue.
For metallic systems, the density matrix cannot be calculated before the Fermi level is found; thus, we would need to store the eigenvectors. But storing all the eigenvectors for many k-points requires a huge amount of memory. So, instead of storing the eigenvectors, the second diagonalization is performed when the number of processors is smaller. When the number of processors is larger, it is possible to skip the second diagonalization, since the memory requirement per processor is reduced.
This is the reason why such a sudden jump is found in the figure.
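The trade-off can be illustrated with a toy Python sketch (this is my own schematic illustration with made-up names, not OpenMX's actual implementation): when memory allows, the eigenvectors from a single diagonalization are kept; otherwise a first pass keeps only the eigenvalues to locate the Fermi level, and a second diagonalization then builds the density matrix on the fly. Both paths give the same result:

```python
import numpy as np

rng = np.random.default_rng(0)
nkpt, dim, n_electrons = 8, 6, 4.0

# Toy Hermitian matrices, one per k-point, standing in for H(k).
hams = [(lambda a: (a + a.conj().T) / 2)(
            rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim)))
        for _ in range(nkpt)]

def occ(eps, mu, beta=100.0):
    """Fermi-Dirac occupation numbers."""
    return 1.0 / (np.exp(np.clip(beta * (eps - mu), -60, 60)) + 1.0)

def find_mu(eigs_per_k):
    """Bisect for the Fermi level using eigenvalues alone."""
    lo = min(e.min() for e in eigs_per_k) - 1.0
    hi = max(e.max() for e in eigs_per_k) + 1.0
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        n = sum(occ(e, mu).sum() for e in eigs_per_k) / len(eigs_per_k)
        lo, hi = (mu, hi) if n < n_electrons else (lo, mu)
    return 0.5 * (lo + hi)

def density_matrices(store_vectors):
    if store_vectors:
        # Enough memory: one diagonalization per k, keep the eigenvectors.
        data = [np.linalg.eigh(h) for h in hams]
        mu = find_mu([e for e, _ in data])
        return mu, [(v * occ(e, mu)) @ v.conj().T for e, v in data]
    # Tight memory: first pass keeps eigenvalues only ...
    mu = find_mu([np.linalg.eigvalsh(h) for h in hams])
    # ... second pass re-diagonalizes and uses the eigenvectors on the fly.
    out = []
    for h in hams:
        e, v = np.linalg.eigh(h)
        out.append((v * occ(e, mu)) @ v.conj().T)
    return mu, out

mu1, dm1 = density_matrices(store_vectors=True)
mu2, dm2 = density_matrices(store_vectors=False)
```

The eigenvector-storing path does half as many diagonalizations but holds nkpt dense matrices in memory, which is the jump seen in the benchmark once enough processors share that storage.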
> I'd also like to come back to the issue of kpoints in OpenMX. While the fact that the
> code doesn't make any use of the basic symmetries (spatial reflections & the
> inversion) already makes the computational effort several times larger than it could
> be, the fact that the code uses Gammapoint centered grids leads to another two
> consequences.
> So, getting to the point, could you please modify the code so that the grid is
> (or at least, can be) shifted by 0.5x0.5x0.5 grid spacings, and also, that the most
> basic spatial and time symmetry relations that hold always are taken into account?
Thank you for your suggestion. I will consider adding a shift of the k-points in the next release. However, the current OpenMX already uses the inversion symmetry of the Brillouin zone: even if one specifies k-points as 4x4x4 = 64, only 32 of the 64 k-points are used in the actual calculation (in the collinear case).
Best regards,
TO

MPI: SGI altix ERROR in FFT3RC: this version does not support the required half grid mode ( No.4 ) 
 Date: 2008/01/19 18:52
 Name: sutapa <sutapa@physics.unipune.ernet.in>
I am trying to run the VASP band-structure parallel version on an SGI Altix machine. Compiling with the makefile gives no error, but while running I am receiving the following error:
MPI: altix: 0x348000046503775: ERROR in FFT3RC: this version does not support the required half grid mode
MPI: altix: 0x348000046503775: 64 48 200
MPI: altix: 0x348000046503775: 64 48 101
MPI: could not run executable (case #4)
Please help me out.
Sutapa

