Tasks are not shared between nodes.

gia.giotti

Hello everyone,
I am completely new to parallel computing and to Open MPI. This is my first post.
I'm trying to set up a cluster of 5 PCs (Pentium 4) with PelicanHPC-v1.9_32bit.
I followed the tutorial (http://pareto.uab.es/mcreel/PelicanHPC/Tutorial/PelicanTutorial.html) but, before running pelican_setup, I set eth0 (ifconfig eth0 inet 10.11.12.1) and ran lamboot in order to get what seemed a correct(?) lamnodes output, similar to this:
n0     10.11.12.2:1:
n1     10.11.12.3:1:
n2     10.11.12.4:1:
n3     10.11.12.4:1:
n4     10.11.12.1:1:origin,this_node

I've written a typical MPI hello-world program, helloworld.f90, and compiled it with
mpif90.openmpi helloworld.f90 -o helloworld
and ran it with both
mpirun.openmpi -np 4 helloworld
or
mpiexec.openmpi -np 4 helloworld
I noticed (with top and from the program output) that all 4 tasks were running on the master while the 4 nodes sat idle.
How can I get the tasks to run on the nodes?

Another question: I can rsh to the nodes only if the user password is empty, and with an empty password I cannot ssh to the second interface (eth1) of the master. Is there a way to solve this problem?

Michael Creel

Re: Tasks are not shared between nodes.

gia.giotti wrote:
I followed the tutorial (http://pareto.uab.es/mcreel/PelicanHPC/Tutorial/PelicanTutorial.html) but, before running pelican_setup, I set eth0 (ifconfig eth0 inet 10.11.12.1) and ran lamboot in order to get what seemed a correct(?) lamnodes output, similar to this:
n0     10.11.12.2:1:
n1     10.11.12.3:1:
n2     10.11.12.4:1:
n3     10.11.12.4:1:
n4     10.11.12.1:1:origin,this_node
I don't understand this. If you lamboot before you run pelican_setup, then the compute nodes are not running yet, no? How do you get lamnodes to show this output? Running lamboot when only the frontend is up should give you something like the last line of your output. Why don't you follow the steps of the Tutorial? Are you trying to achieve some specialized configuration?

gia.giotti wrote:
I've written a typical MPI hello-world program, helloworld.f90, and compiled it with
mpif90.openmpi helloworld.f90 -o helloworld
and ran it with both
mpirun.openmpi -np 4 helloworld
or
mpiexec.openmpi -np 4 helloworld
I noticed (with top and from the program output) that all 4 tasks were running on the master while the 4 nodes sat idle.
How can I get the tasks to run on the nodes?
What you see is normal and correct, given the way you ran it: without a list of hosts, Open MPI starts every process on the local machine. You need to specify a hostfile that tells Open MPI which machines to use. See the README_PELICAN file in the HPL directory for an example of how to use Open MPI, or run "man mpirun.openmpi". The -host or -hostfile switches are what you need.
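A minimal sketch of the -hostfile approach, run from a shell on the frontend. The addresses are placeholders taken from the lamnodes output in this thread (the frontend at 10.11.12.1 plus the nodes listed there); replace them with the addresses your compute nodes actually received and add one line per machine. The file name mpi_hosts is hypothetical, and the script assumes one CPU per machine and that helloworld sits at the same path on every node.

```shell
#!/bin/sh
# Sketch only: build an Open MPI hostfile and spread 4 processes over it.
# ASSUMPTION: placeholder addresses; use the ones your machines really have.
HOSTFILE=./mpi_hosts   # hypothetical file name

# One line per machine; slots=1 assumes one CPU per machine.
cat > "$HOSTFILE" <<'EOF'
10.11.12.1 slots=1
10.11.12.2 slots=1
10.11.12.3 slots=1
10.11.12.4 slots=1
EOF

# Launch only if Open MPI's launcher is installed on this machine.
if command -v mpirun.openmpi >/dev/null 2>&1; then
  # mpirun can also start non-MPI programs: each output line from
  # 'hostname' names the machine that one of the 4 processes ran on.
  mpirun.openmpi -np 4 -hostfile "$HOSTFILE" hostname
  # ASSUMPTION: helloworld is at the same path on every machine.
  mpirun.openmpi -np 4 -hostfile "$HOSTFILE" ./helloworld
fi
```

For a quick one-off run the hosts can be given inline instead, for example mpirun.openmpi -np 2 -host 10.11.12.2,10.11.12.3 hostname. Keeping -np equal to the total number of slots avoids asking Open MPI to oversubscribe any machine.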

gia.giotti wrote:
Another question: I'm able to rsh to the nodes only if the user password is empty so that I cannot ssh to the second interface (eth1) of the master. Is there a way to resolve this problem?  
You should have passwordless ssh to all nodes, even if a password is set. I am guessing that something you did by trying to run things before calling pelican_setup has caused this. Please give the steps outlined in the Tutorial a try and let me know if this persists.
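One way to check the passwordless ssh described above is to ask ssh to run a trivial command with password prompts disabled. This is a minimal sketch: the node addresses are placeholders from this thread, and the loop only reports results; it does not change any configuration.

```shell
#!/bin/sh
# Sketch only: report which nodes accept ssh without a password prompt.
# ASSUMPTION: placeholder addresses; substitute your own node list.
NODES="10.11.12.2 10.11.12.3 10.11.12.4"
FAILED=0

for node in $NODES; do
  # BatchMode=yes makes ssh fail instead of asking for a password;
  # ConnectTimeout=5 stops an unreachable node from stalling the loop.
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
    echo "$node: passwordless ssh OK"
  else
    echo "$node: passwordless ssh FAILED"
    FAILED=$((FAILED + 1))
  fi
done

echo "$FAILED node(s) failed"
```

Outside a batch system Open MPI normally starts remote processes over ssh (or rsh), so a node reported as FAILED here is likely to cause trouble under mpirun as well. With BatchMode=yes an unknown host key also counts as a failure.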

gia.giotti

Re: Tasks are not shared between nodes.

Thank you very much, Michael. You are completely right:

Michael Creel wrote:
gia.giotti wrote:
I followed the tutorial (http://pareto.uab.es/mcreel/PelicanHPC/Tutorial/PelicanTutorial.html) but, before running pelican_setup, I set eth0 (ifconfig eth0 inet 10.11.12.1) and ran lamboot in order to get what seemed a correct(?) lamnodes output, similar to this:
n0     10.11.12.2:1:
n1     10.11.12.3:1:
n2     10.11.12.4:1:
n3     10.11.12.4:1:
n4     10.11.12.1:1:origin,this_node
I don't understand this. If you lamboot before you run pelican_setup, then the compute nodes are not running yet, no? How do you get lamnodes to show this output? Running lamboot when only the frontend is up should give you something like the last line of your output. Why don't you follow the steps of the Tutorial? Are you trying to achieve some specialized configuration?
My error here was assigning a static IP to eth0... and the other errors followed from my attempts to get the system working :-(
Michael Creel wrote:
gia.giotti wrote:
I've written a typical MPI hello-world program, helloworld.f90, and compiled it with
mpif90.openmpi helloworld.f90 -o helloworld
and ran it with both
mpirun.openmpi -np 4 helloworld
or
mpiexec.openmpi -np 4 helloworld
I noticed (with top and from the program output) that all 4 tasks were running on the master while the 4 nodes sat idle.
How can I get the tasks to run on the nodes?
What you see is normal and correct, given the way you ran it: without a list of hosts, Open MPI starts every process on the local machine. You need to specify a hostfile that tells Open MPI which machines to use. See the README_PELICAN file in the HPL directory for an example of how to use Open MPI, or run "man mpirun.openmpi". The -host or -hostfile switches are what you need.
I had misunderstood a short guide on the subject that I read somewhere on the net.
 
Michael Creel wrote:
gia.giotti wrote:
Another question: I can rsh to the nodes only if the user password is empty, and with an empty password I cannot ssh to the second interface (eth1) of the master. Is there a way to solve this problem?
You should have passwordless ssh to all nodes, even if a password is set. I am guessing that something you did by trying to run things before calling pelican_setup has caused this. Please give the steps outlined in the Tutorial a try and let me know if this persists.
This too was due to the static IP on the internal interface, which the DHCP switch changed after a while.

Thank you very much!