SPMD Users Guide
Version 1.0
Introduction.
SPMD consists of several commands that give the user a single parallel program initiation interface for PVM, LAM MPI, and MPICH programs. This interface was developed for the HIVE (http://newton.gsfc.nasa.gov/thehive). The HIVE consists of a cluster of 64 dual-processor PCs and 2 host nodes. All users start a parallel program from a host, and SPMD distributes its processes across the nodes of the cluster. This document gives the steps necessary to set up and use SPMD.
Setting up the User environment.
Put the following in "$HOME/.bashrc":
    SPMDHOME=/usr/local/spmd
    export SPMDHOME
    . $SPMDHOME/bashrc
where: SPMDHOME is the directory where 'spmd' was installed.
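A slightly more defensive variant of the same ".bashrc" lines is sketched below. The existence check and the warning message are additions for illustration and are not part of SPMD; "/usr/local/spmd" is simply the example path used above.

```shell
# Use an existing SPMDHOME if one is set, otherwise the example path.
SPMDHOME=${SPMDHOME:-/usr/local/spmd}
export SPMDHOME

# Source the SPMD setup file only if it is readable (assumed guard, not
# something SPMD itself requires).
if [ -r "$SPMDHOME/bashrc" ]; then
    . "$SPMDHOME/bashrc"
else
    echo "warning: $SPMDHOME/bashrc not found; SPMD commands may be unavailable" >&2
fi
```

With this form, a login on a machine without SPMD prints a warning instead of failing while reading ".bashrc".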
Also, on the HIVE, execute "touch .nopvm" in the HOME directory on the host and on all compute nodes. This turns off standard PVM.
Setting up a cluster.
Setting up a cluster is as simple as listing the names of its nodes in a file called "nodelist". The first node in the list is the current host node. It can be designated by its actual name or by the default name "localhost". For example:
    localhost
    bee000
    bee001
    bee002
    bee003
This would be a cluster consisting of one host, "localhost", and 4 compute nodes. A default cluster may be defined for a host, a user, or the current directory. In the current directory, the file is called "nodelist". The host nodelist is "/etc/nodelist", and a user's default nodelist is "$HOME/.nodelist".
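As a minimal sketch, the example nodelist can be created from the shell and checked by counting its compute nodes. The scratch directory and the counting step are illustrative additions; SPMD itself only needs the "nodelist" file.

```shell
# Work in a scratch directory so an existing nodelist is not overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# First line is the host; the remaining lines are compute nodes.
cat > nodelist <<'EOF'
localhost
bee000
bee001
bee002
bee003
EOF

# Count compute nodes: every non-empty line except the first (the host).
compute_nodes=$(sed '1d' nodelist | grep -c .)
echo "host: $(head -n 1 nodelist), compute nodes: $compute_nodes"
```

Copying the same file to "$HOME/.nodelist" would make it the user's default cluster, as described above.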
Initiating a parallel program with SPMD.
First, write and compile an MPI or PVM program. This document will not discuss how to write a parallel program. To start the PVM program "mandel", use the following command:
    spmd_pvm mandel -c
This will, by default, start one copy of "mandel" on each compute node listed in the nodelist file, but not on the host node (the first node listed in the nodelist). Any arguments required by the parallel program must follow the program's name on the command line (in this case "-c"). An initiated copy of a program will also be referred to as a process.
The user can also specify options to the spmd_xxx command, where xxx may be pvm, lam, or mpich. They must appear between "spmd_xxx" and the program's name. These options select a subset of nodes in the nodelist on which to run the program (-s and -n options), set how many copies (processes) of the program to start (-t option), set how many consecutive processes to start per node (-p option), and give a group name (-g option) to the processes of the program (by default, the group name is the name of the program). An SPMD shell command line has the form:
    spmd_xxx [<spmd options>] program_name [<program arguments>]

    <spmd options> :
      -t <n>       Start <n> copies of the program.
      -s <n>       Use node <n> as the first compute node.
      -n <n>       Use at most <n> consecutive nodes.
      -p <n>       Start <n> consecutive copies per node.
      -g <name>    Group name.
      -v           Produce verbose output, if any.
As a special case, a process may be started on the host by setting the first node of the group to -1 with the option "-s -1". This will start only one copy of the program, and it will be initiated on the host node.
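A few command lines built from the options above are sketched here as a dry run. The "show" helper is hypothetical and only prints each command line, so the sketch can be tried on a machine without SPMD; the group name "fractal" and the option values are illustrative.

```shell
# Hypothetical dry-run helper: print the command line instead of running it.
show() { printf '%s\n' "$*"; }

show spmd_pvm -t 8 mandel -c          # start 8 copies of mandel
show spmd_pvm -n 2 -p 2 mandel -c     # at most 2 nodes, 2 consecutive copies per node
show spmd_pvm -g fractal mandel -c    # group named "fractal" instead of "mandel"
show spmd_pvm -s -1 mandel -c         # special case: a single copy on the host
```

Removing the leading "show" gives the command lines that would actually be typed on a host node. In each case the SPMD options come before the program name and the program's own argument ("-c") comes after it.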
There are other options for spmd_xxx that apply to a specific message-passing environment. Only one such option may appear on the command line, and the command line may not include a program name:
    spmd_xxx [<option>]

    <option> :
      -c    Display configuration and status information.
      -k    Kill all user tasks.
      -r    Restart server daemon.
      -h    Halt server daemon.
Initiating multiple parallel programs with SPMD.
If multiple groups of programs need to be initiated, the user could initiate each group with an independent shell command line. However, processes of different groups would then not be able to communicate with each other. For this reason, SPMD can initiate multiple groups of processes with a single shell command line, which allows processes of different groups to communicate with each other.
If the program name is the name of a non-executable file, the file is assumed to be a program descriptor file. A program descriptor file is a text file which contains multiple spmd command lines. Each command line will initiate a group of processes as though they had been initiated by a shell command line, except that processes of different initiated groups can communicate with each other. These command lines have the following form:
    [<spmd options>] program_name [<program arguments>] ; [# <comment>]

    where:
      <spmd options>         spmd options for that specific group
      program_name           the name of the program that will be initiated
      <program arguments>    arguments to be passed on to the process
      <comment>              simply a comment
A command may span multiple text lines. Every command must be terminated by a ;. A comment starts at a # sign and continues to the end of the text line. Comments are deleted from the descriptor file before it is processed.
SPMD options on the shell command line become the default SPMD options for commands in the program descriptor file. Program arguments in the shell command are appended to the program arguments contained in the program descriptor file before being passed on to their respective processes.
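The descriptor syntax described above can be illustrated with a small file and a rough reading of its rules. This is only a sketch: the program names "master" and "worker" and the file name "mandel.desc" are hypothetical, and the "list_commands" function is not SPMD's parser. It merely mimics the stated rules (comments are deleted first, a command may span lines, every command ends with a ";").

```shell
# Write the example descriptor into a scratch directory.
workdir=$(mktemp -d)
desc="$workdir/mandel.desc"

cat > "$desc" <<'EOF'
# Hypothetical descriptor file: two groups that can communicate.
-s -1 master ;      # one controlling process on the host
-t 4 -p 2
    worker -c ;     # a command may span several text lines
EOF

# Strip comments, join the text lines, split on ';', and tidy whitespace.
list_commands() {
    sed 's/#.*$//' "$1" | tr '\n' ' ' | tr ';' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
            -e 's/[[:space:]][[:space:]]*/ /g' -e '/^$/d'
}

list_commands "$desc"
```

The function prints one line per command, which is a convenient way to check that every command is terminated before handing the file to SPMD. On a host with SPMD installed, the descriptor would be given in place of a program name, for example "spmd_pvm mandel.desc".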