COCOS Cluster PBS primer
DESCRIPTION
To add a job to a queue, one of two things is needed: either read the
qsub manual page and submit the job from the command line, or write a
script that does it - all the rest of this manual is about the latter
solution. Below is an example script adding a program called program to
the queue.
#!/bin/bash
#PBS -N program_name
#PBS -l cput=10:20:30
#PBS -q queue_name
#PBS -m abe
cd $HOME/path_to_the_program/
how_to_run_your_program
Instead of program_name provide a name for this application. Make it
somewhat informative, at least for yourself, as this name will appear
in the queue, but beware that it cannot contain spaces or other special
characters :) Instead of queue_name provide the name of the queue to
which the job should be added. Setting this name properly is very
important, as different queues have different properties, mainly execution
time limits. For example, in the normal queue a job intended to run for
20h will not finish its execution, as there is a 12h execution time limit
for that queue. Full information about all configured queues can be
obtained by running qstat -q, or the slightly less formal queues command.
The queue time limit is the CPU Time parameter in the table shown by
qstat -q. Another important parameter is -l cput=h:m:s. It sets our
estimated maximal execution time with the following representation:
h - time in hours, m - minutes, s - seconds. Each field should appear,
even if it is 0: a bare 2 means just 2 seconds and nothing else, while
5:0 means 5 minutes 0 seconds, which still might not be the intended
thing. If this parameter is omitted altogether, some system default
applies, which will finish the job very quickly indeed.
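To sanity-check a time specification before submitting, one can convert it the way PBS reads it, from the right (seconds, then minutes, then hours). A minimal sketch, assuming only a POSIX shell; hms_to_seconds is a hypothetical helper, not a PBS tool:

```shell
#!/bin/sh
# hms_to_seconds: interpret a cput-style spec the way PBS does,
# from the right: "2" is 2 seconds, "5:0" is 5 minutes 0 seconds.
hms_to_seconds() {
    total=0
    old_ifs=$IFS
    IFS=:
    for field in $1; do
        total=$(( total * 60 + field ))
    done
    IFS=$old_ifs
    echo $total
}

hms_to_seconds 10:20:30   # 10 h 20 min 30 s = 37230 seconds
hms_to_seconds 5:0        # 300 seconds
hms_to_seconds 2          # just 2 seconds
```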
After setting all of the PBS starting parameters, the server needs to
learn how to run the application itself (this will later be passed to
the executing node). In our small example we cd to the application
directory with cd $HOME/path_to_the_program/, where instead of
path_to_the_program a path to our application is given, relative to the
$HOME directory. The next line, how_to_run_your_program, runs the
application itself (basically ./application_name, or some variation of
it with I/O redirections and the like, will do). Of course we can also
run the application from somewhere else (useful when running the same
application for different data sets), but this is not the place to
learn such things :)
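For the curious, one common pattern is sketched below; PBS_O_WORKDIR is a standard variable that PBS sets to the directory qsub was run from, while the application and file names are hypothetical:

```shell
# Change to the directory from which the job was submitted, so the same
# script works for many data sets - just run qsub from each data directory.
cd $PBS_O_WORKDIR
./application_name < input.dat > output.dat
```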
The additional parameter set in our example, -m abe, is responsible for
sending an e-mail when the job begins, ends, or aborts with an error.
More on all such topics in man qsub.
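The letters after -m can be combined as needed; a sketch of the variants (the -M address is a hypothetical example - by default mail goes to the submitting user):

```shell
#PBS -m abe                  # mail on abort (a), begin (b) and end (e)
#PBS -m n                    # no mail at all
#PBS -M someone@example.com  # send the mail to this address instead
```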
EXAMPLE
User ukasz wants to enqueue a program located in /home/ukasz/temp; the
program name is p_npt. To do so, the user creates a script run_it.sh in
the same directory. In the console she/he writes:
ukasz@shiva> cd /home/ukasz/temp
ukasz@shiva> touch run_it.sh
ukasz@shiva> mcedit run_it.sh
Now an mc editor window opens (of course one can use whatever editor
he/she likes :) The content entered into this file is as follows:
#!/bin/sh
#PBS -N p_npt
#PBS -q long
#PBS -l cput=40:00:00
#PBS -m abe
cd $HOME/temp
./p_npt
This is basically all that is needed; do not forget to save the file,
though (in mcedit that is the F2 key :) Now the job can be added to the
queue. It can be done with a console command:
ukasz@shiva> qsub run_it.sh
And that’s all, folks. Now we can happily check whether it is already
in the queue with another console command:
ukasz@shiva> qstat
There we look for a job with the name given in our script. If it is
there, then all went well; if not, something went wrong and we need to
read this manual again, looking for clues. If we cannot manage by
ourselves, even with man qsub, we can ask for help at
root@shiva.if.uj.edu.pl - given time and good will (not to mention a
detailed description of the problem and/or directions on where to look
for all of it) one might even get the help :)
Of course one can always delete a job from a queue (this does not touch
any user files, only the system information about the job) using
qdel job_no, where job_no is the job number obtained from the queue
(with the qstat command).
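Putting submission, status check and deletion together - note that qsub prints the job identifier on submission; the one shown here is hypothetical:

```shell
ukasz@shiva> qsub run_it.sh
1234.shiva
ukasz@shiva> qstat 1234.shiva
ukasz@shiva> qdel 1234.shiva
```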
MPI
Although LAM is installed, its usage is STRONGLY discouraged for the
following reasons:
- It does not work well with our PBS/torque, due to its own interprocess
communication instead of using the PBS-native one. On top of that, it is
believed to be less computationally effective than its replacement,
Open-MPI.
- Usage of MPI with more cores/threads/processes than fit in a single
node (4 at the moment in most of our nodes) is possible but difficult,
as one needs to be able to run a job (or, more precisely, to log in via
ssh without giving a password) from one node on another - lamboot and
the rest of it. At the moment this is impossible (or is it?) without my
(root) help, so when planning such jobs please come to me for help once
you have checked that you cannot do it by yourself :) But be warned that
you should not be using it at all.
Open-MPI (the newest version) is installed (both development and
run-time parts) in the /usr/local tree. It uses the native PBS
interprocess communication mechanisms and, what is more, it is reported
by some to be more computationally effective than LAM. To use it, add
/usr/local/include to your header search path and /usr/local/lib to the
library path, and either use orterun instead of mpiexec/mpirun or
otherwise make sure /usr/local/bin is at the front of your PATH.
Open-MPI will auto-magically learn how many processes/threads it can
run from the information it gets from PBS :)
We have defined separate queues for MPI jobs - such jobs will not RUN
outside those queues, even if they are not rejected on the spot. For
all real MPI problems our queues, even those dedicated to MPI, have a
hard limit of one node per job, which means that no job can run more
threads than there are cores on a single motherboard. This also favours
using the OpenMP compiler extension over real Open-MPI. If, for some
reason or other, this is not enough, try negotiating - but NOT with the
cluster admin; go to the advisory board instead (PB will be a good
starting man) and better be prepared for a hard discussion.
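A sketch of a complete Open-MPI job script under the conventions above; the queue name mpi, the nodes/ppn line, the directory and the program name are all assumptions - check qstat -q for the real MPI queue names:

```shell
#!/bin/sh
#PBS -N mpi_job              # job name, as before
#PBS -q mpi                  # an MPI-dedicated queue (name is an assumption)
#PBS -l nodes=1:ppn=4        # one node, all 4 cores (the per-job hard limit)
#PBS -l cput=10:00:00
cd $HOME/mpi_project
export PATH=/usr/local/bin:$PATH   # pick up the /usr/local Open-MPI first
orterun ./mpi_prog                 # process count is taken from PBS automatically
```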
SOME LESS GENERAL REMARKS
A normal, well-behaved user should NOT ask for more cluster resources
than the job needs (except maybe cput, which will not be used anyway if
not needed :) but sometimes (especially when the cluster is not full or
overcrowded) this attitude might be reconsidered, profitably for all.
Such circumstances are for example:
- usage of the full machine memory (four-core nodes have 8 gigs of RAM)
- one does not need to suffer, and make others suffer the same, with
jobs swapping, no? Exclusively reserving the whole machine for oneself
will not be punishable in such a case, even with the cluster running
part time only :)
- also, running Open-MPI with 4 cores occupied by 4 threads calls for
reserving a whole single machine for that job only :)
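In PBS terms, reserving a whole four-core machine for one job can be sketched as follows; the mem syntax is an assumption, see man pbs_resources for what our server actually accepts:

```shell
#PBS -l nodes=1:ppn=4   # claim all four cores, i.e. the whole node
#PBS -l mem=8gb         # ask for the full machine memory
```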
Try to suggest some other sensible reasons and I will indicate them
here as well :)
SEE ALSO
qsub - everything that can be in the script submitted to PBS; some of
it may be inactive, see the man page for details
qdel - how to remove the job from a queue
qstat - how to view server/queue statistics and other PBS info
qmgr - PBS configuration reader, queues etc.
pestat - a script external to PBS showing similar info in a different
way, basically grouped by hosts/nodes rather than by other parameters
http://shiva.if.uj.edu.pl - accessible only from allowed hosts; shows
node and job statistics on a web page