Running jobs on a cluster with PBS¶
Portable Batch System¶
Portable Batch System [1] (PBS) is software designed to schedule and control jobs executed on a Linux cluster. PBS was originally developed for NASA by the Veridian company in the late 1990s. Since then many versions have emerged, including the open-source variant TORQUE-PBS, which has been installed on the Funk Linux cluster. The software package provides the following commands:
qsub
the command
qsub script.pbs
submits a new job to the cluster; its status can be monitored with qstat. After submitting a job, the qsub command prints its unique job-id on the screen (e.g. 2722238.fun-k). The following commands might be helpful:
qsub -l nodes=host
submits a job to a particular node, where host is the node name, e.g. qsub -l nodes=funk-node-11
qsub -F param_val script.pbs
passes a parameter value to a script while submitting, i.e. it corresponds to running ./script.pbs param_val from the command line
qstat
lists all jobs currently running and those still waiting in the queue;
qstat -u jdoe
shows only those belonging to jdoe. A very detailed description of a job can be obtained with qstat -f job-id
qdel
qdel job-id
removes a given job from the queue, possibly terminating it; the job-id must match the output from qstat; you can’t delete jobs of other users.
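Putting these commands together, a typical session might look as follows (the job-id and user name below are only illustrative):
qsub script.pbs          # prints the job-id, e.g. 2722238.fun-k
qstat -u jdoe            # monitor only your own jobs
qstat -f 2722238.fun-k   # print a detailed description of the job
qdel 2722238.fun-k       # remove the job from the queue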
Submission script¶
You can’t simply submit a program to the cluster directly; a submission script has to be prepared for that purpose. An example is given below:
#!/bin/bash
#PBS -V                      # export the current environment variables to the job
#PBS -l walltime=480:00:00   # request 480 hours (i.e. 20 days) of wall time
cd $PBS_O_WORKDIR            # enter the directory the job was submitted from
pwd > ERR                    # record the working directory,
hostname >> ERR              # the node the job landed on,
date >> ERR                  # and the start time
# ---- place the actual job here
ls > OUT
date >> ERR                  # record the finish time
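Assuming the script above was saved as script.pbs, it can be submitted and followed like this:
qsub script.pbs   # returns the job-id
tail -f ERR       # once the job starts, watch its diagnostic file grow
Since the script begins by entering $PBS_O_WORKDIR, the OUT and ERR files appear in the directory from which the job was submitted.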
Helpful commands¶
killing all your own jobs:
qstat -u dgront | grep batch | awk '{print $1}' | xargs qdel
listing the number of jobs per user:
qstat | awk '{print $3}' | sort | uniq -c | sort -nr
Example output:
3707 jeziorsk
1041 awozniak
15 lcbio
8 aswiatek
6 vjewtoukoff
4 januszc
3 rszosz
3 piwczyn
1 andgrz
Case study A: multiple jobs with parameters¶
Typically one needs to submit multiple copies of the same program, each with a different set of input parameters. Of course, this can be done by preparing each of these jobs manually, but the risk of a simple human error is high. Ideally, there should be just a single script that is run multiple times, each time automatically provided with the correct parameters.
Solution 1: folder names carry the parameter values
In this approach you create one folder for each combination of parameters. For example, the argon program of the BioShell package can be run as follows:
./argon -n 343 -d 0.5 -t 90
This will run a Monte Carlo simulation of argon gas with 343 atoms at density 0.5 and temperature 90 [K]. In order to run simulations for densities 0.4, 0.5 and 0.6 at temperatures 84, 86, 88, 90, 92 and 94 [K], you can create 18 folders named density-temperature, e.g. 0.5-90 to simulate density 0.5 at temperature 90 [K]. The script will automatically grab the parameters from the folder name:
# the last path component is density-temperature, e.g. 0.5-90;
# tr turns '/' and '-' into spaces so that awk can pick the last two fields
TEMPER=`pwd | tr '/-' ' ' | awk '{print $(NF)}'`     # last field: temperature
DENSITY=`pwd | tr '/-' ' ' | awk '{print $(NF-1)}'`  # one before last: density
./argon -n 343 -d $DENSITY -t $TEMPER
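To see what happens here, assume (hypothetically) that the job runs in the folder /home/jdoe/runs/0.5-90:
pwd                 # /home/jdoe/runs/0.5-90
pwd | tr '/-' ' '   # " home jdoe runs 0.5 90"
# awk then picks $(NF) = 90 for TEMPER and $(NF-1) = 0.5 for DENSITY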
You can easily create the required folders with a simple bash command:
for D in 0.4 0.5 0.6; do
  for T in 84 86 88 90 92 94; do
    mkdir "$D-$T"
  done
done
Note that the qsub command has to be executed from inside a given parameter folder, e.g. 0.5-90. You can keep the submission script in the main folder (i.e. one level above the 0.5-90 folders) and submit all the jobs with a single command:
for i in 0.?-??; do cd $i; qsub ../my_script.pbs; cd ../; done
Solution 2: generated input files
The solution above is quite simple; its main advantage is that it’s immediately obvious which parameter values were used to generate a given output file. This approach, however, may not work in more complex situations. In a more general approach one creates input files - one per folder.
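For instance, a short bash loop can generate all the folders and write one input file into each of them. The sketch below assumes a hypothetical program that reads its parameters from a file named input.txt; the file format is purely illustrative:
for D in 0.4 0.5 0.6; do
  for T in 84 86 88 90 92 94; do
    mkdir -p "$D-$T"
    # write one input file per folder (hypothetical format)
    cat > "$D-$T/input.txt" <<EOF
n_atoms 343
density $D
temperature $T
EOF
  done
done
The submission script then stays identical for all jobs: it simply runs the program on the input.txt found in its working directory.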
Case study B: lots of short jobs¶
In another scenario a given program runs quite fast, but there are thousands of jobs to compute. Sending them all to the queue would be inefficient and could even strain the queuing system. In such a case one can produce a file containing the respective commands - one per line - and then split that file. Let’s say there is a process_file executable and lots of *.fasta files that have to be processed one by one. Then the following command:
for i in *.fasta; do echo "process_file $i"; done > command_to_run.sh
will produce a kind of bash script with all the commands to be executed as a single job. Assuming there were 1000 *.fasta files, you end up with a file of 1000 lines. Splitting that file with split -l 100 command_to_run.sh will produce 10 files named xa?, 100 lines each. Then prepend the submission script header to each of them and submit them as separate jobs:
for i in x??; do
  cat script_header.sh > $i.pbs   # the #PBS header goes first
  cat $i >> $i.pbs                # followed by the 100 commands of this chunk
done
# --- double check the scripts are OK and then:
for i in *.pbs; do qsub $i; done
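The script_header.sh file is not shown above; a minimal version, modelled on the submission script from the previous section, could look like this:
#!/bin/bash
#PBS -V
#PBS -l walltime=480:00:00
cd $PBS_O_WORKDIR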
References