Running jobs on a cluster with PBS

Portable Batch System

Portable Batch System [1] (PBS) is software designed to schedule and control jobs executed on a Linux cluster. PBS was originally developed for NASA by the Veridian company in the late 1990s. Since then many versions have emerged, including the open-source variant TORQUE-PBS, which has been installed on the Funk Linux cluster. The software package provides the following commands:

  • qsub

    the command qsub script.pbs submits a new job to the cluster; its status can be monitored with qstat; after submitting a job, the qsub command prints its unique job-id on the screen (e.g. 2722238.funk). The following commands might be helpful:

    • qsub -l nodes=host submits a job to a particular node, where host is the node name, e.g. qsub -l nodes=funk-node-11

    • qsub -F param_val script.pbs passes a parameter value to the script at submission time, i.e. it corresponds to running ./script.pbs param_val from the command line

  • qstat

    lists all jobs currently running as well as those still waiting in the queue; qstat -u jdoe shows only those belonging to the user jdoe. A very detailed description of a job can be obtained with qstat -f job-id

  • qdel

    qdel job-id removes a given job from the queue, terminating it if it is already running; the job-id must match the output from qstat; you can’t delete jobs belonging to other users (see the example session below).
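
A minimal example session tying these commands together; the script name and the job-id are illustrative:

qsub script.pbs           # prints the job-id, e.g. 2722238.funk
qstat -u $USER            # list only your own jobs
qstat -f 2722238.funk     # detailed description of that particular job
qdel 2722238.funk         # remove the job from the queue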

Submission script

Users can’t simply run a program on the cluster directly; a submission script has to be prepared for that purpose. An example is given below:

#!/bin/bash
#PBS -V                      # export the current environment variables to the job
#PBS -l walltime=480:00:00   # 480 hours = 20 days

cd $PBS_O_WORKDIR            # go to the directory the job was submitted from
pwd > ERR                    # record the working directory,
hostname >> ERR              # the node the job runs on
date >> ERR                  # and the start time
# ---- place the actual job here
ls > OUT

date >> ERR                  # record the finish time
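
Assuming the script above has been saved as script.pbs, a typical run could look like the lines below; ERR then collects the diagnostic information and OUT the results of the actual job:

qsub script.pbs    # submit the job from the folder containing the input data
cat ERR            # working directory, host name, start and end time stamps
cat OUT            # output of the actual job (here: the file listing)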

Helpful commands

  • killing all your own jobs (replace dgront with your own user name): qstat -u dgront | grep batch | awk '{print $1}' | xargs qdel

  • counting the jobs of each user:

    qstat | awk '{print $3}' | sort | uniq -c | sort -nr

    Example output:

3707 jeziorsk
1041 awozniak
  15 lcbio
   8 aswiatek
   6 vjewtoukoff
   4 januszc
   3 rszosz
   3 piwczyn
   1 andgrz

Case study A: multiple jobs with parameters

Typically one needs to submit multiple copies of the same program, each with a different set of input parameters. Of course, this can be done by preparing each of these jobs manually; however, there is a high risk of a simple human error. Ideally there should be just a single script that is run multiple times, each time automatically provided with the correct parameters.

Solution 1: folder name bears parameter values

In this approach you create a single folder for a given combination of parameters. For example, the argon program of the BioShell package can be run as follows:

./argon -n 343 -d 0.5 -t 90

This will run a Monte Carlo simulation of argon gas with 343 atoms at density 0.5 and temperature 90 [K]. In order to run simulations for densities 0.4, 0.5 and 0.6 at temperatures 84, 86, 88, 90, 92 and 94 [K], you can create 18 folders named density-temperature, e.g. 0.5-90 to simulate density 0.5 at temperature 90 [K]. The script will automatically grab the parameters from the folder name:

TEMPER=`pwd | tr '/-' ' ' | awk '{print $(NF)}'`      # last field of the path, e.g. 90
DENSITY=`pwd | tr '/-' ' ' | awk '{print $(NF-1)}'`   # second to last field, e.g. 0.5
./argon -n 343 -d $DENSITY -t $TEMPER

You can easily create the required folders with a simple bash command:

for D in 0.5 0.6 0.4; do
  for T in 84 86 88 90 92 94; do
    mkdir "$D-$T"
  done
done

Note that the qsub command has to be executed from inside the respective parameter folder (e.g. 0.5-90). You can keep the submission script in the main folder (i.e. at the same level as the 0.5-90 folders) and submit all the jobs with a single command:

for i in 0.?-??; do cd $i; qsub ../my_script.pbs; cd ../; done

Solution 2: generated input files

The solution above is quite simple; its main advantage is that it is immediately obvious which parameter values were used to generate a given output file. This approach, however, may not work in more complex situations. In a more general approach one creates input files - one per folder - which the job then reads at run time.
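
A minimal sketch of that approach, assuming a hypothetical program that reads its parameters from a file named input.txt (both the file name and its format are illustrative, not part of the argon example above):

for D in 0.4 0.5 0.6; do
  for T in 84 86 88 90 92 94; do
    DIR="$D-$T"
    mkdir -p "$DIR"
    # one input file per folder; the submission script can then read
    # ./input.txt instead of parsing the folder name
    printf "density %s\ntemperature %s\n" "$D" "$T" > "$DIR/input.txt"
  done
done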

Case study B: lots of short jobs

In another scenario a given program runs quite fast, but there are thousands of jobs to compute. Sending them all to the queue would be inefficient and could even strain the queuing system. In such a case one can produce a file containing the respective commands - one per line - and then split that file. Let’s say there is a process_file executable and lots of *.fasta files that have to be processed one by one. Then the following command:

for i in *.fasta; do echo "process_file $i"; done > command_to_run.sh

will produce a kind of bash script with all the commands to be executed as a single job. Assuming there were 1000 *.fasta files, you end up with a file of 1000 lines. Split that file: e.g. split -l 100 command_to_run.sh will produce 10 files named xa?, 100 lines each. Then prepend the submission script header to each of them and submit them as separate jobs:

for i in x??; do
    cat script_header.sh > $i.pbs    # start each job script with the PBS header
    cat $i >> $i.pbs                 # then append this chunk of the commands
done

# --- double check the scripts are OK and then:
for i in *.pbs; do qsub $i; done

References

[1] https://en.wikipedia.org/wiki/Portable_Batch_System