.. _cluster_notes:

Running jobs on a cluster with PBS
======================================

Portable Batch System
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Portable Batch System`_ (PBS) is software designed to schedule and control jobs executed on a Linux cluster. PBS was originally developed for NASA in the 1990s (by MRJ Technology Solutions, later acquired by Veridian). Since then many versions have emerged, including the open-source variant TORQUE-PBS, which has been installed on the Funk Linux cluster. The software package provides the following commands:

- **qsub** the command ``qsub script.pbs`` submits a new job to the cluster; its status can be monitored with ``qstat``. After submitting a job, ``qsub`` prints the unique ``job-id`` on the screen (e.g. ``2722238.fun-k``). The following options might be helpful:

  - ``qsub -l nodes=host`` submits a job to a particular node, where ``host`` is the node name, e.g. ``qsub -l nodes=funk-node-11``
  - ``qsub -F param_val script.pbs`` passes a parameter value to the script at submission time, i.e. it corresponds to running ``./script.pbs param_val`` from the command line

- **qstat** lists all jobs currently running and those still waiting in the queue; ``qstat -u jdoe`` shows only those belonging to ``jdoe``. A very detailed description of a job can be obtained with ``qstat -f job-id``

- **qdel** ``qdel job-id`` removes the given job from the queue, terminating it if it is already running; the ``job-id`` must match the output from ``qstat``. You can't delete jobs that belong to other users.

Submission script
~~~~~~~~~~~~~~~~~~~

A user can't simply submit a program directly to the cluster; a submission script has to be prepared for that purpose. An example is given below:

.. code-block:: bash

    #PBS -V
    # 480 hrs means 20 days
    #PBS -l walltime=480:00:00

    # start in the folder from which the job was submitted
    cd $PBS_O_WORKDIR
    pwd > ERR
    hostname >> ERR
    date >> ERR

    # ---- place the actual job here
    ls > OUT

    date >> ERR

Helpful commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- killing **all** your own jobs (this assumes they were submitted to a queue named ``batch``): ``qstat -u $USER | grep batch | awk '{print $1}' | xargs qdel``

- counting jobs per user: ``qstat | awk '{print $3}' | sort | uniq -c | sort -nr``

Example output:

.. code-block:: text

    3707 jeziorsk
    1041 awozniak
      15 lcbio
       8 aswiatek
       6 vjewtoukoff
       4 januszc
       3 rszosz
       3 piwczyn
       1 andgrz

Case study A: multiple jobs with parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Typically one needs to submit multiple copies of the same program, each with a different set of input parameters. Of course this can be done by preparing each of these jobs manually, but then there is a high risk of a simple *human error*. Ideally, there should be just a single script that is run multiple times, each time automatically provided with the correct parameters.

.. rubric:: Solution 1: folder name bears parameter values

In this approach you create a single folder for each combination of parameters. For example, the ``argon`` program of the BioShell package can be run as follows:

.. code-block:: bash

    ./argon -n 343 -d 0.5 -t 90

This will run a Monte Carlo simulation of argon gas with 343 atoms at density 0.5 and temperature 90 K. In order to run simulations for densities 0.4, 0.5 and 0.6 at temperatures 84, 86, 88, 90, 92 and 94 K, you can create 18 folders (3 densities times 6 temperatures) named ``density-temperature``, e.g. ``0.5-90`` to simulate density 0.5 at temperature 90 K. The script will automatically grab the parameters from the folder name:

.. code-block:: bash

    # turn every '/' and '-' of the current path into a space;
    # the last two fields are then the density and the temperature
    TEMPER=`pwd | tr '/-' '  ' | awk '{print $(NF)}'`
    DENSITY=`pwd | tr '/-' '  ' | awk '{print $(NF-1)}'`
    ./argon -n 343 -d $DENSITY -t $TEMPER

You can easily create the required folders with a simple ``bash`` command:

.. code-block:: bash

    for D in 0.5 0.6 0.4; do
        for T in 84 86 88 90 92 94; do
            mkdir "$D-$T"
        done
    done

**Note** that the ``qsub`` command has to be executed from within the respective folder, e.g. ``0.5-90``. You can keep the submission script in the main folder (i.e. on the same level as the ``0.5-90`` folder) and submit all the jobs with a single command:

.. code-block:: bash

    for i in 0.?-??; do cd "$i"; qsub ../my_script.pbs; cd ../; done

.. rubric:: Solution 2: generated input files

The solution above is quite simple; its main advantage is that it is obvious which parameter values were used to generate a given output file. This approach, however, may not work in more complex situations. In a more general approach one creates input files - one per folder.

Case study B: lots of short jobs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In another scenario a given program runs quite fast but there are *thousands* of jobs to compute. Sending them all to the queue would be inefficient and could even strain the queuing system. In such a case one can produce a file containing the respective commands - one per line - and then split the file. Let's say there is a ``process_file`` executable and lots of ``*.fasta`` files that have to be processed one by one. Then the following command:

.. code-block:: bash

    for i in *.fasta; do echo "process_file $i"; done > command_to_run.sh

will produce a kind of *bash script* with all the commands, which could be executed as a single job. Assuming there were 1000 ``*.fasta`` files, you end up with a file of 1000 lines. Split that file, e.g. ``split -l 100 command_to_run.sh`` will produce 10 files named ``xa?``, 100 lines each. Then put the submission script header on top of each of them and submit them as separate jobs.

.. code-block:: bash

    for i in x??; do
        cat script_header.sh > "$i.pbs"
        cat "$i" >> "$i.pbs"
    done

    # --- double check the scripts are OK and then:
    for i in *.pbs; do qsub "$i"; done

.. rubric:: References

.. target-notes::

.. _`Portable Batch System`: https://en.wikipedia.org/wiki/Portable_Batch_System
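
.. rubric:: Sketch: generated input files

Solution 2 of Case study A (one generated input file per folder) is described only in words, so here is a minimal sketch of how it might look for the ``argon`` example. The file name ``params.in``, its ``key=value`` layout and the helper ``build_command`` are assumptions made for illustration; they are not part of PBS or BioShell. The helper only prints the command line instead of running ``argon``.

```shell
# Generate one folder per parameter combination, each holding a small
# key=value input file. "params.in" is a hypothetical name.
for D in 0.4 0.5 0.6; do
    for T in 84 86 88 90 92 94; do
        DIR="$D-$T"
        mkdir -p "$DIR"
        printf 'n_atoms=343\ndensity=%s\ntemperature=%s\n' "$D" "$T" > "$DIR/params.in"
    done
done

# Job side (hypothetical helper): read the parameters back from a folder
# and print the command that the submission script would run there.
build_command() {
    # source the file in a subshell so the variables do not leak out
    ( . "$1/params.in" && echo "./argon -n $n_atoms -d $density -t $temperature" )
}

build_command 0.5-90    # prints: ./argon -n 343 -d 0.5 -t 90
```

Inside a submission script the same two steps (source ``params.in``, then call ``./argon``) would take the place of parsing the folder name. Because the parameters live in a file, the folder names no longer have to encode them, which is what makes this variant usable with longer parameter lists.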