.. _cluster_notes:

Running jobs on a cluster with PBS
======================================


Portable Batch System
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Portable Batch System`_ (PBS) is software designed to schedule and control jobs executed on a Linux cluster. PBS was originally developed for NASA by Veridian company in the late 1990s. Since then many versions have emerged, including the open-source variant TORQUE-PBS, which is installed on the Funk Linux cluster. The software package provides the following commands:

  - **qsub**

    the command ``qsub script.pbs`` submits a new job to the cluster; its status can be monitored with ``qstat``. After submitting a job, the ``qsub`` command prints its unique ``job-id`` on the screen (e.g. ``2722238.fun-k``). The following options might be helpful:

        - ``qsub -l nodes=host`` submits a job to a particular node, where ``host`` is the node name, e.g. ``qsub -l nodes=funk-node-11`` 

        - ``qsub -F param_val script.pbs`` passes a parameter value to a script while submitting, i.e. it corresponds to running ``./script.pbs param_val`` from a command line

  - **qstat**

    lists all jobs currently running and those still waiting in the queue; ``qstat -u jdoe`` shows only those belonging to ``jdoe``. A very detailed description of a job can be obtained with ``qstat -f job-id``.

  - **qdel** 

    ``qdel job-id`` removes a given job from the queue, terminating it if it is already running; the job-id must match the one shown by ``qstat``; you cannot delete jobs of other users.
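
The three commands are often combined: ``qsub`` prints the job-id, which can be stored in a shell variable and passed on to ``qstat`` or ``qdel``. A minimal sketch is given below; the function name ``submit_and_show`` is made up for this illustration and is not part of PBS.

.. code-block:: bash

  # Hypothetical helper (not part of PBS): submit a script, remember the
  # job-id printed by qsub and show the detailed status of that job.
  submit_and_show() {
    local script="$1"
    local job_id
    job_id=$(qsub "$script") || return 1   # qsub prints the job-id on stdout
    echo "Submitted $script as $job_id"
    qstat -f "$job_id"
  }

Once the function is defined in the current shell, ``submit_and_show script.pbs`` submits the job and prints the full ``qstat -f`` report for it.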


Submission script
~~~~~~~~~~~~~~~~~~~

A user cannot simply submit a program directly to the cluster; a submission script has to be prepared for that purpose. An example is given below:


.. code-block:: bash
  
  #PBS -V
  # 480 hours = 20 days
  #PBS -l walltime=480:00:00

  cd $PBS_O_WORKDIR
  pwd > ERR
  hostname >> ERR
  date >> ERR
  # ---- place the actual job here
  ls > OUT

  date >> ERR


Helpful commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


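- killing **all** your own jobs (replace ``dgront`` with your own user name; ``grep batch`` keeps only the lines that describe jobs in the ``batch`` queue):
  ``qstat -u dgront | grep batch| awk '{print $1}' | xargs qdel``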

- counting the jobs of each user:

   ``qstat | awk '{print $3}' | sort | uniq -c | sort -nr``

   Example output:


.. code-block:: bash
  
   3707 jeziorsk
   1041 awozniak
     15 lcbio
      8 aswiatek
      6 vjewtoukoff
      4 januszc
      3 rszosz
      3 piwczyn
      1 andgrz
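
Since the first pipeline above deletes jobs without asking, it may be worth previewing the job-ids before passing them to ``qdel``. The sketch below wraps the same ``qstat | grep | awk`` idea in a function; the name ``my_job_ids`` and its arguments are hypothetical, and it assumes, as the one-liner does, that every job line contains the queue name (``batch`` by default).

.. code-block:: bash

  # Hypothetical helper: print the ids of all jobs of a user in a given queue.
  # Assumes the queue name appears on every job line of the qstat -u output.
  my_job_ids() {
    local user="${1:-$USER}"
    local queue="${2:-batch}"
    qstat -u "$user" | awk -v q="$queue" '$0 ~ q {print $1}'
  }

  # Preview first, delete only when the list looks right:
  #   my_job_ids
  #   my_job_ids | xargs qdel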


Case study A: multiple jobs with parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Typically one needs to submit multiple copies of the same program, each with a different set of input parameters.
Of course this can be done by preparing each of these jobs manually, but there is a high risk of a simple *human error*. Ideally, there should be just a single script that is run multiple times, each time automatically provided with the correct parameters.

.. rubric::
  Solution 1: folder name bears parameter values

In this approach you create one folder for each combination of parameters. For example, the ``argon`` program of the BioShell package can be run as follows:

.. code-block:: bash
  
  ./argon -n 343 -d 0.5 -t 90


This will run a Monte Carlo simulation of argon gas with 343 atoms at density 0.5 and temperature 90 K. In order to run simulations for densities 0.4, 0.5 and 0.6 at temperatures 84, 86, 88, 90, 92 and 94 K, you can create 18 folders (3 densities times 6 temperatures) named ``density-temperature``, e.g. ``0.5-90`` to simulate density 0.5 at temperature 90 K. The script will automatically grab the parameters from the folder name:


.. code-block:: bash
  
  TEMPER=`pwd | tr '/-' ' ' | awk '{print $(NF)}'`
  DENSITY=`pwd | tr '/-' ' ' | awk '{print $(NF-1)}'`
  ./argon -n 343 -d $DENSITY -t $TEMPER
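
The same parsing can be done with bash parameter expansion alone, which avoids the external ``tr`` and ``awk`` calls. This is only a sketch of an alternative: the function name ``parse_folder`` and the example path are hypothetical, and it assumes that the last path component looks like ``0.5-90``, with the density before the dash.

.. code-block:: bash

  # Hypothetical helper: split the last path component, e.g. "0.5-90",
  # into DENSITY (before the dash) and TEMPER (after the dash).
  parse_folder() {
    local dir="${1:-$PWD}"
    local name="${dir##*/}"      # strip everything up to the last "/"
    DENSITY="${name%%-*}"        # text before the first "-"
    TEMPER="${name##*-}"         # text after the last "-"
  }

  parse_folder "/home/jdoe/argon/0.5-90"
  echo "density=$DENSITY temperature=$TEMPER"

Called without an argument inside a submission script, ``parse_folder`` uses the current folder, after which ``./argon -n 343 -d "$DENSITY" -t "$TEMPER"`` can be run as before.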

You can easily create the required folders with a simple ``bash`` command:

.. code-block:: bash
  
  for D in 0.5 0.6 0.4; do
    for T in 84 86 88 90 92 94; do
      mkdir "$D-$T"
    done
  done

**Note** that the ``qsub`` command has to be executed from inside the respective folder, e.g. ``0.5-90``. You can keep the submission script in the main folder (i.e. on the same level as the ``0.5-90`` folder), and submit all the jobs with a single command:


.. code-block:: bash
  
  for i in 0.?-??; do cd $i; qsub ../my_script.pbs; cd ../; done
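
A slightly more defensive variant of this loop runs each submission in a subshell, so that a failed ``cd`` can never leave the shell in the wrong folder. The sketch below is one possible way to write it; the function name ``submit_all`` is hypothetical.

.. code-block:: bash

  # Hypothetical helper: submit the same script from every folder that
  # matches the naming pattern used above, e.g. 0.5-90.
  submit_all() {
    local script="$1"
    local d
    for d in 0.?-??/; do
      [ -d "$d" ] || continue                 # pattern matched nothing
      # The subshell keeps the cd local; qsub runs only if cd succeeded.
      ( cd "$d" && qsub "../$script" ) || echo "submission failed in $d" >&2
    done
  }

Run from the main folder, ``submit_all my_script.pbs`` is equivalent to the one-liner above but reports the folders in which the submission failed.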


.. rubric::
  Solution 2: generated input files

The solution above is quite simple; its main advantage is that it is obvious which parameter values were used to generate a given output file. This approach, however, may not work in more complex situations. In a more general approach one creates input files, one per folder.
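
As an illustration of this idea, the sketch below writes one input file into each folder. The function name ``make_inputs``, the file name ``params.in`` and its ``key = value`` layout are purely hypothetical; the actual format depends on the program being run.

.. code-block:: bash

  # Hypothetical sketch: one folder and one generated input file per
  # parameter combination. "params.in" and its layout are made up here.
  make_inputs() {
    local d t dir
    for d in 0.4 0.5 0.6; do
      for t in 84 86 88 90 92 94; do
        dir="$d-$t"
        mkdir -p "$dir"
        {
          echo "n_atoms = 343"
          echo "density = $d"
          echo "temperature = $t"
        } > "$dir/params.in"
      done
    done
  }

The submission script then only needs to point the program at the input file found in its working folder, and the file itself documents the parameters that were used.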


Case study B: lots of short jobs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In another scenario a given program runs quite fast but there are *thousands* of jobs to compute. Sending them all to the queue would be inefficient and could even overload the queuing system. In such a case one can produce a file containing the respective commands, one per line, and then split the file. Let's say there is a ``process_file`` executable and lots of ``*.fasta`` files that have to be processed one by one. Then the following command:

.. code-block:: bash
  
  for i in *.fasta; do echo "process_file $i"; done > command_to_run.sh

will produce a kind of *bash script* with all the commands, which could be executed as a single job. Assuming there were 1000 ``*.fasta`` files, you end up with a file of 1000 lines. Split that file: e.g. ``split -l 100 command_to_run.sh`` will produce 10 files named ``xa?``, 100 lines each. Then put the submission script header at the top of each of them and submit them as separate jobs.

.. code-block:: bash
  
  for i in x??; do 
      cat script_header.sh > $i.pbs
      cat $i >> $i.pbs
  done

  # --- double check the scripts are OK and then:
  for i in *.pbs; do qsub $i; done
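
Before submitting, it is cheap to verify that no command was lost while splitting. The sketch below assumes the file names used above (``command_to_run.sh`` and the ``x??`` chunks produced by ``split``) and that no other three-letter files starting with ``x`` are present; the function name ``check_chunks`` is hypothetical.

.. code-block:: bash

  # Hypothetical check: the chunks together must hold exactly as many
  # lines as the original command list.
  check_chunks() {
    local total in_chunks
    total=$(( $(wc -l < command_to_run.sh) ))
    in_chunks=$(( $(cat x?? | wc -l) ))
    if [ "$total" -eq "$in_chunks" ]; then
      echo "OK: $total commands"
    else
      echo "MISMATCH: $total commands, $in_chunks lines in chunks" >&2
      return 1
    fi
  }

The function returns a non-zero status on a mismatch, so it can guard the submission loop, e.g. ``check_chunks && for i in *.pbs; do qsub $i; done``.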


.. rubric::
  References

.. target-notes::

.. _`Portable Batch System` : https://en.wikipedia.org/wiki/Portable_Batch_System