.. _doc_fragment_picking:

How to pick fragments for modeling with Rosetta 
===============================================

The fragment picking process can be divided into three stages:

1. running **psiblast**

2. running secondary structure predictors

3. actual fragment picking

1. Running **psiblast**
~~~~~~~~~~~~~~~~~~~~~~~

In this step **psiblast** is used to find sequences homologous to the query and to build a sequence profile.
The results from **psiblast** are later used by the secondary structure predictors and by the fragment picker itself.
We run the **psiblast** program from the new **blast+** package (typically located in the ``blast+/bin/`` directory) with the following parameters:

    .. code-block:: bash

      psiblast -num_iterations 5 \
      -num_alignments 100000 \
      -num_descriptions 100000 \
      -max_hsps 100000 \
      -inclusion_ethresh 0.000001 \
      -evalue 0.000001 \
      -db $DATABASE \
      -query $JOB_ID.fasta \
      -show_gis  -outfmt 6 \
      -num_threads 1 \
      -out $JOB_ID.psi \
      -out_pssm $JOB_ID.asn1 \
      -out_ascii_pssm $JOB_ID.mat

where the ``$DATABASE`` shell variable holds the path to the blast database. When ``$JOB_ID`` equals e.g. ``2gb1``,
psiblast reads the ``2gb1.fasta`` input file and produces the following output files:

  - sequence profile ``2gb1.asn1``
  - list of hits ``2gb1.psi``
  - position-specific scoring matrix (PSSM) ``2gb1.mat``
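
Before moving on, it can help to confirm that all three output files were actually written.
The function below is only a minimal sketch: ``check_psiblast_outputs`` is a hypothetical helper, not part of
**blast+** or Rosetta, and it assumes that an empty output file should be treated as a failed run.

    .. code-block:: bash

        # Hypothetical helper: succeed only if JOB_ID.asn1, JOB_ID.psi and
        # JOB_ID.mat all exist in the current directory and are non-empty.
        check_psiblast_outputs() {
            local job_id
            local ext
            local missing
            job_id="$1"
            missing=0
            for ext in asn1 psi mat; do
                if [ ! -s "${job_id}.${ext}" ]; then
                    echo "missing or empty: ${job_id}.${ext}" >&2
                    missing=1
                fi
            done
            return "$missing"
        }

For example, ``check_psiblast_outputs 2gb1 || exit 1`` stops a driver script when any of the files is absent.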

At this step we also convert the ``.asn1`` file to Rosetta's checkpoint file using the ``seqc`` program from the **BioShell** package:

    .. code-block:: bash
    
        seqc -in:profile:asn1=2gb1.asn1 -out:profile:txt=2gb1.prof

**Note:** The procedure for the old (legacy) version of **psiblast** is very similar; that version, however, produces a binary ``.chk`` file instead of
``.asn1``. The ``.chk`` file can be used directly by the picker.

  
2. Running secondary structure predictors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This protocol uses two predictors: **SpineX** and **PsiPred**. Both programs come with scripts that
automatically run **psiblast**. Here we avoid redundant **psiblast** runs by feeding the results from
the previous step to the predictors. This has been automated by two Python scripts:
``run_psipred.py`` and ``run_spinex.py``. The results of **SpineX** have to be converted to **PsiPred**'s
``.ss2`` format:


    .. code-block:: bash
    
        run_spinex.py 2gb1.mat
        run_psipred.py 2gb1.asn1
        ss_pred_converter.py -x out_ss1
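
A quick sanity check at this point is to confirm that each prediction covers every residue of the query.
The sketch below assumes the usual ``.ss2`` layout (comment and blank header lines, then one row per residue
starting with the residue index) and a single-sequence FASTA file. All three function names are hypothetical helpers.

    .. code-block:: bash

        # Count residue characters on all non-header lines of a FASTA file.
        fasta_length() {
            grep -v '^>' "$1" | tr -d ' \t\r\n' | wc -c | tr -d ' '
        }

        # Count prediction rows: lines whose first field is an integer index.
        ss2_length() {
            awk '$1 ~ /^[0-9]+$/ { n++ } END { print n + 0 }' "$1"
        }

        # usage: check_ss2_length QUERY.fasta PREDICTION.ss2
        # Succeeds only when both files describe the same number of residues.
        check_ss2_length() {
            [ "$(fasta_length "$1")" -eq "$(ss2_length "$2")" ]
        }

For example, ``check_ss2_length 2gb1.fasta 2gb1.psipred.ss2`` returns a non-zero status if the prediction is truncated.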
    

3. Running fragment picker
~~~~~~~~~~~~~~~~~~~~~~~~~~
Prepare a flagfile that provides the command-line options for the ``fragment_picker`` application:

    .. code-block:: bash
    
      -in::file::vall                     filtered.vall.dat.2006-05-05
      -frags::frag_sizes                  3 9
      -frags::describe_fragments          $JOB_ID.fsc
      -out::file::frag_prefix             $JOB_ID-multiR.3w
      -frags::scoring::config             ../scoring-multirama.wghts
      -mute                               core.fragment.picking.VallProvider
      -frags::ss_pred                     $JOB_ID.psipred.ss2 psipred $JOB_ID.spinex.psipred_ss2 spinex
      -in::file::native                   $JOB_ID.pdb
      -in::file::checkpoint               $JOB_ID.chk
      #-in::file::s                       $JOB_ID.pdb # provide the reference protein structure, if you have one
      #-frags::denied_pdb                 $JOB_ID.homologs # remove the given PDB IDs from hits (for testing purposes only)
      -frags::n_candidates                1000
      -frags::n_frags                     200
      -frags::picking::quota_config_file  ../quota.def
      -mute                               core.conformation
      -mute                               core.chemical
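
The listing above uses ``$JOB_ID`` as a placeholder. If the placeholder is not expanded for you, one way to fill it in
is to keep the listing as a template and substitute the job identifier with ``sed``. This is only a sketch:
``render_flagfile`` and the template file name are hypothetical, and the job identifier is assumed to contain
only letters, digits, ``_`` and ``-``.

    .. code-block:: bash

        # Hypothetical helper: print TEMPLATE with every literal $JOB_ID
        # replaced by the given job identifier.
        # usage: render_flagfile JOB_ID TEMPLATE > picker.flagfile
        render_flagfile() {
            sed 's/\$JOB_ID/'"$1"'/g' "$2"
        }

For example, ``render_flagfile 2gb1 picker.flagfile.template > picker.flagfile`` writes a flagfile that refers to ``2gb1.chk``, ``2gb1.pdb`` and so on.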

The ``quota.def`` file is given below:

    .. code-block:: bash
   
        #pool_id	 pool_name	 fraction
        1 		 psipred	 0.5
        2		 spinex 	 0.5 
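
When more pools are added, it is easy to mistype a fraction. The sketch below sums the third column of a quota file;
it assumes that the fractions are meant to add up to 1, as they do in the example above. ``quota_sum`` is a hypothetical helper.

    .. code-block:: bash

        # Hypothetical helper: sum the "fraction" column over all lines that
        # are not comments and have at least three fields.
        quota_sum() {
            awk '!/^[ \t]*#/ && NF >= 3 { s += $3 } END { printf "%.3f\n", s }' "$1"
        }

Running ``quota_sum ../quota.def`` prints the total, which can then be compared against the expected value.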

Next, run the fragment picker, e.g.:

    .. code-block:: bash
      
       fragment_picker.default.linuxgccrelease @picker.flagfile