.. _doc_fragment_picking:

How to pick fragments for modeling with Rosetta
===============================================

The fragment picking process can be divided into three stages:

1. running **psiblast**
2. running secondary structure predictors
3. actual fragment picking

1. Running **psiblast**
~~~~~~~~~~~~~~~~~~~~~~~

In this step **psiblast** is used to find sequences homologous to the query and to build a sequence profile. The results from **psiblast** are later used by the secondary structure predictors and by the fragment picker itself. We use the following parameters to run the **psiblast** program of the new **blast+** package (typically located in the ``blast+/bin/`` directory):

.. code-block:: bash

    psiblast -num_iterations 5 \
        -num_alignments 100000 \
        -num_descriptions 100000 \
        -max_hsps 100000 \
        -inclusion_ethresh 0.000001 \
        -evalue 0.000001 \
        -db $DATABASE \
        -query $JOB_ID.fasta \
        -show_gis -outfmt 6 \
        -num_threads 1 \
        -out $JOB_ID.psi \
        -out_pssm $JOB_ID.asn1 \
        -out_ascii_pssm $JOB_ID.mat

where the ``$DATABASE`` shell variable holds the path to the blast database. For example, when ``$JOB_ID`` is ``2gb1``, psiblast reads the ``2gb1.fasta`` input file and produces the following output files:

- sequence profile ``2gb1.asn1``
- list of hits ``2gb1.psi``
- PSSM matrix ``2gb1.mat``

At this step we also convert the ``.asn1`` file to Rosetta's checkpoint file using the ``seqc`` program from the BioShell package:

.. code-block:: bash

    seqc -in:profile:asn1=2gb1.asn1 -out:profile:txt=2gb1.prof

**Note:** The procedure for the old (legacy) version of **psiblast** is very similar; that program, however, produces a binary ``.chk`` file instead of ``.asn1``. The ``.chk`` file may be used directly by the picker.

2. Running secondary structure predictors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This protocol uses two predictors: SpineX and PsiPred. Both programs come with their own scripts that automatically run **psiblast**.
Here we avoid redundant **psiblast** runs by feeding the results from the previous step to the predictors. This has been automated by two Python scripts: ``run_psipred.py`` and ``run_spinex.py``. The results of **SpineX** have to be converted to **PsiPred**'s ``.ss2`` format:

.. code-block:: bash

    run_spinex.py 2gb1.mat
    run_psipred.py 2gb1.asn1
    ss_pred_converter.py -x out_ss1

3. Running fragment picker
~~~~~~~~~~~~~~~~~~~~~~~~~~

Prepare a flag file that provides command-line options to the ``fragment_picker`` application:

.. code-block:: bash

    -in::file::vall filtered.vall.dat.2006-05-05
    -frags::frag_sizes 3 9
    -frags::describe_fragments $JOB_ID.fsc
    -out::file::frag_prefix $JOB_ID-multiR.3w
    -frags::scoring::config ../scoring-multirama.wghts
    -mute core.fragment.picking.VallProvider
    -frags::ss_pred $JOB_ID.psipred.ss2 psipred $JOB_ID.spinex.psipred_ss2 spinex
    -in:file::native $JOB_ID.pdb
    -in::file::checkpoint $JOB_ID.chk
    #-in::file::s $JOB_ID.pdb  # provide the reference protein structure, if you have one
    #-frags::denied_pdb $JOB_ID.homologs  # remove the given PDB IDs from hits (for testing purposes only)
    -frags::n_candidates 1000
    -frags::n_frags 200
    -frags::picking::quota_config_file ../quota.def
    -mute core.conformation
    -mute core.chemical

The ``quota.def`` file is given below:

.. code-block:: bash

    #pool_id        pool_name       fraction
    1               psipred         0.5
    2               spinex          0.5

Next, run the fragment picker, e.g.:

.. code-block:: bash

    fragment_picker.default.linuxgccrelease @picker.flagfile
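
The flag file above is written with a ``$JOB_ID`` placeholder, so a concrete file is needed for each target. The following is a minimal Python sketch of one way to produce it, assuming the placeholder is meant to be substituted before the picker is run. The helper names ``render_flagfile`` and ``write_flagfile`` are hypothetical and not part of Rosetta; the flags themselves are copied unchanged from the example above.

.. code-block:: python

    from pathlib import Path
    from string import Template

    # Flag-file text copied from the example above; $JOB_ID is the only placeholder.
    FLAGFILE_TEMPLATE = Template("""\
    -in::file::vall filtered.vall.dat.2006-05-05
    -frags::frag_sizes 3 9
    -frags::describe_fragments $JOB_ID.fsc
    -out::file::frag_prefix $JOB_ID-multiR.3w
    -frags::scoring::config ../scoring-multirama.wghts
    -mute core.fragment.picking.VallProvider
    -frags::ss_pred $JOB_ID.psipred.ss2 psipred $JOB_ID.spinex.psipred_ss2 spinex
    -in:file::native $JOB_ID.pdb
    -in::file::checkpoint $JOB_ID.chk
    #-in::file::s $JOB_ID.pdb  # provide the reference protein structure, if you have one
    #-frags::denied_pdb $JOB_ID.homologs  # remove the given PDB IDs from hits (for testing purposes only)
    -frags::n_candidates 1000
    -frags::n_frags 200
    -frags::picking::quota_config_file ../quota.def
    -mute core.conformation
    -mute core.chemical
    """)


    def render_flagfile(job_id):
        """Return the flag-file text with $JOB_ID replaced by job_id (hypothetical helper)."""
        return FLAGFILE_TEMPLATE.substitute(JOB_ID=job_id)


    def write_flagfile(job_id, path="picker.flagfile"):
        """Write the rendered flag file to path and return the path (hypothetical helper)."""
        target = Path(path)
        target.write_text(render_flagfile(job_id))
        return target


    if __name__ == "__main__":
        # Example: create picker.flagfile for the 2gb1 target used throughout this page.
        print(write_flagfile("2gb1"))

With this sketch, calling ``write_flagfile("2gb1")`` writes a ``picker.flagfile`` that refers to ``2gb1.fsc``, ``2gb1.psipred.ss2`` and so on, and the file can then be passed to the picker with ``@picker.flagfile`` as shown above.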