How to pick fragments for modeling with Rosetta

The fragment picking process can be divided into three stages:

  1. running psiblast

  2. running secondary structure predictors

  3. actual fragment picking

1. Running psiblast

In this step psiblast is used to find sequences homologous to the query and to build a sequence profile. The psiblast results are later used by the secondary structure predictors and by the fragment picker itself. We run the psiblast program from the new blast+ package (typically located in the blast+/bin/ directory) with the following parameters:

psiblast -num_iterations 5 \
-num_alignments 100000 \
-num_descriptions 100000 \
-max_hsps 100000 \
-inclusion_ethresh 0.000001 \
-evalue 0.000001 \
-db $DATABASE \
-query $JOB_ID.fasta \
-show_gis  -outfmt 6 \
-num_threads 1 \
-out $JOB_ID.psi \
-out_pssm $JOB_ID.asn1 \
-out_ascii_pssm $JOB_ID.mat

where the $DATABASE shell variable holds the path to the blast database. When $JOB_ID equals, e.g., 2gb1, psiblast reads the 2gb1.fasta input file and produces the following output files:

  • sequence profile 2gb1.asn1

  • list of hits 2gb1.psi

  • pssm matrix 2gb1.mat
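
When many targets are processed, it may be convenient to assemble the same command line programmatically. The following is a minimal Python sketch, not part of blast+ or Rosetta: the helper name build_psiblast_cmd and the database path in the usage example are hypothetical, while the flags and file names are copied from the command above.

```python
import shlex


def build_psiblast_cmd(job_id, database, threads=1):
    """Return the psiblast argument list used in this protocol (hypothetical helper)."""
    ethresh = "0.000001"  # used for both -inclusion_ethresh and -evalue
    limit = "100000"      # used for alignments, descriptions and HSPs
    return [
        "psiblast",
        "-num_iterations", "5",
        "-num_alignments", limit,
        "-num_descriptions", limit,
        "-max_hsps", limit,
        "-inclusion_ethresh", ethresh,
        "-evalue", ethresh,
        "-db", database,
        "-query", f"{job_id}.fasta",
        "-show_gis",
        "-outfmt", "6",
        "-num_threads", str(threads),
        "-out", f"{job_id}.psi",
        "-out_pssm", f"{job_id}.asn1",
        "-out_ascii_pssm", f"{job_id}.mat",
    ]


if __name__ == "__main__":
    # Print the command for inspection; "/path/to/blast/db" is a placeholder.
    print(shlex.join(build_psiblast_cmd("2gb1", "/path/to/blast/db")))
```

The returned list can be passed unchanged to subprocess.run() once the database path points to a real blast database.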

At this step we also convert the .asn1 file to Rosetta’s checkpoint file using the seqc program from the BioShell package:

seqc -in:profile:asn1=2gb1.asn1 -out:profile:txt=2gb1.prof

Note: The procedure for the old (legacy) version of psiblast is very similar; however, the program produces a binary .chk file instead of .asn1. The .chk file may be used directly by the picker.

2. Running secondary structure predictors

This protocol uses two predictors: SpineX and PsiPred. Both programs come with scripts that run psiblast automatically. Here we avoid redundant psiblast runs by feeding the results from the previous step to the predictors. This has been automated by two Python scripts: run_psipred.py and run_spinex.py. The results of SpineX have to be converted to PsiPred’s .ss2 format:

run_spinex.py 2gb1.mat
run_psipred.py 2gb1.asn1
ss_pred_converter.py -x out_ss1
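
Before picking, it can be useful to check that both prediction files are readable and cover the same residues. The sketch below parses .ss2 text under the assumption that each data line holds six whitespace-separated columns (residue index, amino acid, predicted state, and the coil, helix and strand probabilities) and that header lines start with "#". The function name parse_ss2 is hypothetical, and the values in the usage example are invented for illustration only.

```python
def parse_ss2(text):
    """Parse PsiPred-style .ss2 text into (index, residue, ss, (pC, pH, pE)) tuples."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header comment, blank lines and anything without six columns.
        if len(fields) != 6 or line.lstrip().startswith("#"):
            continue
        index, residue, state = int(fields[0]), fields[1], fields[2]
        probabilities = tuple(float(x) for x in fields[3:6])
        records.append((index, residue, state, probabilities))
    return records


if __name__ == "__main__":
    # Invented example values, assuming the column layout described above.
    example = (
        "# PSIPRED VFORMAT (PSIPRED V3.2)\n"
        "\n"
        "   1 M C   0.998  0.001  0.001\n"
        "   2 T E   0.210  0.015  0.775\n"
    )
    for record in parse_ss2(example):
        print(record)
```

Comparing len(parse_ss2(...)) for the PsiPred and SpineX files against the query length is a cheap way to catch a truncated or misconverted prediction.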

3. Running fragment picker

Prepare a flagfile that provides command line options to the fragment_picker application:

-in::file::vall   filtered.vall.dat.2006-05-05
-frags::frag_sizes         3 9
-frags::describe_fragments         $JOB_ID.fsc
-out::file::frag_prefix    $JOB_ID-multiR.3w
-frags::scoring::config    ../scoring-multirama.wghts
-mute                              core.fragment.picking.VallProvider
-frags::ss_pred            $JOB_ID.psipred.ss2 psipred $JOB_ID.spinex.psipred_ss2  spinex
-in::file::native          $JOB_ID.pdb
-in::file::checkpoint              $JOB_ID.chk
#-in::file::s                      $JOB_ID.pdb # provide the reference protein structure, if you have one
#-frags::denied_pdb                $JOB_ID.homologs # remove the given PDB IDs from hits (for testing purposes only)
-frags::n_candidates               1000
-frags::n_frags            200
-frags::picking::quota_config_file         ../quota.def
-mute                              core.conformation
-mute                              core.chemical
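
The flagfile above contains $JOB_ID placeholders. Assuming they have to be replaced with the actual target name before the file is handed to the picker (this text does not say whether the picker expands them itself), the substitution could be sketched in Python as follows. The helper name fill_flagfile is hypothetical, and the template in the usage example is a three-line excerpt of the flagfile.

```python
from string import Template


def fill_flagfile(template_text, job_id):
    """Replace every $JOB_ID placeholder in a flagfile template (hypothetical helper)."""
    # safe_substitute leaves any other $NAME untouched instead of raising KeyError.
    return Template(template_text).safe_substitute(JOB_ID=job_id)


if __name__ == "__main__":
    template = (
        "-frags::describe_fragments $JOB_ID.fsc\n"
        "-out::file::frag_prefix    $JOB_ID-multiR.3w\n"
        "-in::file::checkpoint      $JOB_ID.chk\n"
    )
    print(fill_flagfile(template, "2gb1"), end="")
```

Writing the returned text to picker.flagfile gives one concrete flagfile per target while the template stays unchanged.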

The quota.def file is given below:

#pool_id         pool_name       fraction
1                psipred         0.5
2                spinex          0.5
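
A small consistency check on quota.def can catch typos before a long picker run. The Python sketch below rests on two assumptions inferred from the example rather than stated in this text: that the fractions are meant to sum to 1, and that each pool_name should match one of the names given after the files in -frags::ss_pred (here psipred and spinex). The function names parse_quota and check_quota are hypothetical.

```python
def parse_quota(text):
    """Parse quota.def text into a list of (pool_id, pool_name, fraction) tuples."""
    pools = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        pool_id, name, fraction = line.split()
        pools.append((int(pool_id), name, float(fraction)))
    return pools


def check_quota(pools, ss_names, tol=1e-6):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    total = sum(fraction for _, _, fraction in pools)
    if abs(total - 1.0) > tol:  # assumption: fractions should sum to 1
        problems.append(f"fractions sum to {total}, expected 1.0")
    for _, name, _ in pools:
        if name not in ss_names:  # assumption: pool names mirror -frags::ss_pred names
            problems.append(f"pool '{name}' has no matching -frags::ss_pred name")
    return problems


if __name__ == "__main__":
    quota_text = (
        "#pool_id         pool_name       fraction\n"
        "1                psipred         0.5\n"
        "2                spinex          0.5\n"
    )
    print(check_quota(parse_quota(quota_text), {"psipred", "spinex"}))
```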

Next, run the fragment picker, e.g.:

fragment_picker.default.linuxgccrelease @picker.flagfile