How to pick fragments for modeling with Rosetta
The fragment picking process can be divided into three stages:
running psiblast
running secondary structure predictors
actual fragment picking
1. Running psiblast
In this step psiblast is used to find sequences homologous to the query and to build a sequence profile. The psiblast results are later used by the secondary structure predictors and by the fragment picker itself. We run the psiblast program from the new blast+ package (typically located in the blast+/bin/ directory) with the following parameters:
psiblast -num_iterations 5 \
    -num_alignments 100000 \
    -num_descriptions 100000 \
    -max_hsps 100000 \
    -inclusion_ethresh 0.000001 \
    -evalue 0.000001 \
    -db $DATABASE \
    -query $JOB_ID.fasta \
    -show_gis -outfmt 6 \
    -num_threads 1 \
    -out $JOB_ID.psi \
    -out_pssm $JOB_ID.asn1 \
    -out_ascii_pssm $JOB_ID.mat
where the $DATABASE shell variable holds the path to the blast database. When $JOB_ID is set to, for example, 2gb1, psiblast reads the 2gb1.fasta input file and produces the following output files:
sequence profile: 2gb1.asn1
list of hits: 2gb1.psi
pssm matrix: 2gb1.mat
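When the same command has to be run for many targets, it can help to assemble the argument list programmatically. The following is a minimal sketch and not part of Rosetta or blast+: the function names psiblast_command and expected_outputs are hypothetical, the database path is a placeholder, and the options simply mirror the psiblast command above.

```python
import shlex
from pathlib import Path


def psiblast_command(job_id, database, threads=1):
    """Return the psiblast argument list used in this protocol (hypothetical helper)."""
    return [
        "psiblast",
        "-num_iterations", "5",
        "-num_alignments", "100000",
        "-num_descriptions", "100000",
        "-max_hsps", "100000",
        "-inclusion_ethresh", "0.000001",
        "-evalue", "0.000001",
        "-db", str(database),
        "-query", f"{job_id}.fasta",
        "-show_gis",
        "-outfmt", "6",
        "-num_threads", str(threads),
        "-out", f"{job_id}.psi",
        "-out_pssm", f"{job_id}.asn1",
        "-out_ascii_pssm", f"{job_id}.mat",
    ]


def expected_outputs(job_id):
    """Map each output described above to the file name psiblast is asked to write."""
    return {
        "sequence profile": Path(f"{job_id}.asn1"),
        "list of hits": Path(f"{job_id}.psi"),
        "pssm matrix": Path(f"{job_id}.mat"),
    }


if __name__ == "__main__":
    # Print the command for inspection only; "/path/to/blast/db" is a placeholder.
    print(shlex.join(psiblast_command("2gb1", "/path/to/blast/db")))
```

The returned list can be handed to subprocess.run to execute the search, and expected_outputs can then be used to confirm that all three files were written before moving on.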
At this step we also convert the .asn1 file to Rosetta’s checkpoint file using the seqc program from the BioShell package:
seqc -in:profile:asn1=2gb1.asn1 -out:profile:txt=2gb1.prof
Note: The procedure for the old (legacy) version of psiblast is very similar; that program, however, produces a binary .chk file instead of .asn1. The .chk file may be used directly by the picker.
2. Running secondary structure predictors
This protocol uses two predictors: SpineX and PsiPred. Both programs come with their own scripts that
automatically run psiblast. Here we avoid redundant psiblast runs by feeding the results from
the previous step to the predictors. This has been automated by two Python scripts:
run_psipred.py and run_spinex.py. The SpineX results have to be converted to PsiPred’s .ss2 format:
run_spinex.py 2gb1.mat
run_psipred.py 2gb1.asn1
ss_pred_converter.py -x out_ss1
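Before passing the predictions to the picker, it can be worth checking that both .ss2 files cover the whole query. The following is a minimal sketch, assuming the usual PsiPred vertical layout (comment lines starting with #, then one line per residue holding the residue index, the amino acid, the predicted state and three probabilities). read_ss2 is a hypothetical helper, not part of PsiPred, SpineX or Rosetta.

```python
def read_ss2(lines):
    """Parse PsiPred-style .ss2 lines into (sequence, secondary_structure) strings.

    Assumes each data line looks like: "   1 M C   0.998  0.001  0.001".
    """
    sequence, states = [], []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the '#' header line.
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) < 3 or not fields[0].isdigit():
            raise ValueError(f"unexpected .ss2 line: {line!r}")
        sequence.append(fields[1])   # one-letter amino acid code
        states.append(fields[2])     # predicted state, e.g. C, H or E
    return "".join(sequence), "".join(states)
```

Any iterable of lines works as input, including an open file handle. The sequences read from the PsiPred and the converted SpineX file should be identical and as long as the query in the .fasta file.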
3. Running fragment picker
Prepare a flag file that provides command-line options to the fragment_picker application:
-in::file::vall filtered.vall.dat.2006-05-05
-frags::frag_sizes 3 9
-frags::describe_fragments $JOB_ID.fsc
-out::file::frag_prefix $JOB_ID-multiR.3w
-frags::scoring::config ../scoring-multirama.wghts
-mute core.fragment.picking.VallProvider
-frags::ss_pred $JOB_ID.psipred.ss2 psipred $JOB_ID.spinex.psipred_ss2 spinex
-in:file::native $JOB_ID.pdb
-in::file::checkpoint $JOB_ID.chk
#-in::file::s $JOB_ID.pdb # provide the reference protein structure, if you have one
#-frags::denied_pdb $JOB_ID.homologs # remove the given PDB IDs from hits (for testing purposes only)
-frags::n_candidates 1000
-frags::n_frags 200
-frags::picking::quota_config_file ../quota.def
-mute core.conformation
-mute core.chemical
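The flag file is read by the application rather than by the shell, so the sketch below assumes that $JOB_ID is a placeholder that has to be filled in before the run; this is an assumption, not something stated by the Rosetta documentation quoted here. It uses Python's string.Template, and both the render_flags name and the shortened template are illustrative only.

```python
from string import Template

# Shortened template for illustration; a real flag file would list every option shown above.
FLAG_TEMPLATE = """\
-in::file::vall filtered.vall.dat.2006-05-05
-frags::frag_sizes 3 9
-frags::describe_fragments $JOB_ID.fsc
-out::file::frag_prefix $JOB_ID-multiR.3w
-frags::ss_pred $JOB_ID.psipred.ss2 psipred $JOB_ID.spinex.psipred_ss2 spinex
-in::file::checkpoint $JOB_ID.chk
"""


def render_flags(job_id, template=FLAG_TEMPLATE):
    """Replace every $JOB_ID placeholder with the given job identifier."""
    # substitute() raises KeyError if the template contains any other $name placeholder.
    return Template(template).substitute(JOB_ID=job_id)
```

Writing the result of render_flags("2gb1") to picker.flagfile gives a file with concrete names such as 2gb1.fsc and 2gb1.psipred.ss2.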
The quota.def file is given below:
#pool_id pool_name fraction
1 psipred 0.5
2 spinex 0.5
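Each line of quota.def assigns a fraction of the fragments to one pool, and the pool names here coincide with the labels that follow each file in -frags::ss_pred. The following is a minimal sketch for reading the file and checking that the fractions add up to one; whether the picker itself requires this is an assumption, and parse_quota is a hypothetical helper.

```python
import math


def parse_quota(text):
    """Parse quota.def text into a {pool_name: fraction} dictionary."""
    pools = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comments such as the column header.
        if not line or line.startswith("#"):
            continue
        _pool_id, pool_name, fraction = line.split()
        pools[pool_name] = float(fraction)
    return pools


QUOTA_DEF = """\
#pool_id pool_name fraction
1 psipred 0.5
2 spinex 0.5
"""

pools = parse_quota(QUOTA_DEF)
# Assumption: the fractions are meant to cover all picked fragments, i.e. sum to 1.
assert math.isclose(sum(pools.values()), 1.0)
```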
Next, run the fragment picker, e.g.:
fragment_picker.default.linuxgccrelease @picker.flagfile