Creating a blast database and running blast
¶
Running formatdb
to create blast database from a FASTA file¶
The command below transforms a set of sequences in .fasta
format into a blast database. The output file names will start with uniref90
makeblastdb -in uniref90.fasta -dbtype prot -title uniref90 -parse_seqids -out uniref90
Running psiblast
¶
Example script that runs the new psiblast
command is given below:
TARGET='2azaA' DIR=$TARGET psiblast -num_iterations 5 \ -num_alignments 100000 \ -num_descriptions 100000 \ -max_hsps 0 \ -inclusion_ethresh 0.000001 \ -evalue 0.000001 \ -db /home/bioshell_server/TOOLS/blast+/data/nr \ -query ./$DIR/$TARGET.fasta \ -show_gis -outfmt 0 \ -num_threads 12 \ -out ./$DIR/$TARGET.psi \ -out_pssm ./$DIR/$TARGET.asn1 \ -out_ascii_pssm ./$DIR/$TARGET.fasta.mat
On the contrary to the original blastpgp
command, now the names of command line parameters are quite self-explanatory. One of the advantages of the new version of blast is its parallel implementation allowing run a single job below a minute. Resulting sequence profiles are now stored in ASN.1 format, which is supported by BioShell.
Remember to change -outfmt
parameter value according to the output you need. The most common choices in our laboratoy are:
-outfmt 0
prints pairwise local alignments
-outfmt 5
produces output in the XML format; such a file can be easily processed with BioShell (ap_blast_nonredundant
,ap_blastxml_to_fasta
andap_blastxml_to_hsp
programs)
-outfmt 6
produces tabular output. Sequence IDs may be easily extracted with shell commands (such asgrep
andawk
). One can therefore easily extract full sequences found by psiblast using theblastdbcmd
command as described below.
Extracting sequences from the database¶
blastdbcmd -db uniref50 -dbtype prot -entry_batch hits.txt -outfmt %f -out hits.fasta