Creating a blast database and running blast

Running formatdb to create blast database from a FASTA file

The command below transforms a set of sequences in .fasta format into a blast database. The output file names will start with uniref90

makeblastdb -in uniref90.fasta -dbtype prot -title uniref90 -parse_seqids -out uniref90

Running psiblast

Example script that runs the new psiblast command is given below:

TARGET='2azaA'
DIR=$TARGET
psiblast -num_iterations 5 \
  -num_alignments 100000 \
  -num_descriptions 100000 \
  -max_hsps 0 \
  -inclusion_ethresh 0.000001 \
  -evalue 0.000001 \
  -db /home/bioshell_server/TOOLS/blast+/data/nr \
  -query ./$DIR/$TARGET.fasta \
  -show_gis  -outfmt 0 \
  -num_threads 12 \
  -out ./$DIR/$TARGET.psi \
  -out_pssm ./$DIR/$TARGET.asn1 \
  -out_ascii_pssm ./$DIR/$TARGET.fasta.mat

On the contrary to the original blastpgp command, now the names of command line parameters are quite self-explanatory. One of the advantages of the new version of blast is its parallel implementation allowing run a single job below a minute. Resulting sequence profiles are now stored in ASN.1 format, which is supported by BioShell. Remember to change -outfmt parameter value according to the output you need. The most common choices in our laboratoy are:

  • -outfmt 0 prints pairwise local alignments

  • -outfmt 5 produces output in the XML format; such a file can be easily processed with BioShell (ap_blast_nonredundant, ap_blastxml_to_fasta and ap_blastxml_to_hsp programs)

  • -outfmt 6 produces tabular output. Sequence IDs may be easily extracted with shell commands (such as grep and awk). One can therefore easily extract full sequences found by psiblast using the blastdbcmd command as described below.

Extracting sequences from the database

blastdbcmd -db uniref50 -dbtype prot -entry_batch hits.txt -outfmt %f -out hits.fasta