.. _bioinfo_tools: Creating a blast database and running ``blast`` ================================================ **Running** ``formatdb`` **to create blast database from a FASTA file** ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The command below transforms a set of sequences in ``.fasta`` format into a **blast** database. The output file names will start with *uniref90* ``makeblastdb -in uniref90.fasta -dbtype prot -title uniref90 -parse_seqids -out uniref90`` **Running** ``psiblast`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Example script that runs the new ``psiblast`` command is given below: .. code-block:: bash TARGET='2azaA' DIR=$TARGET psiblast -num_iterations 5 \ -num_alignments 100000 \ -num_descriptions 100000 \ -max_hsps 0 \ -inclusion_ethresh 0.000001 \ -evalue 0.000001 \ -db /home/bioshell_server/TOOLS/blast+/data/nr \ -query ./$DIR/$TARGET.fasta \ -show_gis -outfmt 0 \ -num_threads 12 \ -out ./$DIR/$TARGET.psi \ -out_pssm ./$DIR/$TARGET.asn1 \ -out_ascii_pssm ./$DIR/$TARGET.fasta.mat On the contrary to the original ``blastpgp`` command, now the names of command line parameters are quite self-explanatory. One of the advantages of the new version of blast is its parallel implementation allowing run a single job below a minute. Resulting sequence profiles are now stored in ASN.1 format, which is supported by BioShell. Remember to change ``-outfmt`` parameter value according to the output you need. The most common choices in our laboratoy are: - ``-outfmt 0`` prints pairwise local alignments - ``-outfmt 5`` produces output in the XML format; such a file can be easily processed with BioShell (``ap_blast_nonredundant``, ``ap_blastxml_to_fasta`` and ``ap_blastxml_to_hsp`` programs) - ``-outfmt 6`` produces tabular output. Sequence IDs may be easily extracted with shell commands (such as ``grep`` and ``awk``). One can therefore easily extract full sequences found by *psiblast* using the ``blastdbcmd`` command as described below. **Extracting sequences from the database** ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``blastdbcmd -db uniref50 -dbtype prot -entry_batch hits.txt -outfmt %f -out hits.fasta``