Local Blast

Installing and testing local blast tools, databases and environment.
  1. Get the NCBI toolkit

    Download the toolkit via anonymous FTP from ftp.ncbi.nlm.nih.gov (ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/CURRENT/).

    Install it, e.g. from source into /usr/local

    Note: The openmotif (or Motif) libraries are required for compilation. If they are not already installed, they should be readily available for your platform. For example, in (open)Suse the packages are called openmotif and openmotif-lib, and Fink for Mac OSX provides them under the name openmotif3. Ubuntu users need to install a whole set of packages: build-essential, libmotif-dev, libx11-dev, x11proto-print-dev, xorg-dev and libxp-dev, as well as the very basic csh.

    If compilation still fails with "No rule to make target ni_debug.o", simply run ncbi/make/makedis.csh once more. At least that worked for me...

    cd /usr/local/src
    sudo wget ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/CURRENT/ncbi.tar.gz
    sudo tar xvfz ncbi.tar.gz
    sudo ncbi/make/makedis.csh  ## needs to be run from parent folder of ncbi/
    
    cd /usr/local/bin
    ln -s ../src/ncbi/bin/blastall   .
    ln -s ../src/ncbi/bin/blastclust .
    ln -s ../src/ncbi/bin/fastacmd   .
    ln -s ../src/ncbi/bin/blastpgp   .
    ln -s ../src/ncbi/bin/formatdb   .
    ln -s ../src/ncbi/bin/bl2seq     .
    

    Alternatively, you can of course simply put ncbi/bin into your $PATH. The 6 programs above are the ones Biskit/Mod needs.
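
Whichever way the programs were made available, it can help to confirm that all six are visible on $PATH. The following is only a minimal sketch; the helper name missing_tools is made up for this example and is not part of Biskit or the NCBI toolkit.

```shell
#!/bin/sh
# Print the name of every given program that cannot be found on $PATH.
# (missing_tools is a hypothetical helper for this sketch.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The six legacy BLAST programs listed above.
missing=$(missing_tools blastall blastclust fastacmd blastpgp formatdb bl2seq)

if [ -z "$missing" ]; then
    echo "all six programs found"
else
    # left unquoted on purpose so the names are printed on one line
    echo "not on PATH:" $missing
fi
```

If any name is reported, re-check the symbolic links or the $PATH entry before continuing.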

    Note:

    The ncbi tools are available as Ubuntu/Debian package 'blast2' (package ncbi-tools-bin does not contain the blast programs). But the packaged version may be outdated and may not be compatible with the most recent Biopython version.

  2. Install databases

    2.1 option 1 (recommended) - from FASTA formatted databases:

    • Download and uncompress the databases via anonymous ftp (in binary mode) from ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA. The minimum requirement is swissprot, pdbaa, and nr:

      wget -c ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/pdbaa.gz
      wget -c ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/swissprot.gz
      wget -c ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
      
      gzip -d pdbaa.gz
      gzip -d swissprot.gz
      gzip -d nr.gz
      
    • Format the databases with the -o T option so that they can be used with fastacmd. Example:

      formatdb -i pdbaa -p T -o T
      formatdb -i swissprot -p T -o T
      formatdb -i nr -p T -o T
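
The download, uncompress and format commands of option 1 can also be run per database from one small script. This is a sketch rather than a tested installer: the helper names run and prepare_db and the DRY_RUN switch are assumptions introduced here. By default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Base URL as given in the text above.
BASE_URL="ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA"

# DRY_RUN is a hypothetical switch for this sketch: unless it is set to 0,
# commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Fetch, uncompress and index one protein database; stop at the first failure.
prepare_db() {
    run wget -c "$BASE_URL/$1.gz" &&
    run gzip -d "$1.gz" &&
    run formatdb -i "$1" -p T -o T
}

for db in pdbaa swissprot nr; do
    prepare_db "$db" || break
done
```

Running the script with DRY_RUN=0 in the target directory executes the commands for real. Keep in mind that nr is a large download.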
      

    2.2 option 2 - from preformatted databases (needs less space):

    • Download the databases via anonymous ftp (in binary mode) from ftp://ftp.ncbi.nlm.nih.gov/blast/db, or download the script update_blastdb.pl (http://www.ncbi.nlm.nih.gov/blast/docs/update_blastdb.pl) and use it to download/update your databases. I recommend using the wget utility because it can resume interrupted downloads and gives you some progress indication:

      cd /opt
      mkdir db; mkdir db/blast
      cd /opt/db/blast
      
      wget -c ftp://ftp.ncbi.nih.gov/blast/db/nr.00.tar.gz
      wget -c ftp://ftp.ncbi.nih.gov/blast/db/nr.01.tar.gz
      wget -c ftp://ftp.ncbi.nih.gov/blast/db/pdbaa.tar.gz
      wget -c ftp://ftp.ncbi.nih.gov/blast/db/swissprot.tar.gz
      
      # tar reads only one archive per call, so unpack the files one by one
      ls *.tar.gz | xargs -n 1 tar zxvf
      
    • The minimum requirement is nr (parent of the other databases), swissprot and pdbaa. This will make a total of about 1.2 GB (as of 01/2007).

      Note added 2012: I leave this statement here for its historic value ;) /Raik

    • Note:

      Last time I tried, the databases were not preformatted with -o for fastacmd. If the test below doesn't work, you can create fasta versions of the preformatted files (rather than downloading them again):

      fastacmd -d pdbaa -D 1 > ../blast_fasta/pdbaa
      fastacmd -d swissprot -D 1 > ../blast_fasta/swissprot
      fastacmd -d nr -D 1 > ../blast_fasta/nr
      

      and then continue with the formatdb step in 2.1 to rebuild the database.

  3. Set up NCBI environment

    3.1 option 1 (recommended) - create ~/.ncbirc

    • Create a file named .ncbirc in your home directory, set the path to the ncbi data directory, and add proxy settings if you need them. Example:

      [NCBI]
      DATA=/usr/local/src/ncbi/data
      
      [NET_SERV]
      SRV_CONN_MODE=FIREWALL
      SRV_HTTP_PROXY_HOST=cache.pasteur.fr
      SRV_HTTP_PROXY_PORT=8080
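
The file can also be created from the shell. The sketch below writes only the [NCBI] section and refuses to overwrite an existing file. The function name write_ncbirc and the NCBIRC_FILE variable are assumptions made for this example. A proxy section, if needed, still has to be added by hand.

```shell
#!/bin/sh
# Write a minimal .ncbirc unless the target file already exists.
#   $1: target file   $2: NCBI data directory
# (write_ncbirc is a hypothetical helper for this sketch.)
write_ncbirc() {
    if [ -e "$1" ]; then
        echo "$1 already exists, leaving it untouched" >&2
        return 1
    fi
    cat > "$1" <<EOF
[NCBI]
DATA=$2
EOF
}

# NCBIRC_FILE is a hypothetical override used only by this sketch;
# the data path is the one from the example above.
target=${NCBIRC_FILE:-$HOME/.ncbirc}
write_ncbirc "$target" /usr/local/src/ncbi/data || true
```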
      

    3.2 option 2 - without .ncbirc

    • Set the BLASTMAT environment variable instead, for example:

      zsh:  export BLASTMAT=/usr/local/src/ncbi/data
      tcsh: setenv BLASTMAT /usr/local/src/ncbi/data
      
  4. Set up database location

    • Export the database path $BLASTDB, for example:

      zsh:  export BLASTDB=/opt/db/blastdb
      tcsh: setenv BLASTDB /opt/db/blastdb
      

      Replace /opt/db/blastdb by the folder where you installed the database files during step 2. $BLASTDB must contain all database and index files (links to files also work). The following files are needed:

      • nr.*
      • pdbaa.*
      • swissprot.*
    • Configure Biskit for non-standard database names (if necessary)

      The standard database names (nr, pdbaa, swissprot) can be adjusted in ~/.biskit/settings_Mod.cfg. By default, the beginning of the file looks like this:

      [NORMAL]
      db_nr = nr            ##  default name of blast nr sequence database
      db_pdbaa = pdbaa      ##  default name of blast pdb sequence database
      db_swiss = swissprot  ##  default name of blast swissprot sequence database
      
      ... ## other parameters
      

      If your actual database filenames are different, you can either change the db_* entries in settings_Mod.cfg or, alternatively, make symbolic links (e.g. nrdb.data -> nr.data).
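
A quick way to see whether $BLASTDB and the database names agree is to look for at least one file per database in that folder. The sketch below does this for the default names nr, pdbaa and swissprot. The helper name missing_dbs is invented for this example, and the check says nothing about whether the index files are complete.

```shell
#!/bin/sh
# Print each database name for which no file "<name>.*" exists in directory $1.
# (missing_dbs is a hypothetical helper for this sketch.)
missing_dbs() {
    dir=$1
    shift
    for db in "$@"; do
        found=no
        for f in "$dir/$db".*; do
            # an unmatched pattern stays literal, so test that the file exists
            if [ -e "$f" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || echo "$db"
    done
}

# Falls back to the example path from above if BLASTDB is not set.
missing=$(missing_dbs "${BLASTDB:-/opt/db/blastdb}" nr pdbaa swissprot)

if [ -z "$missing" ]; then
    echo "files found for all three databases"
else
    echo "no files found for:" $missing
fi
```

If you renamed the databases in settings_Mod.cfg, pass those names instead of the defaults.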

  5. Test blast environment

    Test your blast setup by following this example:

    cd ~/biskit/Biskit/testdata/Mod
    
    # retrieve sequence (">!" overwrites an existing file in tcsh and zsh; in bash use ">|")
    fastacmd -d pdbaa -s 3TGI >! test.fasta
    
    # search against PDB
    blastall -p blastp -d pdbaa < test.fasta
    
    # search against Swissprot
    blastall -p blastp -d swissprot < test.fasta
    
    # search against all
    blastall -p blastp -d nr < test.fasta > nr.out
    
    # cluster sequences
    blastclust -i test_blastclust.fasta
    -> should yield something roughly similar to:
    [blastclust] WARNING: Could not find taxdb.bti
    Apr 19, 2004  1:30 PM Start clustering of 12 queries
    a b c d e g h i j l
    f
    k
    m
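
To compare a clustering result with the expected one without reading it line by line, the cluster list can be summarised. The sketch below assumes that only the cluster lines (one cluster per line, members separated by spaces) are fed in, without the warning and date lines. The helper names count_clusters and largest_cluster are made up for this example.

```shell
#!/bin/sh
# Count clusters read from stdin: one cluster per non-empty line.
# (count_clusters and largest_cluster are hypothetical helpers.)
count_clusters() {
    grep -c '[^[:space:]]'
}

# Number of members in the largest cluster read from stdin.
largest_cluster() {
    awk 'NF > max { max = NF } END { print max + 0 }'
}

# The four cluster lines from the sample output above.
sample='a b c d e g h i j l
f
k
m'

printf '%s\n' "$sample" | count_clusters    # prints 4
printf '%s\n' "$sample" | largest_cluster   # prints 10
```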
    

Usage:
see Biskit.Mod.SequenceSearcher and Biskit.Mod.TemplateSearcher
Custom configuration:
see biskit/external/defaults/settings_Mod.cfg