Local Blast
Get the NCBI toolkit
Download the toolkit via anonymous FTP from ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/CURRENT/.
Install it, e.g. by compiling from source under /usr/local:
Note: The openmotif (or Motif) libraries are required for compilation. If they are not already installed, they should be readily available for your platform. For example, in (open)SUSE the packages are called openmotif and openmotif-lib, and Fink for Mac OS X provides them under the name openmotif3. Ubuntu users need to install a whole set of packages: build-essential, libmotif-dev, libx11-dev, x11proto-print-dev, xorg-dev, libxp-dev, and also the very basic csh.
If compilation still fails with "No rule to make target ni_debug.o", simply run ncbi/make/makedis.csh once more; at least that worked for me.
cd /usr/local/src
sudo wget ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/CURRENT/ncbi.tar.gz
sudo tar xvfz ncbi.tar.gz
sudo ncbi/make/makedis.csh   ## needs to be run from parent folder of ncbi/
cd /usr/local/bin
sudo ln -s ../src/ncbi/bin/blastall .
sudo ln -s ../src/ncbi/bin/blastclust .
sudo ln -s ../src/ncbi/bin/fastacmd .
sudo ln -s ../src/ncbi/bin/blastpgp .
sudo ln -s ../src/ncbi/bin/formatdb .
sudo ln -s ../src/ncbi/bin/bl2seq .
Alternatively, you can of course simply put ncbi/bin into your $PATH. The 6 programs above are the ones Biskit/Mod needs.
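To confirm that the six programs are reachable (through the links or through $PATH), a small shell function can check them all at once. This is a minimal sketch: check_blast_tools is a hypothetical helper, not part of Biskit or the NCBI toolkit, and it only tests that each name resolves via command -v.

```shell
# Hypothetical helper: report which of the six NCBI programs used by
# Biskit/Mod cannot be found on $PATH. Returns non-zero if any is missing.
check_blast_tools() {
    local missing=0 prog
    for prog in blastall blastclust fastacmd blastpgp formatdb bl2seq; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "missing: $prog" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Source it and run check_blast_tools; every program it cannot find is printed to stderr.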
Note:
The ncbi tools are available as Ubuntu/Debian package 'blast2' (package ncbi-tools-bin does not contain the blast programs). But the packaged version may be outdated and may not be compatible with the most recent Biopython version.
Install databases
2.1 option 1 (recommended) - from FASTA formatted databases:
Download and uncompress the databases via anonymous ftp (in binary mode) from ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA. The minimum requirement is swissprot, pdbaa, and nr:
wget -c ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/pdbaa.gz
wget -c ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/swissprot.gz
wget -c ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
gzip -d pdbaa.gz
gzip -d swissprot.gz
gzip -d nr.gz
Format the databases with the -o T option so that they can be used with fastacmd (-p T marks them as protein databases). Example:
formatdb -i pdbaa -p T -o T
formatdb -i swissprot -p T -o T
formatdb -i nr -p T -o T
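The download, uncompress and formatdb steps above can be chained in one loop. A sketch, assuming wget and gzip are installed and that it runs in the folder that should hold the databases; fetch_and_format and FASTA_BASE_URL are hypothetical names.

```shell
# Hypothetical helper: chains the wget, gzip and formatdb calls from
# option 1 for each database name given as argument.
FASTA_BASE_URL=ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA

fetch_and_format() {
    local db
    for db in "$@"; do
        # -c resumes a partially downloaded file instead of starting over
        wget -c "$FASTA_BASE_URL/$db.gz" || return 1
        gzip -d "$db.gz" || return 1
        # -p T: protein sequences; -o T: parse SeqIds and build the
        # indexes that fastacmd needs
        formatdb -i "$db" -p T -o T || return 1
    done
}
```

For example fetch_and_format pdbaa swissprot nr. The function stops at the first failing step; nr is by far the largest file, so expect its download and formatdb run to take a while.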
2.2 option 2 - from preformatted databases (needs less space):
Download the databases via anonymous FTP (in binary mode) from ftp://ftp.ncbi.nlm.nih.gov/blast/db, or download the script update_blastdb.pl (http://www.ncbi.nlm.nih.gov/blast/docs/update_blastdb.pl) and use it to download and update your databases. I recommend using the wget utility because it can resume interrupted downloads and shows the download progress:
cd /opt
mkdir -p db/blast
cd /opt/db/blast
wget -c ftp://ftp.ncbi.nih.gov/blast/db/nr.00.tar.gz
wget -c ftp://ftp.ncbi.nih.gov/blast/db/nr.01.tar.gz
wget -c ftp://ftp.ncbi.nih.gov/blast/db/pdbaa.tar.gz
wget -c ftp://ftp.ncbi.nih.gov/blast/db/swissprot.tar.gz
tar zxvf nr.00.tar.gz
tar zxvf nr.01.tar.gz
tar zxvf pdbaa.tar.gz
tar zxvf swissprot.tar.gz
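A loop that downloads and unpacks one archive at a time avoids a classic tar pitfall: tar zxvf *gz only unpacks the first archive and treats the remaining file names as members to look for inside it. A sketch; fetch_preformatted and DB_BASE_URL are hypothetical names, and the list of nr volumes (nr.00, nr.01, ...) you pass in has to match what is currently on the FTP server.

```shell
# Hypothetical helper: download and unpack preformatted BLAST archives.
# Each archive gets its own tar call (tar reads only one archive per run).
DB_BASE_URL=ftp://ftp.ncbi.nih.gov/blast/db

fetch_preformatted() {
    local name
    for name in "$@"; do
        wget -c "$DB_BASE_URL/$name.tar.gz" || return 1
        tar zxvf "$name.tar.gz" || return 1
    done
}
```

For example: cd /opt/db/blast && fetch_preformatted nr.00 nr.01 pdbaa swissprot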
The minimum requirement is nr (parent of the other databases), swissprot and pdbaa. This will make a total of about 1.2 GB (as of 01/2007).
Note added 2012: I leave this statement here for its historic value ;) /Raik
Note:
Last time I tried, the preformatted databases had not been built with -o T, so fastacmd could not retrieve sequences from them. If the test below doesn't work, you can dump FASTA versions of the preformatted files (rather than downloading them again):
mkdir -p ../blast_fasta
fastacmd -d pdbaa -D 1 > ../blast_fasta/pdbaa
fastacmd -d swissprot -D 1 > ../blast_fasta/swissprot
fastacmd -d nr -D 1 > ../blast_fasta/nr
and then continue with the formatdb step in 2.1 to rebuild the databases (run formatdb inside ../blast_fasta and point $BLASTDB there).
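The dump-and-reformat workaround can be wrapped in one function. A sketch only: rebuild_with_index is a hypothetical helper; it assumes fastacmd can find the preformatted databases (run it in their folder or with $BLASTDB set) and writes the rebuilt, indexed copies to the folder given as first argument.

```shell
# Hypothetical helper: dump each preformatted database to FASTA with
# fastacmd, then rebuild it with formatdb -o T in the output folder.
rebuild_with_index() {
    local outdir=$1 db
    shift
    mkdir -p "$outdir" || return 1
    for db in "$@"; do
        # -D 1 dumps the whole database in FASTA format
        fastacmd -d "$db" -D 1 > "$outdir/$db" || return 1
        # subshell so the caller's working directory is unchanged
        ( cd "$outdir" && formatdb -i "$db" -p T -o T ) || return 1
    done
}
```

For example rebuild_with_index ../blast_fasta pdbaa swissprot nr, followed by pointing $BLASTDB at ../blast_fasta.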
Set up NCBI environment
3.1 option 1 (recommended) - create ~/.ncbirc
Create a file named .ncbirc in your home directory, set the path to the NCBI data directory, and add proxy settings if you need them. Example:
[NCBI]
DATA=/usr/local/src/ncbi/data

[NET_SERV]
SRV_CONN_MODE=FIREWALL
SRV_HTTP_PROXY_HOST=cache.pasteur.fr
SRV_HTTP_PROXY_PORT=8080
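If you don't need a proxy, the file can also be created from the shell. A sketch; write_ncbirc is a hypothetical helper that writes only the [NCBI] section shown above and refuses to overwrite an existing file.

```shell
# Hypothetical helper: write a minimal .ncbirc (no proxy section).
# Usage: write_ncbirc TARGET_FILE NCBI_DATA_DIR
write_ncbirc() {
    local target=$1 datadir=$2
    if [ -e "$target" ]; then
        echo "$target already exists, not overwriting" >&2
        return 1
    fi
    cat > "$target" <<EOF
[NCBI]
DATA=$datadir
EOF
}
```

For example: write_ncbirc ~/.ncbirc /usr/local/src/ncbi/data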
3.2 option 2 - without .ncbirc
tcsh example:
setenv BLASTMAT /usr/local/src/ncbi/data
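In bash or zsh the same variable is set with export BLASTMAT=/usr/local/src/ncbi/data. To check the setting, a small function can look for a scoring matrix in that folder. This is a sketch under an assumption: it expects the toolkit's data folder to contain the BLOSUM62 matrix file (the default matrix of blastp); check_blastmat is a hypothetical name.

```shell
# bash/zsh equivalent of the tcsh line above:
#   export BLASTMAT=/usr/local/src/ncbi/data
# Hypothetical helper; assumes the data folder ships a BLOSUM62 file.
check_blastmat() {
    if [ -z "${BLASTMAT:-}" ]; then
        echo "BLASTMAT is not set" >&2
        return 1
    fi
    if [ ! -f "$BLASTMAT/BLOSUM62" ]; then
        echo "no BLOSUM62 in $BLASTMAT" >&2
        return 1
    fi
}
```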
Set up database location
Export the database path $BLASTDB, for example:
bash/zsh: export BLASTDB=/opt/db/blastdb
tcsh:     setenv BLASTDB /opt/db/blastdb
Replace /opt/db/blastdb with the folder where you installed the database files in step 2. $BLASTDB must contain all database and index files (symbolic links to the files also work). The following files are needed:
- nr.*
- pdbaa.*
- swissprot.*
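Before running the test further down, you can check that $BLASTDB actually holds files for each database. A sketch; check_blastdb is a hypothetical helper that only looks for files named <db>.* (which also covers split volumes such as nr.00.*), not whether they are complete.

```shell
# Hypothetical helper: verify that $BLASTDB contains files for each
# database name given as argument.
check_blastdb() {
    local db found f
    if [ ! -d "${BLASTDB:-}" ]; then
        echo "BLASTDB is unset or not a directory" >&2
        return 1
    fi
    for db in "$@"; do
        found=0
        for f in "$BLASTDB/$db".*; do
            # an unmatched glob stays literal, so test that the file exists
            [ -e "$f" ] && found=1 && break
        done
        if [ "$found" -eq 0 ]; then
            echo "no files for $db in $BLASTDB" >&2
            return 1
        fi
    done
}
```

For example: check_blastdb nr pdbaa swissprot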
Configure Biskit for non-standard database names (if necessary)
The standard database names (nr, pdbaa, swissprot) can be adjusted in ~/.biskit/settings_Mod.cfg, by default the beginning of the file looks like this:
[NORMAL]
db_nr = nr                ## default name of blast nr sequence database
db_pdbaa = pdbaa          ## default name of blast pdb sequence database
db_swiss = swissprot      ## default name of blast swissprot sequence database
...                       ## other parameters
If your actual database filenames are different, you can either change the db_* entries in settings_Mod.cfg or, alternatively, make symbolic links (e.g. nrdb.data -> nr.data).
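The symbolic-link route can be scripted so that every file of a database (all extensions and volumes) gets a link under the expected name. A sketch; link_db_alias is a hypothetical helper, and the folder and names in the example are illustrative values.

```shell
# Hypothetical helper: link every file ACTUAL.* in DIR to EXPECTED.*
# Usage: link_db_alias DIR ACTUAL EXPECTED
link_db_alias() {
    local dir=$1 actual=$2 expected=$3 f suffix
    for f in "$dir/$actual".*; do
        [ -e "$f" ] || continue
        suffix=${f#"$dir/$actual"}    # e.g. ".pin" or ".00.phr"
        # relative target, since link and file live in the same folder
        ln -s "$(basename "$f")" "$dir/$expected$suffix" || return 1
    done
}
```

For example, link_db_alias /opt/db/blast nrdb nr creates nr.pin -> nrdb.pin and so on for each nrdb.* file.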
Test BLAST environment
Test your blast setup by following this example:
cd ~/biskit/Biskit/testdata/Mod
# retrieve sequence
fastacmd -d pdbaa -s 3TGI > test.fasta
# search against PDB
blastall -p blastp -d pdbaa < test.fasta
# search against Swissprot
blastall -p blastp -d swissprot < test.fasta
# search against all
blastall -p blastp -d nr < test.fasta > nr.out
# cluster sequences
blastclust -i test_blastclust.fasta
-> should yield something roughly similar to:
[blastclust] WARNING: Could not find taxdb.bti
Apr 19, 2004 1:30 PM Start clustering of 12 queries
a b c d e g h i j l
f k m
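To repeat the three searches without watching the screen, they can be run in a loop that checks that each report is non-empty. A sketch; blast_smoke_test is a hypothetical helper, and it uses blastall's -i (query file) and -o (output file) options instead of the redirections above.

```shell
# Hypothetical helper: run blastp for QUERY against each database and
# check that a non-empty report was written to smoke_<db>.out.
blast_smoke_test() {
    local query=$1 db out status=0
    shift
    for db in "$@"; do
        out="smoke_$db.out"
        if blastall -p blastp -d "$db" -i "$query" -o "$out" && [ -s "$out" ]; then
            echo "ok: $db"
        else
            echo "FAILED: $db" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run it from the testdata/Mod folder once test.fasta exists, e.g. blast_smoke_test test.fasta pdbaa swissprot nr.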
- Usage:
- see Biskit.Mod.SequenceSearcher and Biskit.Mod.TemplateSearcher
- Custom configuration:
- see biskit/external/defaults/settings_Mod.cfg