LocalBlast


Local Blast

 

The NCBI BLAST search and databases are freely available on the web, and this is usually the best way to access them. Sometimes you need to do more with the data, and this is where a local search is more useful. A major advantage is the ability to construct your own databases and then use the BLAST search algorithm to search through them. The program Multiseq can be integrated with a local BLAST search, which greatly increases the program's value.

 

The BLAST software is a collection of programs that let you search specially formatted databases of nucleotide or amino acid sequences. It is available for both Unix-based and Windows systems. The Blast introduction gives instructions on how to download and install the software.

 

After the BLAST program is installed (for this info page I'm assuming it's been installed to c:\blast) you can download a dataset. The databases are available from the NCBI database FTP site. It is best to start and configure things with a small dataset like ecoli.nt or ecoli.aa.

 

The datasets must then be formatted into a BLAST-searchable file format. This is done using the formatdb program from the command line. I had problems getting to the programs from the command line. If you put all the datasets in the same directory as the programs (c:\blast\bin) this works fine, but it isn't very elegant. You can also tell the computer to look for the BLAST programs in c:\blast\bin by editing the PATH variable in Windows (google it for your version of Windows).
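As a rough sketch of the PATH approach, the Python snippet below prepends the install directory to PATH for a single run and then builds the command for the formatting program (formatdb in the legacy BLAST package). The helper names add_to_path and formatdb_command are made up for this page, and c:\blast\bin is the install location assumed above. Only the -i (input file) and -p (T for protein, F for nucleotide) switches of formatdb are used.

```python
import os
import shutil
import subprocess

# Assumption: BLAST was installed to c:\blast, so the programs are in c:\blast\bin.
BLAST_BIN = r"c:\blast\bin"


def add_to_path(directory, env=None):
    """Return a copy of env with directory at the front of PATH (hypothetical helper)."""
    env = dict(os.environ if env is None else env)
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    env["PATH"] = os.pathsep.join(parts)
    return env


def formatdb_command(fasta_file, protein):
    """Build the formatdb argument list (hypothetical helper).

    -i names the FASTA input file; -p is T for protein, F for nucleotide.
    """
    return ["formatdb", "-i", fasta_file, "-p", "T" if protein else "F"]


if __name__ == "__main__":
    env = add_to_path(BLAST_BIN)
    cmd = formatdb_command("ecoli.nt", protein=False)
    print(" ".join(cmd))
    # Look the program up on the extended PATH and only run it if it is found.
    program = shutil.which("formatdb", path=env["PATH"])
    if program is None:
        print("formatdb not found; check the install directory")
    else:
        subprocess.run([program] + cmd[1:], env=env, check=True)
```

This only changes PATH for the one process it starts. Editing the Windows PATH variable is still the permanent fix.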

 

When formatting your database using formatdb it is handy to use the -o T switch. There are a whole bunch of conditions that the database needs to meet to use this switch, but if you download the data from NCBI the dataset should be in the correct format. This switch creates a further set of index files that keep track of the GI numbers so you can cross-reference your BLAST searches with GenBank. You can then pull out the sequences behind your BLAST results using fastacmd, which retrieves FASTA-formatted sequences from a BLAST database. The readme for fastacmd gives instructions on getting the further database that you require: taxdb.tar.gz. This holds the taxonomy information.
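To make the two steps concrete, here is a small sketch that builds the command lines for formatting with -o T and for pulling sequences back out with fastacmd. The helper names and the GI numbers are placeholders invented for this page. The switches are those of the legacy programs: for fastacmd, -d names the database, -p gives the type (T protein, F nucleotide) and -s takes a comma-separated list of identifiers.

```python
def formatdb_indexed_command(fasta_file, protein):
    """formatdb command with -o T, which parses the sequence IDs and builds the index files."""
    return ["formatdb", "-i", fasta_file, "-p", "T" if protein else "F", "-o", "T"]


def fastacmd_command(database, identifiers, protein, output_file=None):
    """fastacmd command that retrieves the given GIs or accessions as FASTA."""
    if not identifiers:
        raise ValueError("at least one identifier is needed")
    cmd = [
        "fastacmd",
        "-d", database,                           # formatted BLAST database
        "-p", "T" if protein else "F",            # database type
        "-s", ",".join(str(i) for i in identifiers),  # comma-separated search string
    ]
    if output_file:
        cmd += ["-o", output_file]                # write FASTA here instead of the screen
    return cmd


if __name__ == "__main__":
    print(" ".join(formatdb_indexed_command("ecoli.nt", protein=False)))
    # The GI numbers below are placeholders; use ones taken from your own BLAST results.
    print(" ".join(fastacmd_command("ecoli.nt", [111, 222], protein=False,
                                    output_file="hits.fasta")))
```

Retrieval by GI only works if the database was formatted with -o T, which is why the two commands are shown together.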

 

Once you have the BLAST programs installed and the BLAST databases formatted, you can use the blastall program to search with your query sequence (in FASTA format).
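A typical blastall call might look like the sketch below. The helper name blastall_command and the file names query.fasta and results.txt are made up for illustration. The switches are the legacy blastall ones: -p program, -d database, -i query file, -o output file, -e expectation value cutoff and -m 8 for tabular output.

```python
# The five search programs that blastall can run.
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def blastall_command(program, database, query_file, output_file,
                     evalue=None, tabular=False):
    """Build a blastall argument list (hypothetical helper)."""
    if program not in BLAST_PROGRAMS:
        raise ValueError("unknown BLAST program: " + program)
    cmd = ["blastall", "-p", program, "-d", database,
           "-i", query_file, "-o", output_file]
    if evalue is not None:
        cmd += ["-e", str(evalue)]   # only report hits at or below this E-value
    if tabular:
        cmd += ["-m", "8"]           # one tab-separated line per hit
    return cmd


if __name__ == "__main__":
    # Nucleotide query against the nucleotide database formatted earlier.
    cmd = blastall_command("blastn", "ecoli.nt", "query.fasta", "results.txt",
                           evalue=1e-5, tabular=True)
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Pick the program to match your query and database: blastn for nucleotide against nucleotide, blastp for protein against protein.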

 

Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.