Introduction

Getting to know Biskit

A detailed introduction is pending.

1. Layout of Biskit

This is a schematic overview of the Biskit project. The main parts are:

The Biskit library -- contains all the Biskit classes and operations.
The /scripts folder -- contains special-purpose programs using this library.
The Biskit/data folder (formerly /external) -- contains input scripts, default parameters and data needed by external programs.
The Biskit/testdata folder (formerly /test) -- contains data for the PyUnit test suites of each module.
The SourceForge project -- provides the source code repository and bug tracking system.

Many tasks are delegated to external applications via a standard interface (see Executor.py), and computations can be distributed across a large number of computers with the aid of PVM (Parallel Virtual Machine). A simple Master / Slave scheme hides the low-level PVM details and makes parallelisation quite easy.

Biskit module outline

The most important folder for you as a developer is biskit/Biskit, which you should add to your PYTHONPATH. The Biskit module is structured into:

. -- the base package with common classes
PVM -- parallelisation-related classes
Dock -- protein-protein complexes and docking
Mod -- homology modelling

2. Before you start...

Please acquaint yourself with the basic array handling methods of Numeric Python (Numpy), in particular with:

take (extracting part of an array given positions)
compress (extracting part of an array given a mask)
put (assign values to array positions)

Other useful methods are: shape, ravel, argsort, argmax, nonzero, clip, sum, resize. The Numpy tutorial gives an introduction to these and other functions.
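
As a quick refresher, the three methods listed above can be tried on a small array. This is a minimal sketch that uses only plain Numpy and made-up values; nothing in it is specific to Biskit.

```python
import numpy as N

values = N.array([10, 20, 30, 40, 50])

# take: extract the elements at the given positions
picked = N.take(values, [0, 2, 4])

# compress: extract the elements where the mask is true
mask = values > 25
kept = N.compress(mask, values)

# put: assign values to the given positions (modifies the array in place)
target = values.copy()
N.put(target, [1, 3], [-1, -2])

# nonzero turns a mask into positions, so take and compress can be interchanged
positions = N.nonzero(mask)[0]
same_as_kept = N.take(values, positions)
```

Positions and masks are two views of the same selection, which is why the last two lines give the same result as compress.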

Numpy functions are not only heavily used in Biskit; Biskit also implements functions of the same name to extract and manipulate parts of structures and trajectories.

3. Basic classes

Probably the most central classes of the Biskit package are:

PDBModel plus, depending on what you want to do, Trajectory, PDBDope and Dock/Complex. Until we put a short tutorial online, please have a look at the extensive documentation and example code of these classes. See also the online API reference, which is generated from the source code. Here is a schematic overview of the data model (see also our application note):

	A PDBModel contains a coordinate matrix plus atom and residue profiles (arrays of values) with the data from the PDB. New profiles with additional data are easily added.
	A Trajectory is, basically, a PDBModel with an additional time dimension. Thus, the 2-D array of coordinates turns into a 3-D array of coordinate frames. Profiles of arbitrary data can be assigned to this new time axis. The PDBModel methods for extracting atoms, residues or chains are also available (e.g. compressAtoms) but now, of course, return new Trajectory objects with all the frames of the extracted atoms.
	A Complex wraps two PDBModels with a rotation/translation matrix that orients the ligand with respect to the receptor. This allows us to store hundreds of thousands of rigid-body docking results with a very small memory footprint, because only the rt matrix changes. There is also a cached contact matrix and a dictionary for additional metadata (scores and the like).
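
The array shapes behind this data model can be illustrated with plain Numpy. The following is only a sketch with made-up data and does not use or reproduce the Biskit classes. In particular, the 4 x 4 layout of the rotation/translation matrix and the helper name transform are assumptions made for this illustration.

```python
import numpy as N

n_atoms, n_frames = 4, 3

# A single structure: one row of x, y, z per atom
xyz = N.arange(n_atoms * 3, dtype=float).reshape(n_atoms, 3)

# An atom profile: one value per atom (hypothetical example data)
mass = N.array([12.0, 14.0, 16.0, 12.0])

# A trajectory adds a leading time axis: frames x atoms x 3
frames = N.stack([xyz + i for i in range(n_frames)])

# Extracting atoms keeps every frame and filters along the atom axis
heavy = mass > 12.0
sub_frames = N.compress(heavy, frames, axis=1)

# Assumed layout: a 4 x 4 matrix holding a 3 x 3 rotation and a translation
# column; here a 90 degree rotation about z plus a shift of 5 along x
rt = N.array([[0.0, -1.0, 0.0, 5.0],
              [1.0,  0.0, 0.0, 0.0],
              [0.0,  0.0, 1.0, 0.0],
              [0.0,  0.0, 0.0, 1.0]])

def transform(coords, rt):
    """Rotate and then translate an array of N x 3 coordinates."""
    return N.dot(coords, rt[:3, :3].T) + rt[:3, 3]

ligand = N.array([[1.0, 0.0, 0.0]])
moved = transform(ligand, rt)
```

Storing one small matrix per docking solution instead of a transformed copy of the ligand coordinates is what keeps the memory footprint low.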

A detailed description of PDBModel can be found in our draft tutorial for using PDBModel!

4. Examples

Just as an appetizer -- assuming Biskit is installed, you can calculate the CA RMSD of two closely related proteins (with some residues or atoms different) as follows:

import Biskit as B

m1 = B.PDBModel( "your/structure1.pdb" )
m2 = B.PDBModel( "your/related/structure.pdb" )

# align both models to the same residue content
# and atom order and content
i1, i2 = m1.compareAtoms( m2 )
m1 = m1.take( i1 ) # take atoms common with m2
m2 = m2.take( i2 ) # take atoms common with m1

## some checking for demonstration
assert len( m1 ) == len( m2 )
assert m1.sequence() == m2.sequence()

## get CA structures
ca1 = m1.compress( m1.maskCA() )
ca2 = m2.compress( m2.maskCA() )

## RMSD
rms = ca1.rms( ca2, fit=1 )
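
To make the last step less of a black box, here is a sketch of what an RMSD after superposition involves, written with plain Numpy. It uses the standard Kabsch procedure on two equally sized coordinate arrays. This is an assumption about the general technique, not a copy of how PDBModel.rms is implemented, and the function name fitted_rmsd is made up for this example.

```python
import numpy as N

def fitted_rmsd(a, b):
    """RMSD between two N x 3 coordinate arrays after superposing a onto b."""
    # remove the translation by centering both sets on their centroids
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)

    # Kabsch: the optimal rotation follows from the SVD of the covariance matrix
    u, s, vt = N.linalg.svd(N.dot(a0.T, b0))

    # flip one axis if needed so that the result is a rotation, not a reflection
    if N.linalg.det(N.dot(u, vt)) < 0:
        u[:, -1] *= -1.0
    rot = N.dot(u, vt)

    diff = N.dot(a0, rot) - b0
    return N.sqrt((diff ** 2).sum() / len(a))

# usage: b is a rotated and shifted copy of a, so the fitted RMSD is close to zero
a = N.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
rot_z = N.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
b = N.dot(a, rot_z.T) + N.array([1.0, 2.0, 3.0])
rms_fit = fitted_rmsd(a, b)
```

Both arrays must list the same atoms in the same order, which is why the Biskit example first brings the two models to identical atom content.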

Here is a second example that calculates what fraction of a protein's surface is contributed by beta sheets.

import Biskit as B
import numpy as N

m = B.PDBModel('structure.pdb') 

## add information calculated by external programs
d = B.PDBDope(m)
d.addSurfaceRacer()          # MS, AS, relMS, relAS and curvature
d.addSecondaryStructure()    # secondary structure from DSSP

## molecular surface of all atoms
ms_profile = m['MS'] 

## mask for all atoms of the given secondary structure
ss_mask = N.array( m.res2atomProfile('secondary') ) == 'E' 

## fraction of total surface
ss_fraction = N.sum(ms_profile * ss_mask) / N.sum(ms_profile)

Note: This example requires the installation of two external programs: (1) SurfaceRacer and (2) DSSP. Please follow the links to installation instructions!
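
If the external programs are not available, the mask arithmetic of this example can still be tried on made-up data. The following sketch replaces the two profiles with small hypothetical arrays; the values are invented and only the last two lines mirror the example.

```python
import numpy as N

# hypothetical per-atom surface areas and per-atom secondary structure codes
ms_profile = N.array([10.0, 0.0, 5.0, 20.0, 15.0])
secondary = N.array(['E', 'E', 'H', '.', 'E'])

# True for every atom that belongs to an extended strand
ss_mask = secondary == 'E'

# multiplying by the boolean mask zeroes the surface of all other atoms
ss_fraction = N.sum(ms_profile * ss_mask) / N.sum(ms_profile)
```

Here the three strand atoms contribute 25 of 50 surface units, so ss_fraction is 0.5.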

5. Tips and hints / finding help

Python has several methods to explore the content and documentation of a module interactively:

dir() -- lists the content of a module, object or the current namespace
object.__doc__ -- contains the doc-string of a function or class

For example:

>>> import Biskit as B

>>> dir( B )
['AmberCrdParser', 'AmberParmBuilder', 'AmberRstParser', 'AmbiguousMatch', 'BisList', 
 'BisListError', 'BiskitError'...]

>>> print B.PDBDope.__doc__

    Decorate a PDBModel with calculated properties (profiles)

>>> dir( B.PDBDope )
['__doc__', '__init__', '__module__', 'addASA', 'addConservation', 'addDensity', 'addFoldX', 
 'addIntervor', 'addSecondaryStructure', 'addSurfaceMask', 'addSurfaceRacer', 'model', 'version']

>>> print B.PDBDope.addSecondaryStructure.__doc__

        Adds a residue profile with the secondary structure as
        calculated by the DSSP program.

        Profile code::
          B = residue in isolated beta-bridge
          E = extended strand, participates in beta ladder
          G = 3-helix (3/10 helix)
          I = 5 helix (pi helix)
          T = hydrogen bonded turn
          S = bend
          . = loop or irregular

        @raise ExeConfigError: if external application is missing

The info function in Biskit.tools combines dir() and __doc__ strings to give a quick overview of the fields and methods of a class, object or module:

>>> from Biskit.tools import info
>>> from Biskit import PDBDope
>>> info( PDBDope )
NAME:     PDBDope
ID:       139712764
TYPE:     <type 'classobj'>
VALUE:    <class Biskit.PDBDope.PDBDope at 0x853d8fc>
CALLABLE: Yes
DOC:      Decorate a PDBModel with calculated properties (profiles)

METHODS
__init__       : @param model: model to dope
addASA         : Add profiles of Accessible Surface Area: 'relASA', 'ASA_tota..
addConservation: Adds a conservation profile. See L{Biskit.Hmmer}
addDensity     : Count the number of heavy atoms within the given radius.
addFoldX       : Adds dict with fold-X energies to PDBModel's info dict.
addIntervor    : Triangulate a protein-protein interface with intervor.
addSecondaryStructure: Adds a residue profile with the secondary structure as
addSurfaceMask : Adds a surface mask profie that contains atoms with > 40% ex..
addSurfaceRacer: Always adds three different profiles as calculated by fastSu..
model          : @return: model
version        : @return: version of class

FIELDS
addFoldX       : <function addFoldX at 0x8544bc4>
...
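
The idea of combining dir() with doc-strings is easy to reproduce for your own exploration. The following sketch is not the code of Biskit.tools.info; the function summarize and the class Demo are hypothetical names used only to illustrate the principle.

```python
def summarize(obj, width=60):
    """Return one 'name: first doc line' entry per public method of obj."""
    lines = []
    for name in sorted(dir(obj)):
        if name.startswith('_'):
            continue                      # skip private and special names
        attr = getattr(obj, name)
        if not callable(attr):
            continue                      # fields are ignored in this sketch
        doc = (attr.__doc__ or '').strip()
        first = doc.splitlines()[0] if doc else ''
        lines.append('%-12s: %s' % (name, first[:width]))
    return lines


class Demo(object):
    """Hypothetical class used only for this illustration."""

    def area(self):
        """Return the area."""

    def name(self):
        """Return the name."""


for line in summarize(Demo):
    print(line)
```

Each entry shows only the first line of the doc-string, which is the same kind of truncation visible in the METHODS listing above.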
