Overview of all scripts
Content
The biskit scripts are sorted into four sub-folders of biskit/scripts. Each script has a help screen which is shown if the script is called without parameters or (especially in scripts/Mod) with the -help option. This page simply collects the help screen of each script.
- scripts/Biskit - general purpose scripts
- scripts/Dock - protein-protein docking
- scripts/Mod - homology modeling
- scripts/analysis - analysis and visualisation of results
scripts/Biskit
General purpose scripts
Convenience and management
back to Content
redump.py:
Update old python pickles. Unpickle some python objects and pickle them back to the same filename. Disable strict class checking if necessary -- this allows loading classes that have meanwhile changed their base class. Usage: redump.py file1 file2 ..
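The core of redump.py is just an unpickle/re-pickle round trip. A minimal sketch in Python (working on byte strings rather than files, and without Biskit's relaxed class checking):

```python
import pickle

def redump_bytes(data):
    """Sketch of the re-dump cycle: unpickle an object and pickle it
    again with the current protocol. The real script does this per
    file and can relax strict class checking; both details are
    omitted here."""
    obj = pickle.loads(data)
    return pickle.dumps(obj)

# an 'old' protocol-0 pickle comes out re-dumped but unchanged in content
old = pickle.dumps({'code': '1B39'}, protocol=0)
new = redump_bytes(old)
```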
relocalize.py:
Re-create LocalPath in PDBModel or Complex from the original file name (and the current environment variables). Usage: relocalize.py -i |file1 file2 ..|
killpp.py:
Kill all processes containing a given phrase on all hosts. Syntax: killpp.py -n |part_of_name| [-a -f -h |host1 host2 host3 ..| -e |host4 host5 ..|] -n ... search phrase -a ... kill all without asking -f ... don't ask anything at all -h ... only look on these hosts (default: all) -e ... exclude one or more hosts from killing
xload.py:
xload: show xosview for several machines. Syntax: xload |machine_file| machine_file ... text file with one machine name per line
cvsrm.py:
remove files from disk and cvs Syntax: cvsrm file1 file2 file3... Changes still need to be committed (cvs ci).
rm_pvm.py:
delete all /tmp/pvm* files on all hosts
bispy:
script to start the python interpreter with standard Biskit imports
replace_wildcard_import.py:
replace_wildcard_import -i |src_file| -m |module_name| [-as |import_as| -e |exclude_method1 ..| ] example: replace_wildcard_import -i tools.py -m Numeric -as N -e Complex will replace all calls to any Numeric method (except Complex) by N.|method|
fix_array_constructor.py:
Fix the import of Numeric.array_constructor in pickled Biskit modules Many Biskit pickles have e.g. array_constructor dumped as PDBModel.array_constructor
echoTestRoot.py:
prints the current test root
restartPVM.py:
Restart a distributed calculation. Syntax: restartPVM.py -i |rst_file| [-a] Options: i .. restart file containing result of TrackingJobMaster.getRst() a .. add hosts to PVM
Structure sampling
back to Content
am_restartMD.py:
Prepare the restart of a broken Amber MD run. Current *crd etc. are moved to oldName_TIMEps.* and the nstlim option in the input file is set to the number of steps remaining to the end of the MD. am_restartMD.py -f |folder| [ -t0 |time_offset| -tot |nstlim_total| -rst |rst_file| -inp |inp_file| -e |exclude_files_from_renaming| ] tot - needed for 2nd restart, total number of MD steps (w/o restart) t0 - starting time in ps of this MD
amber2traj.py:
Convert single amber crd into Trajectory object amber2traj.py -i sim.crd -o traj_0.dat -r ref.pdb [-b -wat -hyd -rnres -code PDBC ] -r ref.pdb - must have identical atom content as sim.crd -b traj has box info (3 additional coordinates per frame) -wat delete WAT, Cl-, Na+ residues (after parsing) -hyd delete all hydrogens (after parsing) -rnres rename amber residues HIE/HID/HIP, CYX to HIS and CYS -code PDB code of molecule (otherwise first 4 letters of ref file name)
amberConcat.py:
Concatenate 2 amber crd/vel files. amberConcat.py -i sim1.crd sim2.crd -o sim_merged.crd -n n_atoms [-b -p |int_precision| -rst |last_sim1.rst last_sim2.rst| ] -n number of atoms (obligatory) -b traj has box info (3 additional coordinates) -p when looking for overlapping block, round coord. to p positions -rst try repairing last frame of sim1, sim2 from given restart file(s)
amber_ensembleMD.py:
Prepare ensemble MD with amber - requires template_pme_ensemble folder Syntax: amber_ensembleMD.py -parm |parm_file| -crd |crd_file| -out |result_folder| -pdb |0_pdb_file| [ -nb_nodes |n_nodes_per_host| -template |template_folder| -nodes_eq |2 hosts for minimiz.| -nodes_prod |10 hosts for prod.| -n_members |10| -rseed |int_random_seed| -dt |production_time_step| -n_steps |production_step_number| -ntwx |production_coordinate_writing_interval| -ntwv |production_velocities_writing_interval| -place_holder1 |value| -place_holder2 |value| ..] The script prepares a new folder |out| with all the input and start files to run multiple copies of an Amber MD. All input/start files/folders are copied from a template folder. Template files and folders ending in 'xx' are recreated |n_members| times. Strings like '%(place_holder)s' in any template file in any template folder are replaced by the value of self.|place_holder| which can be given at the command line. If self.place_holder contains a list of values, each item is only used once (e.g. nodes_prod or nodes_eq). Requirements: -$AMBERHOME must be set -LAM environment must be set up in .cshrc or .zshenv etc. -start_eq must be run from the first host in nodes_eq.dat !
amber_pdb2parm.py:
Create amber topology and coordinate files from a PDB. Syntax: am_pdb2parm.py -i |PDBfile| [-o |out.parm| ...any of options below ] OR: am_pdb2parm.py -i |PDBfile| -mirror [-o |out.parm| ] Result: out.parm, out.crd, out.pdb, (and leap.log in current folder) Special option -mirror: create a parm for exact atom content of input PDB (no S-S bonds formed, atoms missing from residues..) This parm can be used for ptraj but not for simulations! Options: ocrd - str, target file for crd (coordinates) [|f_out_base|.crd] opdb - str, target file for pdb [|f_out_base|.pdb] hetatm - keep hetero atoms [don't] cap - put ACE and NME capping residue on chain breaks [don't] capN - int int, indices of chains that should get ACE cap [] capC - int int, indices of chains that should get NME cap [] box - float, minimal distance of solute from box edge [10.0] fmod - str str, list of files with amber parameter modifications (to be loaded into leap with loadAmberParams) [] fprep - str str, list of files with amber residue definitions (to be loaded into leap with loadAmberPrep) [] leap_template - str, path to template file for leap input [use default] leaprc - str, path to parameter file for leap [use default] leap_out - str, target file for leap.log [default: discard] leap_in - str, target file for leap.in script [default: discard] leap_pdb - str, target file for cleaned input pdb [discard] leap_bin - str, path to tleap binary [use default] norun - do not run leap, only prepare files debug - keep all temporary files more -key value pairs for place holders in leap input template Comments: - The protonation state of histidines is decided from the H-atoms that are found (HD, HE, or both). After that all H are removed to be added again by leap. - Cleaning tries to convert non-standard residues to the closest standard one. - Cleaning removes non-standard atoms (and atoms following them) from standard residues. 
- Cleaning keeps the largest / first of multiple occupancies - Ends of chains are assumed if the residue numbering jumps backward, if there is a TER record or chain ID or segid change, or if there is a chain break. - A chain break is assumed if there is an atypical gap in the chain of backbone atoms (see PDBModel.chainBreaks() ). - The index of the first chain is 0. - Original waters are deleted. - As usual, options can also be put into a file and loaded with the -x option
pcr_crd2pdb.py:
Extract PDB files from Xplor PCR trajectories. Syntax: pcr_crd2pdb -i |pcrFolder| -t |psfFolder| -o |outFolder| [ -n_iter |n_iterations| -skip |stepping| -z ] -z gzip crd files -skip MD-step interval for PDBs (500 = 1/ps) -n_iter number of iterations per ensemble member (50 = 50ps)
reduceTraj.py:
Reduce all-atom trajectory to trajectory with only one backbone and up to 2 side chain atoms per residue. The new atoms are the centers of mass of several atoms and carry the weight of the pooled atoms in an atom profile called 'mass'. reduceTraj.py -i traj.dat [-o traj_reduced.dat -amber -red |red_traj.dat|] i - pickled Trajectory object (all atoms) o - alternative output file name (default: 'reduced_' + input.dat) red - pickled reduced traj, just update ref model in given traj. amber - rename amber HIE/HIP/HID, CYX -> HIS, CYS; unwrap atom names
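The pseudo-atoms reduceTraj.py creates are plain mass-weighted centers. A small sketch of the pooling step, using a simplified (mass, coordinates) representation in place of Biskit's atom and 'mass' profiles:

```python
def center_of_mass(atoms):
    """Pool several atoms into one pseudo-atom position.
    'atoms' is a list of (mass, (x, y, z)) tuples -- a simplified
    stand-in for the atom profiles used by reduceTraj.py."""
    total = sum(m for m, _ in atoms)
    return tuple(sum(m * xyz[i] for m, xyz in atoms) / total
                 for i in range(3))

# two equal-mass atoms pool to their midpoint; the pseudo-atom would
# carry the summed mass in its 'mass' profile
com = center_of_mass([(1.0, (0.0, 0.0, 0.0)), (1.0, (2.0, 4.0, 0.0))])
```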
runPcr.py:
Start single Xplor PCR job on remote host. Syntax: runPCR -t |psfFolder| -h |host| [-f |Force| -r |resultFolder| -n |nice| -i |inpFolder| ] Options: -f force constant for PCR restraint -r base folder for result (sub-folder will be created) -t folder with topology (psf, pdb) -n nice value -h host computer (accessible via ssh) -i folder with all input files, must contain 'restart_h2o.inp' -parm folder with param19.* files A MD folder called pcr_<PDBCode> is created. The force constant is written to a file 'oldenergy' in this folder. The topology folder is copied to the new md folder and renamed to the PDB code (which is taken from the first 4 letters of the psf file name). The job is started via ssh on the remote host. A summary of all used parameters is written to runReport.out. NOTE: The pdb/psf file name has to be 4 characters long and start with a number (the segId has to conform to the same format). By using python -i runPCR.py .. a python shell remains open and the job can be killed with the command r.kill()
thinTraj.py:
This script is used only for the test_multidock example; it removes frames from the test trajectory to speed up subsequent test steps. With the default setting of step=5 this will result in a 100 frame trajectory. thinTraj.py -i traj.dat [-step |int|] i - pickled Trajectory object step - int, 1..keep all frames, 2..skip first and every second, ..
traj2ensemble.py:
Pool several trajectory objects into one ensemble trajectory. Each sub-trajectory is considered the trajectory of one ensemble member. This script ignores any profiles of the given trajectories and re-assigns new frame names (the Trajectory.concat() method is not used, to allow handling of larger trajectories). traj2ensemble.py -i |in_traj1 in_traj2 ..| -o |out_traj| [-s |start_frame(incl.)| -e |end_frame(excl.)| -step |step| -ref |ref_pdb_or_model| -pdb |PDBCode| -prot ] s,e,step - start, end position and stepping for each of the input trajectories ref - PDB or pickled PDBModel with reference coordinates; by default, the reference frame of the first trajectory is taken pdb - PDB code to be stored in trajectory prot - delete all non-protein atoms (not by default)
trajAddNames.py:
Add file names of frames to trajectory trajAddNames.py -i |in_traj.| -o |out_traj.| -f |file1 file2 file..| Add/replace file names of frames to existing (pickled) Trajectory.
trajFluct.py:
trajFluct.py: Calculate global and side chain fluctuation per atom for a trajectory. Syntax: trajFluct -i trajectory_file [-o result_trajectory] Options: -i pickled trajectory -o file name for pickled result Trajectory
trajpool2ensemble.py:
Convert one normal Trajectory into EnsembleTraj. The input trajectory must have frame names that allow sorting by time and ensemble member (see EnsembleTraj.py for details). trajpool2ensemble.py -i |in_traj| [-n |n_members| -o |out_traj| -pdb |PDBCode| ] o - out file name (default: replace input file) n - number of ensemble members to expect (default: 10) pdb - PDB code to be stored in trajectory
Biskit Setup
back to Content
setup_biskit.py:
setup the biskit environment
setup_hosts.py:
setup_hosts.py: Setup the host list needed for Biskit distributed calculations. Usage: Run this script once and let it create the empty host list in ~/.biskit/hosts.dat. Add your available hosts to the list. There are three different sections to which you can choose to add a host: - own_hosts: computers reserved for own use, highest priority - shared_hosts: computers shared with others, medium priority - others_hosts: computers mainly used by others, lowest priority Add your hosts to the corresponding 'dual' or 'single' cpu option. Separate the different hosts with a blank space. If you wish to temporarily exclude a host from being used, add it to the 'exclude' option. Optional settings (nice and ram): The nice settings can be changed for a specific computer (default values are 0 for 'own' and 'shared' and 17 for 'others'). To add a nice value, add the host(s) and the nice value separated by a colon (:) to the option 'nice'. Separate multiple hosts with a blank space. Example: computer1.local.net:12 computer2:8 computer3.local.net:5 In the same way the available RAM in GB can be added. The default values here are 0.5 for a single cpu machine and 1.0 GB for a dual cpu machine. Syntax: setup_hosts -i |list, input file| -w |str, out file| Options: -i |filename| read variable names from this file -w |filename| write variables to this file, if the file already exists it will be updated -d |Yes| accept default values for -i and -w
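The host:value syntax used for the 'nice' and 'ram' options takes only a few lines of Python to parse. A sketch (function name hypothetical) showing how entries with and without an explicit value could be treated:

```python
def parse_host_option(value, default=0):
    """Split a hosts.dat style option like
    'computer1.local.net:12 computer2:8 computer3.local.net:5'
    into a {host: number} mapping; hosts without an explicit value
    fall back to the section default. Float is used so the same
    parser covers the RAM-in-GB option."""
    result = {}
    for entry in value.split():
        host, sep, num = entry.partition(':')
        result[host] = float(num) if sep else default
    return result
```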
Structure manipulation
back to Content
1pdb2model.py:
Syntax: 1pdb2model.py -i |file1| [-o |outfile| -psf |psf_file| -wat -amber -pdb |PDBCode| ] Result: self-sufficient pickled PDBModel or PCRModel, with itself as source Options: -i input PDB or pickled PDBModel -psf psf file name -> will generate PCRModel instead -o output file name (default: pdbfile.model) -wat skip water residues (WAT TIP3 WWW H2O) and Cl-, Na+ -amber rename CYX -> CYS, HID/HIE/HIP -> HIS -pdb pdb code to be stored in model.pdbCode
averageASA.py:
averageASA.py - a script that collects the average (of 500 structures) molecular surface (MS) and solvent accessible surface (AS) for all 20 amino acids in a GLY-XXX-GLY tripeptide. Syntax: averageASA.py -i |template.pdb| -r |path| [ -l |str| -mask |resmask| ] Options: -i file, pdb tripeptide template file -r path, calculation root folder (many directories will be created in this folder) -l str, label for the result dictionaries -mask residue mask to delete padding residues (i.e. GLY) Result: 4 dictionaries AS, AS_sd, MS and MS_sd, written to root folder
castPdbs.py:
castPdbs: Convert two similar PDBs into two PDBs with equal atom content. PDBs must not have any HETATOMs. TIP3 residues are removed. Syntax: castPdbs.py -i1 |pdb1| -i2 |pdb2| -o1 |outFile1| -o2 |outFile2| [ -c1 |0 1 ..| -c2 |2 3 ..| ] i1, i2 file names of PDBs to be compared o1, o2 file names for result pdbs c1, c2 chain numbers (starting at 0) to take from i1 and i2 (default: all)
dope.py:
Syntax: dope.py -s sourceModel -i otherModels [-so sourceOut -o othersPrefix -dic old_model_dic ] Add conservation, accessibility profiles and foldX energies to a reference model and models linking to this reference. 1) if sourceOut is given: Remove waters from source, put conservation score into profiles, saveAs sourceOut 2) update each of |otherModels| from their respective source, make |sourceOut| their new source, remove atoms (shouldn't be changed from |sourceOut|) and pickle them down to same file name plus |othersPrefix| if given. 3) update old model dic if given Example 1: dope.py -s ../../rec_wet/1B39.pdb -so ../../rec_wet/dry.model \ -i *.model -dic 1B39_model.dic -> create source and update model.dic Example 2: dope.py -s ../../rec_wet/dry.model \ -i *.model -dic 1B39_model.dic -> source already there, update model.dic
getSS.py:
Count number of SS bonds in protein. Syntax: getSS.py |input1.pdb| |input_2.pdb| ..
model2pdb.py:
Convert a pickled PDBModel into a PDB file. Syntax: model2pdb.py -i |infile| -o |outfile| [ -wat -ter 0|1|2 -codeprefix ] Options: -i one or more pickled PDBModel(s) -o output file name (default: infile.pdb ) (ignored if >1 input file) -wat skip water residues (WAT TIP3 WWW H2O) and Cl-, Na+ -ter 0 don't write any TER statements -ter 1 try restoring original TER statements -ter 2 put TER between all detected chains -codeprefix add model's pdbCode entry as prefix to out file name
pdb2model.py:
Syntax: pdbs2struct.py -i |file1 file2 ..| [-h |host| -c |chunk| -a -w -o |other_outFolder| -wat -s] pvm must be running on the local machine! Result: pickled PDBModel object for each pdb file with same file name but ending in '.model' Options: -h number of hosts to be used -a first add hosts to pvm -c chunk size, number of pdb's passed to each node at once -w display a xterm window for each node -o destination folder (default: same where pdb file comes from) -wat skip water residues (WAT TIP3 WWW H2O) -amber rename CYX -> CYS, HID/HIE/HIP -> HIS, unwrap atom names (this creates models with the same atom/res names as pdbs created with ambpdb -p top.parm -aatm -bres < some.crd > some.pdb ) -s sort atoms alphabetically within residues
pdb2seq.py:
Extract AA sequence from PDB. Syntax pdb2seq.py |pdb_file|
pdb2traj.py:
pdb2traj.py: Collect many coordinate frames ( pdb or pickled PDBModel ) of one molecule. Write Trajectory object. Waters are removed. Syntax: pdb2traj -i pdb1 pdb2 .. [ -e -r |ref_structure| -o |out_file| -f -wat -c ] OR: pdb2traj -d folder/with/pdbs/or/models [ -r ... ] Options: -i input pdb files or pickled PDBModel objects -d folder containing input pdb or pickled PDBModel files -e create EnsembleTraj, input files must be ordered first by time then by member; x_30_10.pdb sorts before x_100_09.pdb -r reference structure with PDB records (incl. waters), if not given, the first file from -i is used -wat delete TIP3, HOH, Cl-, Na+ from ref and all frames -o file name for resulting pickled Trajectory object -f fit to reference (dry reference if given) -c analyze atom content of all files separately before casting them to reference. Default: only analyze first file in -i. Note about reference structure: The atom order and content of the files given with -i is adapted to the order/content of the reference PDB but NOT vice-versa. Snapshots can hence have additional atoms (which are removed) but they must have, at least, all the atoms that are in the reference.
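For the -e option the numeric ordering matters: a plain alphabetical sort would put x_100_09.pdb before x_30_10.pdb. A sketch of a numeric sort key that gives the required time-then-member order, assuming file names of the form prefix_time_member.pdb as in the example above:

```python
import re

def frame_key(filename):
    """Extract the embedded numbers so frames sort numerically
    by time, then by ensemble member."""
    return [int(n) for n in re.findall(r'\d+', filename)]

frames = ['x_100_09.pdb', 'x_30_10.pdb', 'x_30_09.pdb']
ordered = sorted(frames, key=frame_key)
```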
pdb2xplor.py:
The xplor input file will be assembled from 3 template files. The template files should be independent of the particular PDB and should instead contain place holders which pdb2xplor will replace by actual file names, numbers, etc. Place holders look like this: %(segment_pdb)s .. means, insert the value of variable segment_pdb as a string All the variables of the Xplor class (see Xplor.__init__()) can be addressed this way. The available variables are listed in the log file. Some variables, like segment_pdb, amber_patch, segment_id will only have meaningful values in a segment template. pdb2xplor combines the templates as follows: one header_template + (one segment template per segment) + disulfide patches (generated without template) + one tail template the most relevant variables are: for header: project_root .. root folder of cvs project for segment: segment_id .. segid of currently processed segment segment_pdb .. file name of segment pdb (generated) amber_patch .. terminal patches for amber ff (generated) for tail: pdbcode .. first 4 letters of input pdb file name outname .. suggested file name for output pdb and psf (with absolute path but w/o '.pdb' or '.psf') path .. output path (specified with option -o) For hackers: All command line options are also available as variables (e.g. i, o, t). Even more, you can invent any command line option (e.g. -temperature 298) which will then be available as a variable. Taking the example, you could add a place holder %(temperature)i to your template file which would be translated to 298 (or whatever you specify). For hackers++: With option -x you can specify a file containing variable - value pairs like: temperature 298 # the temperature in K steps 100 !! minimization steps Give one pair per line; only the first 2 words are parsed.
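The place-holder mechanism described above is Python's classic %-dictionary string formatting. A minimal sketch with made-up template text and values (pdb2xplor's real templates are much longer):

```python
# hypothetical one-line template; the variable names match the help
# text above, the surrounding Xplor syntax is invented for illustration
template = "structure @%(segment_pdb)s end ! segid %(segment_id)s, T=%(temperature)i K"
values = {'segment_pdb': '1ABC_A.pdb', 'segment_id': 'A', 'temperature': 298}

filled = template % values
```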
scripts/Dock
protein-protein docking scripts -- back to Content
PCR2hex.py:
pcr2hex: pool many pdb's into one, separated by MODEL/ENDMDL, to be used by hex. Syntax: pcr2hex -psf |in.psf| -pdb |in1.pdb| |in2.pdb| ... [-s |modelFolder| ] -psf psf file name -pdb list of pdb file names -nos don't pickle each PDB as a model w/o waters to this folder Result: -pdb file, pdbCode_hex.pdb (first 4 characters of the first pdb file are taken as pdbCode) -model dictionary, pdbCode_models.dic -modelFolder/in1.model, in2.model, unless -nos has been given
concat_complexLists.py:
concat_complexLists.py -i complexes1.cl complexes2.cl -o complexes_out.cl -mo out_folder_for_changed_models -rdic correct_rec.dic -ldic correct_lig_models.dic
contacter.py:
contacter: Take ComplexList, calculate contactMatrix and other stuff for all complexes on several nodes. Pickle ComplexList to a file. The result values are put into the info dict of each complex. Syntax: contacter [-i |complex_lst| -o |file_complex_lst| -c |chunk_value| -ref |ref_complex| -v |complex_version| -a -h |n_hosts| -u -f |name| -s | -n |min_nice| -all -e |host1 host2..|] Options: -i pickled list of Complex objects (file name) -o file name for pickled complex dictionary -c chunk size (number of complexes passed to each node) -a add hosts to pvm -h number of nodes to use (default: all available) -e exclude hosts -ref pickled reference Complex for fraction of native Contacts -w show xterm for each node (default: off) -u only fill empty info fields, or missing keys from -f -f force calculation on sub-set of measures, current measures: 'fnrc_4.5', 'fnac_10', 'fnac_4.5', 'fnarc_9', 'fnarc_10', 'c_ratom_9', 'c_ratom_10', 'eProsa', 'ePairScore', 'foldX', 'cons_ent', 'cons_max', 'cons_abs', 'rms_if', 'rms_if_bb', 'xplorEnergy' -v work on a previous version of each complex, only valid if input is ComplexEvolvingList (e.g. status before and after refinement). 0..use oldest version, -1..use latest version -n renice calc to, at least, this nice value -s splits complex list into sublists of this size, dumps temporary contacted lists, collects result (can be resumed) -all allow more than 512 solutions per model pair (keep all)
hex2complex.py:
hex2complex: Parse output file from hex docking run, create dictionary of Complex(es), and pickle it to a file. Creates a plot of the cluster distribution (using hex rmsd). rec, lig - receptor and ligand model dictionary hex - output file from hex o - name of resulting complex list p - create biggles plot of rmsd vs. solution mac - force rec and lig 'model 1' to be used for all Syntax: hex2complex -rec |models_rec| -lig |models_lig| -hex |hex.out| -o |output name| -p |create plot| Example: hex2complex -rec 1BZY_models.dic -lig 2AKZ_models.dic -hex 1BZY_2AKZ_hex.out -o complexes.cl -p
hexInput.py:
hexInput Create a macro file for hex. Syntax hexInput -r |rec pdb| -l |lig pdb| [-c |com pdb| -rm |rec model| -lm |lig model|] r, l - pdb file in hex format (single or multi model) rm, lm - model number to use, if not given perform multi model docking c - a reference complex pdb file (for rmsd output) sol - number of solutions to save Result Hex macro file
hexResults.py:
hexResults Get info about docking results from one or more complexGroup files. Syntax hexResult -cg |complexGroup.cg| [ -p |plot name| -o |file name| ] Result Plot and report file
inspectComplexList.py:
Check info dict of ComplexList for missing values. Syntax: checkComplexes.py |complex_cont.cl|
multidock.py:
Separately dock several receptor models against several ligand models. multidock.py -rdic |rec_model.dic| -ldic |lig_model.dic| [-rpdb |rec_hex.pdb| -lpdb |lig_hex.pdb| -com |refcomplex_hex.pdb| -out |outfolder| -e |excludeHost1 excludeHost2..| -mac |1 or 0| -rid |A A ..| -lid |B| -soln |int|] rdic, ldic .. dict with PCRModels indexed by 1 .. n (rec, lig) rpdb, lpdb .. HEX-formatted PDB with same models (rec, lig) com .. HEX-formatted PDB with reference complex out .. folder for results (created), may contain %s for date e .. don't use these hosts mac .. 1|0 force the use of macro docking; if not given, the size of the receptor will decide if macro docking is used. rid, lid .. force these chain IDs into HEX PDB file of rec / lig soln .. number of solutions to keep from each docking
pdb2complex.py:
pdb2complex.py - create a reference Complex (without waters) Syntax: pdb2complex.py -c |complex pdb| -r |chain index| -l |chain index| -o |output name| Options: -c complex pdb file or pickled PDBModel object -r receptor chain list (e.g. 0 1 ) -l ligand ~ (e.g. 2 ) -o output file -lo,lr ligand, receptor model output file
reduceComplexList.py:
Reduce macro docked complex list (rec * lig * 5120) to a normal complex list (rec * lig * 512) i - complexList to be reduced o - name of reduced list
selectModels.py:
selectModels: Select non-redundant frames from a trajectory, dump them and put them into a PDB file for HEX docking. Syntax: selectModels -i |traj.dat| -o |out_folder| [ -psf |psf_file| -dic |out_model_dic| -n |number| -ref -co |TrajCluster.out| -a |atom1 atom2 atom..| -s |startFrame| -e |endFrame| -step |frameSkip| -id |chainID| -conv |convergence_diff| ] i - pickled Trajectory object dic - alternative name for model.dic psf - create PCRModels with psf file info ref - add trajectory's reference model to dictionary and pdb; if a reference pdb file is given this will be used instead id - set ligand and receptor chainID a - atoms to use for clustering, default: C and roughly every second side chain heavy atom conv - float, convergence criterion [1e-11] Result: - n pickled PDBModels '|PDBCode|_|frame|.model' in out_folder - pickled TrajCluster if requested - |PDBCode|_model.dic with these n PDBModels indexed from 1 to n
scripts/Mod
homology modeling scripts -- back to Content
align.py:
Syntax: align.py [ -o |outFolder| -log |logFile| -h |host_computer| ] Options: -o output folder for results (default: .) -log log file (default: STDOUT) -h host computer for calculation (default: local computer) -> must be accessible w/o password via ssh, check! -? or help .. this help screen
align_parallel.py:
Build a multiple alignment for each project directory given (parallelised). If run from within a standardized modeling/validation folder structure, i.e. from the project root where the folders templates, sequences, and validation reside, all options will be set by the script. Syntax: align_parallel.py -d |list of folders| -h |hosts| [-pdb |pdbFolder| -ft |fastaTemplates| -fs |fastaSequences| -fta |fastaTarget| -fe |ferror|] Note: pvm must be running on the local machine! Options: -d [str], list of project directories (full path) -h int, number of hosts to be used -a first add hosts to pvm -pdb str, pdbFolder containing the pdb *.alpha files -ft str, path to 'templates.fasta' -fs str, path to 'nr.fasta' -fta str, path to 'target.fasta' -fe str, path to the error file for the AlignerMaster
analyse.py:
Syntax: analyse.py -d |main project folder| [-s |1||0|] Result: Performs model analysis for each main project folder given. Outputs a folder 'analyse' containing: * analyse/global_results.out: various data about the model, see file header. * analyse/local_results.out: residue rmsd profile to target and mean rmsd to target * modeller/final.pdb: the 'best' model with the mean residue rmsd in the B-factor column Options: -d [str], list of project directories -s show the structure final.pdb in PyMol
benchmark.py:
Syntax: benchmark.py -d |list of folders| [ -modlist |model_list| -ref |reference|] Result: Performs various benchmark tasks for each folder given. A folder validation/benchmark containing: * validation/????/benchmark/Fitted_??.pdb: benchmark model iteratively superimposed on its known structure. * validation/????/benchmark/rmsd_aa.out: all-atom rmsd of the benchmark models: (1) without iterative fitting, (2) with iterative fitting and (3) the percentage of atoms that have been removed during the iterative fitting. * validation/????/benchmark/rmsd_ca.out: same as above, but only for C-alpha atoms * validation/????/benchmark/rmsd_res_??: gives the C-alpha rmsd for each residue. * validation/????/benchmark/PDBModels.list: pickled PYTHON list of PDBModels. Each model contains the benchmark information in the atom and residue profiles: 'rmsd_aa', 'rmsd_ca', 'rmsd_res'. See PDBModel.profile()! Options: -d [str], list of project validation directories -modlist str, the path to the 'PDBModels.list' from the project directory -ref str, the path to the 'reference.pdb' from the project directory (known structure)
clean_templates.py:
Syntax: clean_templates.py [-o |output_folder| -i |chainIndex| -log |logFile| ] input: templates/nr/*.pdb templates/nr/chain_index.txt output: templates/t_coffee/*.alpha (input for Aligner) templates/modeller/*.pdb (input for Modeller) Options: -o output folder for results (default: .) -i chain index file for templates (default: '/templates/nr/chain_index.txt') -log log file (default: STDOUT)
model.py:
Build model using Modeller. Syntax: model.py [ -o |outFolder| -log |logFile| -h |host_computer| ] Options: -o output folder for results (default: .) -log log file (default: STDOUT) -h host computer for calculation (default: local computer) -> must be accessible w/o password via ssh, check! -s show structures on Pymol superimposed on average -? or help .. this help screen input: templates/modeller/*.pdb t_coffee/final.pir_aln output: modeller/modeller.log /*.B9999000?? <- models
model_parallel.py:
Syntax: model_parallel.py -d |list of folders| -h |host| [-fta |fastaTarget| -pir |f_pir| -tf |template_folder| -sm |starting_model| -em |ending_model| -fe |ferror|] pvm must be running on the local machine! Result: Parallel modelling for each project directory given Options: -d [str], project directories (default: ./validation/*) -h int, number of hosts to be used (default: 10) -fta str, path to find 'target.fasta' -pir str, alignment filename -tf str, directories for input atom files -sm int, index of the first model -em int, index of the last model -fe str, filename to output errors from the Slave
modelling_example.py:
Biskit.Mod example script that models a structure from a fasta formatted sequence file in 4 steps: 1) Searches for homologous sequences and clusters the found sequences to a representative set using NCBI-Tools. 2) Searches for template structures for the homology modeling. Similar structures are removed by clustering. 3) Builds a combined sequence/structure alignment using T-Coffee. 4) Builds models using Modeller. Syntax: modelling_example.py -q |query file| -o |outputFolder| [-h |host| -log -view ] Options: -q file; fasta formatted sequence file to model -o folder; directory in which all project files will be written -h host name; the quite cpu consuming tasks of aligning and modeling can be sent to a remote host that also has access to the output directory -log write stdOut messages to log file (~project/modelling.log) -view show the superimposed models in PyMol HINT: If you want to inspect the alignment used for modeling: ~project/t_coffee/final.score_html
model_for_docking.py:
another modelling example
search_sequences.py:
Syntax: search_sequences.py [-q |target.fasta| -o |outFolder| -log |logFile| -db |database| -limit |max_clusters| -e |e-value-cutoff| -aln |n_alignments| -psi |psi-blast rounds| -... additional options for blastall (see SequenceSearcher.py) ] Result: folder 'sequences' with files: - blast.out - result from blast search (all alignments) - cluster_blast.out - blast alignments of cluster sequences - cluster_result.out - clustering output - all.fasta - all found sequences in fasta format - nr.fasta - clustered sequences in fasta format Options: -q fasta file with query sequence (default: ./target.fasta) -o output folder for results (default: .) -log log file (default: STDOUT) -db sequence data base -limit Largest number of clusters allowed -e E-value cutoff for sequence search -aln number of alignments to be returned -simcut similarity threshold for blastclust (score < 3 or % identity) -simlen length threshold for clustering -ncpu number of CPUs for clustering -psi int, use PSI Blast with specified number of iterations
search_templates.py:
Syntax: search_templates.py [-q |target.fasta| -o |outFolder| -log |logFile| -db |database| -e |e-value-cutoff| -limit |max_clusters| -aln |n_alignments| -psi -... additional options for blastall (see SequenceSearcher.py) ] Options: -q fasta file with query sequence (default: ./target.fasta) -o output folder for results (default: .) -log log file (default: STDOUT) -db sequence data base -limit Largest number of clusters allowed -e E-value cutoff for sequence search -aln number of alignments to be returned -simcut similarity threshold for blastclust (score < 3 or % identity) -simlen length threshold for clustering -ncpu number of CPUs for clustering -psi use PSI Blast instead, experimental!!
setup_validation.py:
Setup the cross-validation folder for one or several projects.
Syntax: setup_validation.py [ -o |project folder(s)| ]
Options:
 -o          .. one or several project folders (default: current)
 -? or -help .. this help screen
scripts/analysis
Scripts for analysis and visualisation of results
back to Content
a_baharEntropy.py:
a_baharEntropy.py -i |com_folder1 com_folder2 com_folder3 ..|
com_folder must contain com_wet/dry_com.model, rec_wet/dry_rec.model, lig_wet/dry_lig.model
and analysis_500-5.opt for the -cl and -cr options.
E.g.: a_baharEntropy.py -i c11 c12 c13 > result.txt 2> log.txt
a_baharFluct.py:
no documentation
a_cad.py:
CAD (contact area difference) calculation by icmbrowser. The calculation is performed only for residues that are in contact in the reference complex.
 cl  - complexList, has to contain info dictionary data for key
 ref - reference complex
a_comEntropy.py:
Run many AmberEntropist calculations on many nodes. The Master has a standard set of 13 protocols to run on rec, lig, and com trajectories, as well as on every single member trajectory - in total 113. It accepts one variable parameter, e.g. s(tart). Each protocol is then run for all values of the variable parameter. The script puts many temporary trajectories into the folder with the input trajectories -- consider creating a new folder for each trajectory!
Syntax: a_comEntropy.py -rec |rec.traj| -lig |lig.traj| -com |com.traj| -out |out.dat|
                        [ -cr |rec_chains| -zfilter |cutoff| -s |from| -e |to|
                          -ss |from| -se |to| -thin |fraction| -step |offset|
                          -var |option| -vrange |v1 v2..| -jack
                          -exrec |members| -exlig |members| -excom |members|
                          -hosts |name1 name2..| -clean -single ]
Options:
 rec     - str, free rec trajectory
 lig     - str, free lig trajectory
 com     - str, complex trajectory
 out     - str, file name for pickled result
 cr      - [int], chains of receptor in complex trajectory [n_chains rec]
 var     - str, name of variable option [ s ]
 vrange  - [any], set of values used for variable option OR 'start:stop:step',
           i.e. a string convertible to arange() input
 jack    - set up leave-one-trajectory-out jackknife test [don't]
           (replaces var with 'ex1' and vrange with range(1,n_members+1))
 zfilter - float, kick out outlier trajectories using z-score threshold [None->don't]
 exrec   - [int], exclude certain members of receptor ensemble [[]]
 exlig   - [int], exclude certain members of ligand ensemble [[]]
 excom   - [int], exclude certain members of complex ensemble [[]]
 clean   - remove pickled ref models and member trajectories [0]
 hosts   - [str], nodes to be used [all known]
 h       - int, number of nodes to be used from all known [all]
 single  - run only one job on multi-processor nodes [0]
 mem     - float, run only on machines with more than |mem| GB RAM [0]
 debug   - don't delete output files [0]
... parameters for AmberEntropist -- can also be given as -var
 cast    - equalize free and bound atom content [1]
 s,e     - int, start and stop frame [0, to end]
 ss,se   - int, start and stop frame of single member trajectories
           (only works with EnsembleTraj; overrides s,e)
 atoms   - [str], names of atoms to consider [all]
 heavy   - delete all hydrogens [don't]
 step    - int, frame offset [no offset]
 thin    - float, use randomly distributed fraction of frames [all]
           (similar to step but sometimes better)
 all     - only calculate with all members, no single member values
 ex      - [int], exclude same members from rec, lig and com
 ex_n    - int, exclude last n members OR... [0]
 ex3     - int, exclude |ex3|rd triple of trajectories [0]
           (0 excludes nothing, 1 excludes [0,1,2])
 ex1     - int, exclude ex1-th member remaining after applying ex [None]
           (0 excludes nothing, 1 excludes [0])
... parameters for AmberCrdEntropist, Executor, Master
 f_template - str, alternative ptraj input template [default]
 verbose    - print progress messages to log [log != STDOUT]
 w          - show x-windows [no]
 a          - 0|1, add hosts to PVM [1]
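The help text says a -vrange argument may be either explicit values or a 'start:stop:step' string "convertible to arange() input". How such an argument could be expanded can be sketched as follows (hypothetical helper, not Biskit's actual code; arange is emulated with a plain loop to keep the sketch self-contained):

```python
# Hypothetical expansion of a -vrange argument: either a list of values
# or a 'start:stop:step' string, as described in the a_comEntropy.py help.
def expand_vrange(arg):
    """Expand 'start:stop:step' into a list of values; pass lists through."""
    if isinstance(arg, str) and ':' in arg:
        start, stop, step = (float(x) for x in arg.split(':'))
        values, v = [], start
        while v < stop:          # arange-like: stop is exclusive
            values.append(v)
            v += step
        return values
    return list(arg)

print(expand_vrange('0:500:100'))   # [0.0, 100.0, 200.0, 300.0, 400.0]
print(expand_vrange([1, 2, 3]))     # [1, 2, 3]
```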
a_compare_rms_vs_fnc.py:
a_compare_rms_vs_fnc.py: Plot interface rmsd (heavy and/or backbone) vs. fraction of native atom/residue contacts at different cutoffs.
Creates up to 4 plots:
 rms_if_vs_cont.eps
 rms_if_bb_vs_cont.eps
 rms_if_bb_vs_rms_if.eps
 rms_hex_vs_rms_if.eps
Syntax:
 -i     complexes_cont.cl
 -o     str, output directory
 -v     [str], list of keys to plot
 -if    1||0 create plot of key vs. interface rmsd
 -if_bb 1||0 create plot of key vs. interface backbone rmsd
Abbreviations:
 fnac  - Fraction of Native Atom Contacts
 fnrc  - Fraction of Native Residue Contacts
 fnarc - fnac with reduced atom models
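The fnac measure above compares the contacts of a docked solution with those of the reference complex. A minimal sketch of the underlying fraction, with contacts represented as sets of (rec_atom, lig_atom) index pairs; the real scripts derive these pairs from atom coordinates and a distance cutoff, so the data and function name here are illustrative only:

```python
# Illustrative fnac (Fraction of Native Atom Contacts) on contact-pair sets.
def fnac(native_contacts, model_contacts):
    """Fraction of native contacts reproduced by the model."""
    if not native_contacts:
        return 0.0
    return len(native_contacts & model_contacts) / float(len(native_contacts))

native = {(1, 10), (2, 11), (3, 12), (4, 13)}
model  = {(1, 10), (2, 11), (9, 99)}    # reproduces 2 of 4 native contacts
print(fnac(native, model))              # 0.5
```

fnrc works the same way on residue pairs instead of atom pairs.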
a_dasa.py:
Calculate the change in accessible and molecular surface upon binding.
Syntax: a_dasa.py -r |rec_model| -l |lig_model| -c |com_model|
a_ensemble.py:
Analyze ensemble MD.
Syntax: a_ensemble.py -i traj.dat [ -z |outlier-z-score| -step |offset|
                                    -o |out.eps| -title |plot_title| ]
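The -z option discards outlier members of the ensemble by z-score. A minimal sketch of such a filter over, say, per-member rmsd values; the function and threshold are illustrative, not Biskit's actual implementation:

```python
# Hypothetical z-score outlier filter, as suggested by the -z option.
def z_filter(values, z_cutoff=2.0):
    """Return indices of values within z_cutoff standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if sd == 0:
        return list(range(n))
    return [i for i, v in enumerate(values) if abs(v - mean) / sd <= z_cutoff]

print(z_filter([1.0, 1.0, 1.0, 1.0, 9.0], z_cutoff=1.5))   # [0, 1, 2, 3]
```

Member 4 (value 9.0) lies well outside the cutoff and is dropped.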
a_foldX.py:
Syntax: a_foldX -c |complexes.cl| -o |out_folder| -ref |ref.complex|
a_model_rms.py:
Load model dictionary and report:
 * average pairwise rmsd
 * average rmsd to free structure (assumed to be model 1)
 * average rmsd to bound structure
Model 1 is not included in the calculations.
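The average pairwise rmsd reported above is a mean over all unordered model pairs. A minimal sketch with a toy stand-in for the real structure superposition (the `toy_rmsd` function and the 1-D "models" are purely illustrative):

```python
# Illustrative average pairwise rmsd over a set of models.
from itertools import combinations

def average_pairwise(models, rmsd):
    """Mean rmsd over all unordered pairs of models."""
    pairs = list(combinations(models, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)

def toy_rmsd(a, b):
    """Root mean square difference of two equal-length coordinate lists."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

models = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]   # models 2..4; model 1 excluded
print(round(average_pairwise(models, toy_rmsd), 3))   # 1.333
```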
a_multiDock.py:
a_multiDock: Visualize multidock results.
Note: interface rms values are for contact atoms, not contact residues.
Syntax: a_multiDock -cl |complexList.cl|
 cl  - complexList, has to contain info dictionary data for key
 r   - hex receptor pdbs (i.e. rec/*_hex.pdb)
 l   - hex ligand pdbs (i.e. lig/*_hex.pdb)
 ref - reference complex
 key - info dictionary key to plot (high values are considered good)
 inv - 1||0 inverse data associated with key (i.e. for rmsd plots)
 maxContour - scale contour circles to fit at most x solutions
 additional_profile - add to profile plot (rec_model lig_model)
Result: 5 plots, info txt file, dumped data
a_multidock_contour.py:
a_multiDock: Visualize multidock results.
Syntax: a_multiDock -cl |complexList.cl|
 cl  - complexList, has to contain info dictionary data for key
 inv - 1||0 inverse data associated with key (i.e. for rmsd plots)
 maxContour - scale contour circles to fit at most x solutions
a_random_contacting.py:
a_random_contacting.py -i 1.cl 2.cl 3.cl .. -ref ref.complex -nat natively_contacted.cl
                       [ -t ] [ -nout summary_output_file -rout random_output_file ]
Get confidence of native scores from several scores to random reference. Prints table 3 of the multidock paper, i.e. fnac, score, rms, .. of free vs. free docking, the docking with highest fnac, and the docking with highest score - FOR EACH complex list. The ref.complex is used to calculate the interface rmsd to the bound. Calculates averages and confidence of highest score, rms, etc. The 'real' table 3 line for the native is appended to a separate file.
 t    .. print table header
 nout .. append line with native fnac, scores, .. and confidence to this file
 rout .. append lines with random fnac, scores, .. to this file [STDOUT]
a_report_comEntropy.py:
Analyze the result of a_comEntropy.py
Syntax: a_report_comEntropy.py -i |result.dic| [-eps |file| -tall -tsd -t -prefix |str|]
Options:
 -i      dictionary from a_comEntropy.py
 -eps    output plot with entropies for all values of the variable parameter
 -tall   print table with all entropy values
 -tsd    print single line with entropy values and standard dev (var=='ex3')
 -t      print header row of table
 -prefix prefix for tables (e.g. a01)
a_rmsd_vs_dock_performance.py:
Collect, calculate and save to disk (both as text files and pickled dictionaries) various data about a complex list. This script is written to collect data from multidocking runs and assumes that the first ligand and receptor model is the free xray structure.
Syntax: a_rmsd_vs_dock_performance.py -cl |complexList.cl| -ref |ref.complex|
                                      [ -key |str| -inv |int| ]
 cl  - complexList, has to contain info dictionary data for key
 ref - reference complex
 key - info dictionary key to plot (high values are considered good)
 inv - 1||0 inverse data associated with key (i.e. for rmsd plots)
Output: An output directory 'model_vs_rmsd' is created and various text files and corresponding dictionaries are written to it.
a_rmsd_vs_performance.py:
Create figure for the multidock paper with the change in docking performance plotted against the change in rms_to_bound (both relative to the free-free docking).
 k .. PDBModel.info key to be plotted (rms_to_bound, rmsCA_to_bound, ..)
 o .. file name of output eps
 i .. one or many complex lists
a_table_fnac_rms_score.py:
a_table_fnac_rms_score.py [ -i complexes_cont.cl -ref ref.complex -t ]
Creates one line of table 3 of the multidock paper, i.e. fnac, score, rms, .. of free vs. free docking, the docking with highest fnac, and the docking with highest score.
 t .. print table header
a_trajEntropy.py:
Analyze entropy of a single trajectory, or of one rec and one lig trajectory, with ptraj.
Syntax: a_trajEntropy.py -i |traj1.dat+traj2.dat|
        [ -o |result.dic| -ref |ref_structure| -cast -chains |chain_indices|
          -border |chain| -split -shift -shuffle
          -s |startFrame| -e |endFrame| -step |frameOffset|
          -ss |member_startFrame| -se |member_endFrame|
          -ex_n |exclude_n_members| -ex1 |ex_from_traj1| -ex2 |ex_from_traj2|
          -ex3 |exclude_member_triple| -atoms |CA CB ..| -heavy
          -nice |level| -parm |parm_file| -crd |crd_file| -f_out |ptraj_out|
          -f_template |ptraj_template| -log |log_file| -debug -verbose ]
Options:
 i       1 trajectory or 2 trajectories connected by '+'
 o       file name for pickled result dictionary
 s       skip first |s| frames (of complete trajectory)
 e       skip frames after |e| (of complete trajectory)
 ss      skip first |ss| frames of each member trajectory
 se      skip frames after |se| of each member trajectory
 atoms   considered atoms (default: all)
 heavy   remove hydrogens (default: don't)
 ref     pickled PDBModel, Complex, or Trajectory
 cast    equalize atom content of traj and ref [no]
 chains  list of integer chain indices, e.g. -chains 0 1 [all]
 border  1st chain of 2nd molecule for -split, -shift, -shuffle
 split   split complex trajectory and fit rec and lig separately
         (requires -border with first lig chain) [no]
 shuffle shuffle the order of rec vs. lig frames
 thin    use randomly distributed fraction of frames, e.g. 0.2 [all]
 step    frame offset, use every |step|th frame, e.g. 5 [all]
 ex1     exclude these members from 1st trajectory, e.g. 3 6
 ex2     exclude these members from 2nd trajectory (if given)
 ex_n    exclude first n members [0]
 ex3     exclude |ex3|rd triple of members, e.g. 2 excludes 3,4,5
         (0 excludes nothing) [0]
 f_template alternative ptraj input template [default template]
 f_out   target name for ptraj output file [discard]
 nice    nice level [0]
 log     file for program log [STDOUT]
 debug   keep all temporary files
 verbose print extended progress messages to log [log != STDOUT]
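The -s/-e/-step/-thin options all reduce the number of frames that enter the entropy calculation. Their combined effect can be sketched on a plain list of frame indices (hypothetical helper and fixed seed for reproducibility; not Biskit's actual code):

```python
# Illustrative frame selection mimicking -s, -e, -step and -thin.
import random

def select_frames(n_frames, s=0, e=None, step=1, thin=None, seed=42):
    """Slice frames by start/end/offset, then optionally thin to a random fraction."""
    frames = list(range(n_frames))[s:e:step]    # -s, -e, -step
    if thin is not None:                        # -thin: keep a random fraction
        rng = random.Random(seed)
        keep = max(1, int(round(thin * len(frames))))
        frames = sorted(rng.sample(frames, keep))
    return frames

print(select_frames(100, s=10, e=50, step=5))   # [10, 15, 20, 25, 30, 35, 40, 45]
```

As the a_comEntropy.py help notes, -thin is similar to -step but the randomly distributed frames sometimes give better-converged entropies.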
a_trajQuality.py:
Syntax: a_trajQuality -i |traj_1 traj_2 .. traj_n| [-a -h |n_hosts| -w]
pvm must be running on the local machine!
Result: eps with quality plots in folder of traj files
Options:
 -h number of hosts to be used
 -a first add hosts to pvm
 -w display an xterm window for each node
random_complexes.py:
random_complexes.py -r |rec_model| -l |lig_model|
                    [ -o |out_file| -n |number| -traj |traj_out_name|
                      -ro |rec_out| -lo |lig_out| -debug copy_inp_file ]
Remark: to create valid PCRModels for rec and lig use (in rec_wet/, lig_wet/):
 1pdb2model.py -i ????.pdb -psf ????.psf -o xplor.model
The waters in the PSF are deleted. They can be in the model but don't have to be.
Options:
 r     pickled PCRModel, receptor (psf file has to be valid)
 l     pickled PCRModel, ligand (psf file has to be valid)
 ro    file name for rec copy (centered and no waters)
 lo    file name for lig copy (centered and no waters)
 o     file name for result ComplexList
 n     number of random complexes to generate
 traj  file name for optional amber crd and pdb (for visualisation)
 debug keep Xplor input file and don't delete temporary files
random_grouping.py:
Select a set of non-redundant, non-native random complexes.
random_complexes.py -cl |complex_list|
Options:
 ref .. pickled native complex
 h   .. number of hosts
 co  .. folder name for result complexes
 o   .. base name for other result files
 a   .. add hosts to PVM before starting