Flexible docking

How to use the Biskit.Dock flexible docking workflow.

This text file describes a flexible docking run starting from a free receptor and a free ligand pdb-file we generate a ensemble of receptor and ligand conformers. We then dock the receptor and ligand conformers against each other, later referred to as cross-docking. The next step is to collect the docking result into a complex list and then cluster the solutions based on intermolecular atom contacts. During the docking it is possible to use the known structure of the bound complex for benchmarking purposes.

The structure ensembles are generated using principal component restraind molecular dynamics (PCR-MD). To run this you need a modified version of XPLOR that is available upon request.

Two other files that might be worth having a closer look at are the test scripts for regular docking: test_docking and the corresponding file for flexible docking: test_multidock. The latter describing the most simple flexible docking possible, namely that of two ligand conformers against a single receptor conformer (see Grunberg, Leckner, Nilges (2004) Structure Dec;12(12):2095-6 for some larger cross-dockings)

Setup a directory structure for the docking
Starting structures
Preparing the structures
Generate a ligand ensemble
Generate a receptor ensemble
Collect the ligand simulation snapshots
Collect the receptor simulation snapshots
Cluster the trajectories
Prepare the complex structures for docking
Add additional info to the models
Run the flexible docking
Calculate complex list data
Check complex list data completeness
Analyze docking result

Setup a directory structure for the docking

Not to end up with a folder cluttered with tons of files a folder structure like the one below this is recommended for a docking project:

~/project_folder/rec
~/project_folder/rec_pcr_00
~/project_folder/lig
~/project_folder/lig_pcr_00
~/project_folder/com
~/project_folder/multidock
~/project_folder/multidock/rec
~/project_folder/multidock/lig
~/project_folder/multidock/com
~/project_folder/multidock/hex

Starting structures

You will need a receptor and a ligand structure and optionally a complex structure. So, the first task is to retrieve receptor and ligand coordinate files of the free (unbound) form (currently only pdb format supported).

Again, just like in the normal docking example, lets say our starting structures are called:

1lig_original.pdb
1rec_original.pdb
1com_original.pdb (optional)

Place the files in the corresponding rec, lig or com folder.

Preparing the structures

Prepare your structures just like in the regular docking howto.

Now run the pdb2xplor.py script for all structures:

>>> cd rec
>>> pdb2xplor.py -i 1rec_original.pdb -exe

>>> cd ../lig
>>> pdb2xplor.py -i 1lig_original.pdb -c 1 -exe

>>> cd ../com
>>> pdb2xplor.py -i 1com_original.pdb

You have now three cleaned and minimized structures with corresponding psf-files:

rec/1REC.pdb  and  1REC.psf
lig/1LIG.pdb  and  1LIG.psf
com/1COM.pdb  and  1COM.psf

Generate a ligand ensemble

This example starts a simulation on localhost without any principal component restraints. The simulation will run 10 MD simulations in parallell (for greater ensemble diversity) and will result in 500 shapshots (separated by 1 ps, 500 x 2 fs time steps).

NOTE: This is a time consuming step! Depending on the size of your system and the hardware you are using the simulation will take anything from days to months.

The pcr force is here set to zero - i.e. this is a standard parallell MD simulation. If you would like an even greater diversity run the script with a -f argument greater than zero (hint: aim at something close to 0.2 times the number of amino acids).

Extracted from the runPCR.py help screen:

Syntax: runPCR -t |psfFolder| -h |host|
               [-f |Force| -r |resultFolder| -n |nice| -i |inpFolder| ]

Options:
       -f      force constant for PCR restraint
       -r      base folder for result (sub-folders will be created)
       -t      folder with topology (psf, pdb)
       -n      nice value
       -h      host computer (accessible via ssh)
       -i      folder with all input files, must contain restart_h2o.inp'
       -parm   folder with param19.* files

Default options:
       - f     0
       - i     ~/biskit/external/xplor/inpPcr
       - h     localhost
       - parm  ~/biskit/external/xplor/toppar
       - n     0
       - r     .
       - nstep         500

Now start the simulation. If you want to run it on another host than the localhost you need pasword free ssh access to that machine.

>>> runPcr.py -t lig -r lig_pcr_00 -h localhost

The script returns information about the simulation that is usefull if you need to stop or restart the simulation:

host: localhost
pid:  14500
nice: 0
binary:        /shared_bin/xplor/ifcxplor
result folder: /mnt/data/shared_bin/biskit/test/lig_pcr_00/pcr_00
topology:      /mnt/data/shared_bin/biskit/test/lig
force:         0

Generate a receptor ensemble

Do the same thing for the receptor as you did for the ligand under Generate a ligand ensemble.

>>> runPcr.py -t rec -r rec_pcr_00 -h localhost

Collect the ligand simulation snapshots

The following step is quite time consuming and has therefore been parallellized through PVM. The first step is thus to start the pvm deamon (if not already running).

>>> pvm

Create a model folder in the lig_pcr_00 folder

>>> cd ~/project_folder/lig_pcr_00
>>> mkdir model

pdb2model.py is a parallellized script that converts all the pdb snapshots into PDBModels. The script reads the pdb files from the lig_pcr_00 folder, strips of any water molecules and writes PDBModels into the model folder.

>>> pdb2model.py -i lig_pcr_00/*.pdb -a -wat -o model/

pdb2traj.py collects the PDBModels created by pdb2model.py and fits them all to a given reference structure, deletes the waters and writes a trajectory output file: traj.dat

>>> pdb2traj.py -i model/*.model -r ../lig/1LIG.pdb -wat -f

If you don't need all the trajectory frames can remove some of them from the trajectort. This will speed up the subsequent calculations in the example below we keep every 5th frame of the trajectory.

>>> thinTraj.py -i traj.dat -step 5

Collect the receptor simulation snapshots

Do the same thing for the receptor as you did for the ligand under Collect the ligand simulation snapshots.

>>> cd ~/project_folder/rec_pcr_00
>>> mkdir model
>>> pdb2model.py -i rec_pcr_00/*.pdb -a -wat -o model/
>>> pdb2traj.py -i model/*.model -r ../lig/1REC.pdb -wat -f

Cluster the trajectories

We don't want to use all 500 snapshots for the docking, rather we want to select representative structures from the ensemble. In this example we select 10 ligand conformers and we do this by means of fuzzy clustering, where each picked structure is a cluster center.

selectModels.py will cluster the trajectory snapshots in ten clusters. The structure closest to the center of each cluster will be written to disc as a PDBModel and a model dictionary.

The -ref option tells the script to add the reference structure of the trajectory being clustered to the model dictionary, thus we will end up with a model dictionary containing 11 structures (10 cluster centers and the minimized structure from Preparing the structures.

>>> cd ~/project_folder/multidock/lig
>>> selectModels.py -i ../../lig_pcr_00/traj.dat -psf ../../lig/1LIG.psf -n 10 -ref

and the same for the receptor:

>>> cd ~/project_folder/multidock/rec
>>> selectModels.py -i ../../rec_pcr_00/traj.dat -psf ../../rec/1REC.psf -n 10 -ref

Prepare the complex structures for docking

This step has already been explained thoroughly in the docking howto.

Use PCR2hex.py to convert the complex pdb files that can be used by HEX.

>>> cd ~/project_folder/multidock/com
>>> PCR2hex.py  -psf ../../com/1COM.psf -pdb ../../com/1COM.pdb

Add additional info to the models

Now well add structure related information to all models that will be used in the docking (acessible surface area, surface curvature, residue conservation etc.)

>>> cd ~/project_folder/multidock/lig
>>> dope.py -s ../../lig/1LIG.pdb  -so ../../lig/1LIG.model -i *.model  -dic 1LIG_model.dic

>>> cd ~/project_folder/multidock/rec
>>> dope.py -s ../../rec/1REC.pdb  -so ../../rec/1REC.model -i *.model  -dic 1REC_model.dic

>>> cd ~/project_folder/multidock/com
>>> dope.py -s ../../com/1COM.pdb  -so ../../com/1COM.model -i *.model  -dic 1COM_model.dic

Run the flexible docking

In this example we are docking 11 receptor structures against 11 ligand structures (i.e. 121 single docking runs) and we are using the known complex for benchmarking. The HEX macro files and the stdout from HEX will be written to disc in the output folder during docking. When the docking is done the macro files and the logfiles are compressed.

This starts the multidocking across the accessible hosts (defined in ~/.biskit/hosts.dat). The output will be a complex list with 11x11x512 docking solutions.

>>> multidock.py -rdic rec/1A2P_model.dic -ldic lig/1A19_models.dic -com com/1BGS_hex.pdb

Calculate complex list data

This step has already been explained thoroughly in the docking howto.

>>> contacter.py -i complexes.cl -a -ref ../../com/ref.complex

Check complex list data completeness

A simple script that will tell you if all the data has been calculated for all the complexes. If not you can use contacter.py script to recalculate the missing data (see the contacter.py help info).

>>> inspectComplexList.py complexes_cont.cl

Analyze docking result

This is just an example of a simple analysis script that prints various information about the quality of the docking results.

>>> a_rmsd_vs_dock_performance.py -cl complexes_cont.cl -ref ../../com/ref.complex -key fnac_10

and another scripts that visualizes the result as contour plots

>>> a_multidock_contour.py -cl complexes_cont.cl

Personal tools