This text file describes a docking run starting from a free receptor and a free ligand pdb-file and ends up in a complex list. During the docking the known structure of the bound complex is used for scoring. The use of a bound reference complex is naturally optional.
Two other files that might be worth having a closer look at are the test scripts for regular docking: test_docking (basically the same as this file) and the correspunding file for flexible docking: test_multidock.
- Setup a directory structure for the docking
- Starting structures
- Checking the structures
- Create a reference complex object
- Prepare the structures for docking
- Run docking
- Convert the docking output to a complex list
- Calculate complex list data
- Check complex list data completeness
- Visualize docking result
Setup a directory structure for the docking
Not to end up with a folder cluttered with tons of files a folder structure like the one below this is recommended for a docking project:
~/project_folder/rec ~/project_folder/lig ~/project_folder/com ~/project_folder/dock ~/project_folder/dock/rec ~/project_folder/dock/lig ~/project_folder/dock/com ~/project_folder/dock/hex
You will need a receptor and a ligand structure and optionally a complex structure. So, the first task is to retrieve receptor and ligand coordinate files of the free (unbound) form (currently only pdb format supported).
Example, lets say our starting structures are called:
1lig_original.pdb 1rec_original.pdb 1com_original.pdb (optional)
Place the files in the corresponding rec, lig or com folder.
Checking the structures
As you probably are aware of, pdb-files are quite often a mess -- missing atoms, none-standard amino acids, alternate positions of atoms ... in summary there are a lot of things that protein-protein docking programs and molecular dynamics programs can't handle. To remedy this we have written a script, pdb2xplor.py, that take care of the most common problems. The task of the script is to prepare a pdb-file for XPLOR (which like all MD programs is a quite picky about the structures you feed it). Furthermore, XPLOR is used to add missing atoms (according to the CHARMm19force field).
Changes made "cleaning" the pdb-file include:
- Consecutive numbering of residues
- Missing atoms are added
- Alternate positions of residues are discarded (most populated position is used)
- None standard residues are replaced with closest standard residue
- chain breaks are detected
- All HETATM records are lost (Note 1)
- Tries to remove redundant chains. For example, if the pdb-file contains 4 identical peptide chains the script will assume that there are 4 in the assymetric unit and it will discard 3 of them. (Note 2)
Calling pdb2xplor.py without any options will give you all the available arguments. The arguments we will use here are:
-i the input pdb-file -c .. start chain labeling at position c in the alphabet. (i.e. -c 3 means, chains are labeled D, E, etc.) -cmask .. chain mask for overriding the default sequence identity based cleaning (e.g. 1 0 0 1 0 0 ) -exe .. also execute XPLOR, write log to file
NOTE The name of the input pdb-file has to be:
- at least 5 characters long (the first 4 characters will be used as a name for the cleaned pdb file)
- has to start with a number.
Now run the pdb2xplor.py script for all structures:
>>> cd rec >>> pdb2xplor.py -i 1rec_original.pdb -exe
Comment: The -exe option tells the script to start the XPLOR job before exiting (this assumes that you have setup the path to XPLOR-NIH correctly in your ~/.biskit/settings.dat file). The XPLOR job might take quite some time depending on your hardware. Running the script without the -exe option is much quicker but then you will have to run the minimization manualy before continuing to the next step.
>>> xplor-nih < 1rec_original_generate.inp > 1rec_original_generate.log
And for the ligand:
>>> cd ../lig >>> pdb2xplor.py -i 1lig_original.pdb -c 1 -exe -view
Comment: The option "-c 1" sets the chainId of the ligand to B matching that in the complex structure. The -view option allow you to inspect the cleaned molecule in PyMol.
And finaly the optional reference complex:
>>> cd ../com >>> pdb2xplor.py -i 1com_original.pdb -cmap 1 0 0 1 0 0
Comment: The -cmap 1 0 0 1 0 0 option here assumes that the first and the fourth chain in the pdb-file are the ones that you want to keep.
You have now three cleaned and minimized structures with corresponding psf-files:
rec/1REC.pdb and 1REC.psf lig/1LIG.pdb and 1LIG.psf com/1COM.pdb and 1COM.psf
If you want to keep HETATM records in your pdb-files you will have to edit the generate.inp file manually. I addition you will also need to find or create a parameter and topology files for the CHARMm19 forcefield. A good start for locating parameters is the HIC-Up server (http://xray.bmc.uu.se/hicup/).
You will have to add something like this to your generate.inp file:
topology @@/home/myself/substrate.top end parameter @@/home/myself/substrate.par end segment name="SUBS" chain coordinates @/home/myself/substrate.PDB end end coordinates @/home/myself/substrate.PDB
By default pdb2xplor.py compares the pairwise sequence identity between all chains and uses this info to guess which chains to remove. This behavior can be overridden by giving the script a chain map as an command line option.
Example: You have a pdb-file with 4 chains -- A,B,C and D. Your protein is a homodimer and the dimers are A-B and C-D. Then you should call pdb2xplor.py like this:
>>> pdb2xplor.py -i 1rec_original -cmap 1 1 0 0
Create a reference complex object
Use the script pdb2complex.py to create a complex object of the complex pdb-file that we will need later in this howto. If no output name is given the complex object created will get the default name assigned by the script: ref.complex.
Options used here:
-c complex pdb file or pickled PDBModel object -r receptor chain list (e.g. 0 1 ) -l ligand chain list (e.g. 2 )
Call pdb2complex.py without arguments to see all options.
>>> pdb2complex.py -c 1COM.pdb -r 0 -l 1
Prepare the structures for docking
Before it is time to perform the rigid body docking we need to create a pdb-file that is compatible with the rigid body docking program we are using: HEX. At the same time we create a PCRModel (which is a subclass of the PDBModel that also contains information about the psf-file created in the previous step) and a model dictionary. The latter is not really useful in a simple one-to-one docking but its use will become clear in the flexible docking howto.
>>> cd ../dock/rec >>> PCR2hex.py -psf ../../rec/1REC.psf -pdb ../../rec/1REC.pdb
Comment: Write a Hex compatible pdb file, model and a model dictionary. If you are running a multidocking project (docking more than one ligand against more than one receptor) this script will read multiple pdb files and write the same number of models and Hex compatible pdb files see the flexible docking howto.
The next step is to calculate various additional data that will be stored in the PDBModel.This is done to add various profiles and data to the earliest possible source.
To the original source (so) data that doesn't change with the coordinates is added.
- conservation redidue profile derived (Hmmer.py)
To the input models (-i) that may have changed coordinates (it has so in all cases but one-to-one docking described here) the following is added:
- surface masks (atom profile)
- Fold-X energies (FoldX.py)
- surface curvature profiles, molecular surface (MS) profiles, accessible surface (AS) profiles and relative AS atom profiles (SurfaceRacer.py)
>>> dope.py -s ../../rec/1REC.pdb -so ../../rec/1REC_dry.model -i 1REC.model -dic 1REC_model.dic
The same thing for the ligand and the reference complex:
>>> cd ../lig >>> PCR2hex.py -psf ../../lig/1LIG.psf -pdb ../../lig/1LIG.pdb >>> dope.py -s ../../lig/1LIG.pdb -so ../../lig/1LIG_dry.model -i 1LIG.model -dic 1LIG_model.dic
>>> cd ../com >>> PCR2hex.py -psf ../../com/1COM.psf -pdb ../../com/1COM.pdb >>> dope.py -s ../../com/1COM.pdb -so ../../com/1COM_dry.model -i 1COM.model -dic 1COM_model.dic
Now it is finally time to run the actual rigid body docking:
>>> cd ../hex >>> hexInput.py -r ../rec/1REC_hex.pdb -l ../lig/1LIG_hex.pdb -c ../com/1COM_hex.pdb
Comment: This will create a hex macro file: 1REC-1LIG_hex.mac. If you want change any settings controlling the hex docking you may edit this file manualy (for example turning on post-processing).
In the one-to-one docking you have to start the docking yourself:
>>> hex < 1REC-1LIG_hex.mac > 1REC-1LIG_hex.log
Comment: If you have a dual cpu machine run "hex -ncpu 2 < ..."
Output from Hex:
- 1REC-1LIG_hex.out, a list with the top 512 scoring solutions
- 1REC-1LIG_hex_cluster.out, hex rmsd clustering list
- 1REC-1LIG_hex.log, std out from the program
Convert the docking output to a complex list
Parse the output file from hex (containing the top 512 scoring solutions) and create a complex list object. This connects the docking result with the additional data that we just calculated using dope.py by utilizing the model dictionaries that we also created.
>>> hex2complex.py -rec ../rec/1REC_model.dic -lig ../lig/1LIG_model.dic -hex 1REC-1LIG_hex.out -p
Comment: A simple plot of hex clusters vs. the rmsd to the reference structure and the size of the clusters is displayed using the -p flag.
Calculate complex list data
The following step is quite time consuming and has therefore been parallellized through PVM. The first step is thus to start the pvm deamon (if not already running).
The contacter.py goes through all complexes and add calculate various data. For all the calculations to run trough you need to have Hmmer, Fold-X, Prosa2003 installed and properly configured. The calculations are distributed over many nodes and therefore also require that you have pvm installed and running. The data calculated are:
- fnac (fraction native atom contacts) requires that a reference complex is given
- prosa energy
- foldX binding energy (free-bound difference)
- conservation score
- pairwise contact score (from observed contacts in protein-protein intarfaces)
- xplor noe enregies (restraints file as argument -restr)
The contacted complex list is saved as complexes_cont.cl and any errors are reported in contacting_errors.txt
>>> contacter.py -i complexes.cl -a -ref ../../com/ref.complex
Check complex list data completeness
A simple script that will tell you if all the data has been calculated for all the complexes. If not you can use contacter.py script to recalculate the missing data (see the contacter.py help info).
>>> inspectComplexList.py complexes_cont.cl
Visualize docking result
This is just an example of a simple analysis script that compares different methods of measuring how close the docking solutions are to the native complex. The result of this script are 4 plots that highlights the differences between interface rmsd and our favorite measure fraction of native atom contacts (fnac). There are more scripts in ~biskit/scripts/anaysis/ that can serve as examples to how biskit can be used.
>>> a_compare_rms_vs_fnc.py -i complexes_cont.cl