Structure sampling with Amber
This document described how to setup a parallell molecular dyanamics run using Amber and amber_ensembleMD.py.
- Amber (7 or higher)
- Biskit installation
- 10 computers reachable via ssh without password
- PDB with protein structure
- Check, clean up model, identify disulfide bridges (WhatIf web)
- build amber topology (tleap)
- run Ensemble simulation (amber_ensembleMD.py)
- convert result to Biskit.EnsembleTraj object
- fuzzy-cluster the trajectory
1. Clean up model (manually)
Create project folder:
mkdir my_topology cd my_topology
Delete hydrogens from PDB
Go to the whatif web interface:
http://www.cmbi.kun.nl/gv/servers/WIWWWI/ - HIS protonation Hydrogens / Add protons save hadded.pdb to my_topology -> note how whatif protonated HIS residues (NE, ND, or both) - ASN or GLN flips Hydrogen / optimize h_bond network -> note, which ASN, GLN have to be flipped (end of result page)
Do not use hadded.pdb for subsequent steps! The whatif atom names are incompatible with Amber.
identify S-S bridges (e.g. in VMD)
edit original PDB file:
rename D-protonated HIS to HID
rename E-protonated HIS to HIE
rename dual-protonated HIS to HIP
D-protonated means the H sits on ND; E-protonated (the normal case), it sits on NE; double-protonated means there is a H on both Nitrogen atoms in the ring.
rename CYS involved in S-S bond into CYX
switch OE and NE of ASN and GLN where suggested by WhatIf
remove terminal OT1, OT2 atoms from peptide chains
remove terminal P, O1P, O2P from DNA chains
remove hetero atoms (except you have prep files for them)
remove all hydrogen atoms:
bispy >>> m = PDBModel('mystructure.pdb') >>> m = m.compress( m.maskHeavy() ) >>> m.writePdb( 'in.pdb' )
alternatively you can try to create the correct amber hydrogen names:
bispy >>> m = PDBModel('mystructure.pdb') >>> m.writePdb( 'in.pdb', amber=1 )
Note: the classes used by clean_template.py of biskit/scripts/Mod can probably be adapted to automate most of the above.
2a. Build topology (manually)
>>> cp ../cleaned.pdb ./in.pdb
Select force field
>>> ln -s $AMBERHOME/dat/leap/cmd/leaprc.ff99 ./leaprc
run $AMBERHOME/exe/tleap, in tleap enter:
source leaprc ## optional: ## load definition and parameters for non-standard residues m = loadAmberParams your_own.mod m = loadAmberPrep your_own.prep a = loadPdb in.pdb ## connect SG atoms of S-S bonds ## Note: use e.g. 'desc a.9' to check atom content bond a.9.8 a.38.8 bond a.16.8 a.35.8 ## add solvent ## consider using larger distance buffer and solvateOct ## ## Amber 7 usage: ## solvateBox a WATBOX216 10.0 ## OR Amber 8 usage: solvateBox a TIP3PBOX 10.0 ## add counter ions to neutrality (Na+ or Cl-) addIons a Na+ 0 check a -> "Unit is OK." ## save topology and initial coordinates saveAmberParm a top.parm raw.crd ## quit tleap quit
Create PDB file from amber topology and coordinates
>>> ambpdb -p top.parm < raw.crd > raw.pdb
2b. Build topology (automatted)
>>> amber_pdb2parm.py -i |PDBfile| [-o |out.parm| -fmod |external.mod| >>> -fprep |external.prep| ... ]
This creates out.parm, out.crd, out.pdb, (and leap.log in current folder)
The current version works for Amber 11. Adapt the template in Biskit/data/amber/leap/solvate_box.leap!
3. Prepare folder and run MD
Create result folder
>>> mkdir my_ensemble_MD >>> cd my_ensemble_MD
Fill folder from standard MD template (in biskit cvs)
>>> amber_ensembleMD.py -parm ~/my_topology/top.parm \ >>> -crd ~/my_topology/raw.crd \ >>> -pdb ~/my_topology/raw.pdb \ >>> -nb_nodes 2 >>> -nodes_eq computer1 \ >>> -nodes_prod computer2 computer3 ... \ >>> -n_steps 125000
Note: type amber_ensembleMD.py for help
Run equilibration and wait for it to finish
>>> ssh computer1 ## important! must be on nodes_eq >>> cd my_ensemble_MD >>> ./start_eq.csh
Run simulation (can be on any node)
Note: The complete MD setup is stored in Biskit/data/amber/templates_ensemble_pme. Make a copy, adapt it to your needs and use amber_ensembleMD.py -template |your_copy|.
The template shell script calling sander is in 2_md_xx/2/start.csh* and may need to be adapted to your LAM version.
Note: The current setup works for Amber 11. Amber 7 required slightly different input files, see Biskit/data/amber/template_pme_ensemble_amber7/!
4. Convert amber result to pickled Biskit.EnsembleTraj object
The result folder contains the coordinates of 10 production MDs in 10 folders: e.g. my_ensemble_MD/2????/3????/sim_protein.crd.gz
The following first step should already be done by the MD script, check whether traj_protein.dat exists
>>> cd my_ensemble_MD/2_md_00/3_md_nvt_1ns
Create Biskit.Trajectory w/o water, w/o hydrogens, with amber residue names translated back to standard names
>>> amber2traj.py -i sim_protein.crd \ >>> -o traj_protein.dat \ >>> -r ../../0_dry.pdb \ >>> -b -wat -hyd -rnres
Concat the single member trajectories into one EnsembleTraj object
>>> cd my_ensemble_MD >>> traj2ensemble.py -i 2*/3*/traj_protein.dat \ >>> -o traj_ens.dat \ >>> -pdbCode MD1
Comment: Consider using option -step or -s -e to get a smaller trajectory.
5. Clustering of trajectory (select 10 snapshots)
>>> mkdir my_ensemble_MD/clustered >>> cd clustered
Select 10 snapshots, by CA clustering of every 5th frame
>>> selectModels.py -i ../traj_ens.dat \ >>> -n 10 -a CA \ >>> -step 5 > cluster.log
Comment: This creates n pickled Biskit.PDBModel instances and a pickled dictionary that contains all of them indexed by cluster number. The dictionary is the input to the Biskit docking routines.
If you need PDB files...
>>> model2pdb.py -i *model