Structure sampling with Amber

Structure sampling with Amber and amber_ensembleMD.py

This document described how to setup a parallell molecular dyanamics run using Amber and amber_ensembleMD.py.

Requirements
Overview
Detailed instructions

Requirements

Amber (7 or higher)
Biskit installation
10 computers reachable via ssh without password
PDB with protein structure

Overview

Check, clean up model, identify disulfide bridges (WhatIf web)
build amber topology (tleap)
run Ensemble simulation (amber_ensembleMD.py)
convert result to Biskit.EnsembleTraj object
fuzzy-cluster the trajectory

Detailed instructions

1. Clean up model (manually)

Create project folder:

mkdir my_topology
cd my_topology

Delete hydrogens from PDB

Go to the whatif web interface:

http://www.cmbi.kun.nl/gv/servers/WIWWWI/
    - HIS protonation
            Hydrogens / Add protons
            save hadded.pdb to my_topology
    -> note how whatif protonated HIS residues (NE, ND, or both)
    - ASN or GLN flips
            Hydrogen / optimize h_bond network
    -> note, which ASN, GLN have to be flipped (end of result page)

Note:

Do not use hadded.pdb for subsequent steps! The whatif atom names are incompatible with Amber.

identify S-S bridges (e.g. in VMD)
edit original PDB file:
- rename D-protonated HIS to HID
- rename E-protonated HIS to HIE
- rename dual-protonated HIS to HIP
  
  D-protonated means the H sits on ND; E-protonated (the normal case), it sits on NE; double-protonated means there is a H on both Nitrogen atoms in the ring.
- rename CYS involved in S-S bond into CYX
- switch OE and NE of ASN and GLN where suggested by WhatIf
- remove terminal OT1, OT2 atoms from peptide chains
- remove terminal P, O1P, O2P from DNA chains
- remove hetero atoms (except you have prep files for them)
- remove all hydrogen atoms:
```
bispy
>>> m = PDBModel('mystructure.pdb')
>>> m = m.compress( m.maskHeavy() )
>>> m.writePdb( 'in.pdb' )
```
- alternatively you can try to create the correct amber hydrogen names:
```
bispy
>>> m = PDBModel('mystructure.pdb')
>>> m.writePdb( 'in.pdb', amber=1 )
```

Note: the classes used by clean_template.py of biskit/scripts/Mod can probably be adapted to automate most of the above.

2a. Build topology (manually)

>>> cp ../cleaned.pdb ./in.pdb

Select force field

>>> ln -s $AMBERHOME/dat/leap/cmd/leaprc.ff99 ./leaprc

run $AMBERHOME/exe/tleap, in tleap enter:

source leaprc

  ## optional:
  ## load definition and parameters for non-standard residues
m = loadAmberParams your_own.mod
m = loadAmberPrep your_own.prep

a = loadPdb in.pdb

  ## connect SG atoms of S-S bonds
  ## Note: use e.g. 'desc a.9' to check atom content
bond a.9.8  a.38.8
bond a.16.8 a.35.8

  ## add solvent
  ## consider using larger distance buffer and solvateOct
  ##
  ## Amber 7 usage:
  ## solvateBox a WATBOX216 10.0
  ## OR Amber 8 usage:
solvateBox a TIP3PBOX 10.0

  ## add counter ions to neutrality (Na+ or Cl-)
addIons a Na+ 0

check a
        -> "Unit is OK."
  ## save topology and initial coordinates
saveAmberParm a top.parm raw.crd

  ## quit tleap
quit

Create PDB file from amber topology and coordinates

>>> ambpdb -p top.parm < raw.crd > raw.pdb

2b. Build topology (automatted)

>>> amber_pdb2parm.py -i |PDBfile| [-o |out.parm| -fmod |external.mod|
>>>                   -fprep |external.prep| ... ]

This creates out.parm, out.crd, out.pdb, (and leap.log in current folder)

The current version works for Amber 11. Adapt the template in Biskit/data/amber/leap/solvate_box.leap!

3. Prepare folder and run MD

Create result folder

>>> mkdir my_ensemble_MD
>>> cd my_ensemble_MD

Fill folder from standard MD template (in biskit cvs)

>>> amber_ensembleMD.py -parm ~/my_topology/top.parm \
>>>                     -crd ~/my_topology/raw.crd \
>>>                     -pdb ~/my_topology/raw.pdb \
>>>                     -nb_nodes 2
>>>                     -nodes_eq computer1 \
>>>                     -nodes_prod computer2 computer3 ... \
>>>                     -n_steps 125000

Note: type amber_ensembleMD.py for help

Run equilibration and wait for it to finish

>>> ssh computer1  ## important! must be on nodes_eq
>>> cd my_ensemble_MD
>>> ./start_eq.csh

Run simulation (can be on any node)

>>> ./start_prod.csh

Note: The complete MD setup is stored in Biskit/data/amber/templates_ensemble_pme. Make a copy, adapt it to your needs and use amber_ensembleMD.py -template |your_copy|.

The template shell script calling sander is in 2_md_xx/2/start.csh* and may need to be adapted to your LAM version.

Note: The current setup works for Amber 11. Amber 7 required slightly different input files, see Biskit/data/amber/template_pme_ensemble_amber7/!

4. Convert amber result to pickled Biskit.EnsembleTraj object

The result folder contains the coordinates of 10 production MDs in 10 folders: e.g. my_ensemble_MD/2????/3????/sim_protein.crd.gz

The following first step should already be done by the MD script, check whether traj_protein.dat exists

>>> cd my_ensemble_MD/2_md_00/3_md_nvt_1ns

Create Biskit.Trajectory w/o water, w/o hydrogens, with amber residue names translated back to standard names

>>> amber2traj.py -i sim_protein.crd \
>>>               -o traj_protein.dat \
>>>               -r ../../0_dry.pdb \
>>>               -b -wat -hyd -rnres

Concat the single member trajectories into one EnsembleTraj object

>>> cd my_ensemble_MD
>>> traj2ensemble.py -i 2*/3*/traj_protein.dat \
>>>                  -o traj_ens.dat  \
>>>                  -pdbCode MD1

Comment: Consider using option -step or -s -e to get a smaller trajectory.

5. Clustering of trajectory (select 10 snapshots)

>>> mkdir my_ensemble_MD/clustered
>>> cd clustered

Select 10 snapshots, by CA clustering of every 5th frame

>>> selectModels.py -i ../traj_ens.dat \
>>>                 -n 10 -a CA \
>>>                 -step 5 > cluster.log

Comment: This creates n pickled Biskit.PDBModel instances and a pickled dictionary that contains all of them indexed by cluster number. The dictionary is the input to the Biskit docking routines.

If you need PDB files...

>>> model2pdb.py -i *model

Personal tools