Reading and handling MD trajectories
- 1. Reading a Trajectory
- 2. Looping over frames
1. Reading a Trajectory
Lets consider the following case. You have a nice PDB file lying around on your hard drive and also a molecular dynamics (MD) trajectory in the mdcrd file format from the Amber MD package. If so, you can read the coordinates of each time frame of this MD trajectory using Biskit in very few steps. The paths given here are not real, there are no sample files distributed with Biskit right now.
For reading a trajectory, you need a PDB and the mdcrd file, lets call the paths to those files pdbPath and mdPath respectively in the code :
>>> import Biskit as B
>>> crd=B.AmberCrdParser(mdPath, pdbPath, box=1)
>>> crd
<Biskit.AmberCrdParser.AmberCrdParser instance at 0x1284950>
Here I specify box=1, as I know that my trajectory holds box information. If you do not use periodic boundary conditions, then you can drop the box argument. Instead of passing pdbPath you can also directly pass an already opened PDBModel to AmberCrdParser.
As you can see, all this operation does, is to create an AmberCrdParser instance. To actually read the coordinates into memory, you have to trigger the following command :
>>> traj=crd.crd2traj()
Now you shall notice two things. The process should need some time, and if you track your memory usage, this should go up quite fast for the python session you are executing. Be careful not to load huge MD trajectories, but rather small bunches of them (previously split up).
That's done, now you read in the trajectory. On this object you can perform several nice tasks. Read the following section to get a first insight of how things can be done.
2. Looping over frames
As you have the trajectory instance now stored in the traj variable, lets check what we can do with that :
>>> dir(traj)
['_Trajectory__cmpFileNames', '_Trajectory__cmpLists', '_Trajectory__collectFrames',
'_Trajectory__compressPca', '_Trajectory__create', '_Trajectory__defaults',
'_Trajectory__resWindow', '_Trajectory__takePca', '__doc__', '__getitem__',
'__getstate__', '__init__', '__len__', '__module__', '__setstate__',
'argsortFrames', 'atomMask', 'avgModel', 'blockFit2ref', 'clone', 'compressAtoms',
'compressFrames', 'concat', 'concatAtoms', 'ex_numbers', 'fit', 'frameNames', 'frames',
'getFluct_global', 'getFluct_local', 'getGammaFluct', 'getPDBModel', 'getPca', 'getRef',
'getResFluct', 'initVersion', 'keepAtoms', 'keepFrames', 'lenAtoms', 'lenFrames',
'pairwiseRmsd', 'pc', 'pcMovie', 'pca', 'plotProfile', 'profile', 'profile2mask',
'profileInfo', 'profiles', 'ref', 'removeAtoms', 'removeFrames', 'replaceContent',
'resIndex', 'resMap', 'residusMaximus', 'setProfile', 'setProfileInfo', 'setRef',
'sortAtoms', 'sortFrames', 'takeAtoms', 'takeChains', 'takeFrames', 'transform',
'verbose', 'version', 'writeCrd', 'writePdb', 'writePdbs']
Well, as you can see, there are lots of methods. One very basic thing one is tempted to do on MD trajectories is to get the coordinates of the system for a given frame. The trajectory object gives you direct access to those coordinates via frames.
>>> traj.frames.shapeframes contains actually a 3d numpy array of coordinates. The shape of this array can be shown using the shape attribute of frames. In the sample trajectory I used here, I have for example 50 frames, 20696 atoms in the system and 3 coordinates for each atom (x,y and z). Thus, one could either treat the coordinates directly in the 3d array, or loop frame by frame like it is shown in the example above.
(50, 20696, 3)
>>> for curFrame in range(len(traj)):
>>> xyzOfFrame=traj.frames[curFrame]
>>> ...
>>> myVeryInterestingFunctionThatDoesSomethingOnTheCoordinates(xyzOfFrame)
This is clearly just to get started with handling trajectories. You can find more information in the tutorials provided with the code in biskit/doc/tutorial_md and the API