Biskit 2.1.0b released
A new Biskit beta release should soon pop up on the SourceForge download page. It wraps up the current CVS snapshot with major changes at the very core of Biskit.
1. PDBModel overhaul
I have changed the way PDBModel stores its atom information. The new version unifies all atom-centered data in a single ProfileCollection 'atoms' and all residue-centered data in a ProfileCollection 'residues'. For historic reasons, information stemming from the PDB file was previously kept in a list of dictionaries, separate from the add-on residue and atom profiles. This (a) was inefficient, (b) made it more difficult to combine PDB information with calculated profiles, and (c) meant that powerful ProfileCollection methods could not be used for most of the PDBModel data. The new ProfileCollections are further improved and offer dictionary-like "CrossViews" and iterators that simulate the old list of dictionaries.
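To illustrate the idea of column-wise profiles with dictionary-like views, here is a minimal toy sketch. It is not Biskit's implementation: the class names ToyProfiles and ToyRowView and the sample values are invented for this example, and the real ProfileCollection and CrossView classes will differ in detail.

```python
class ToyProfiles:
    """Column-wise storage: one list per profile name, all of equal length."""

    def __init__(self, **profiles):
        lengths = {len(values) for values in profiles.values()}
        if len(lengths) > 1:
            raise ValueError("all profiles must have the same length")
        self.profiles = {key: list(values) for key, values in profiles.items()}

    def __len__(self):
        # All profiles share one length, so the first one is representative.
        for values in self.profiles.values():
            return len(values)
        return 0

    def row(self, index):
        """Return a dictionary-like view of one item across all profiles."""
        return ToyRowView(self, index)

    def __iter__(self):
        # Iterating yields one view per item, mimicking a list of dictionaries.
        return (self.row(i) for i in range(len(self)))


class ToyRowView:
    """Reads and writes go straight through to the underlying profiles."""

    def __init__(self, collection, index):
        self._collection = collection
        self._index = index

    def __getitem__(self, key):
        return self._collection.profiles[key][self._index]

    def __setitem__(self, key, value):
        self._collection.profiles[key][self._index] = value

    def keys(self):
        return self._collection.profiles.keys()


# Hypothetical sample data: three atoms with three profiles each.
atoms = ToyProfiles(
    name=["N", "CA", "C"],
    element=["N", "C", "C"],
    temperature_factor=[12.1, 10.4, 11.0],
)

second_atom = atoms.row(1)
second_atom["name"] = "CB"          # writes into the shared 'name' profile
names_by_row = [atom["name"] for atom in atoms]
```

Because each view only stores a reference and an index, there is a single copy of the data, and whole-profile operations stay cheap while per-atom access still looks like the old dictionaries.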
Along the way, I have tried to make the handling of PDBModels more intuitive and concise. Profiles and atoms can now be directly accessed from the PDBModel instance:
model['name'] <==> model.atoms['name']
model['short_residue_name'] = model.sequence() <==> model.residues.set('short_residue_name', model.sequence() )
The first example returns the profile (list) of atom names from the PDB. The second example creates a new residue profile with the single-letter amino acid names (depending on the length of the input list, either a residue or an atom profile is created).
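The length-based choice between an atom and a residue profile can be pictured with a small stand-in class. This is only a sketch of the dispatch idea: ToyModel and its attributes are hypothetical, plain dictionaries stand in for the two ProfileCollections, and the tie-breaking rule in the comment is an assumption rather than documented Biskit behaviour.

```python
class ToyModel:
    """Stand-in for a model with separate atom and residue profiles."""

    def __init__(self, atom_names, residue_index):
        # residue_index[i] is the residue that atom i belongs to.
        self.atoms = {"name": list(atom_names)}
        self.residues = {}
        self.n_atoms = len(atom_names)
        self.n_residues = max(residue_index) + 1

    def __getitem__(self, key):
        # Look in the atom profiles first, then in the residue profiles.
        if key in self.atoms:
            return self.atoms[key]
        return self.residues[key]

    def __setitem__(self, key, values):
        values = list(values)
        # Assumption of this sketch: if both lengths match, atoms win.
        if len(values) == self.n_atoms:
            self.atoms[key] = values
        elif len(values) == self.n_residues:
            self.residues[key] = values
        else:
            raise ValueError(
                "profile length matches neither atoms nor residues"
            )


# Two residues with three backbone atoms each (made-up example data).
model = ToyModel(["N", "CA", "C", "N", "CA", "C"], [0, 0, 0, 1, 1, 1])

model["short_residue_name"] = ["G", "A"]       # length 2: residue profile
model["charge"] = [0.0, 0.1, 0.5, 0.0, 0.1, 0.5]  # length 6: atom profile
```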
Please have a look at the new PDBModel tutorial for more examples!
Last but not least, the new PDBModel uses a much more efficient and consistent way to keep track of residue and chain borders. This, together with abandoning the old atom dictionaries and the switch to numpy (see below), has sped up take and compress operations by a factor of several hundred!
As usual, the new version is backward-compatible with pickles of previous PDBModels, which are converted on the fly.
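For readers unfamiliar with the two operations, the sketch below shows what take and compress mean at the level of plain numpy arrays. It uses only numpy itself, not PDBModel, and the atom names and residue indices are made-up sample data.

```python
import numpy as np

# Made-up per-atom profiles for two residues with three atoms each.
names = np.array(["N", "CA", "C", "N", "CA", "C"])
residue_of_atom = np.array([0, 0, 0, 1, 1, 1])

# take: select items by position (here the two CA atoms).
picked_names = np.take(names, [1, 4])

# compress: select items with a boolean mask of the same length.
mask = names == "CA"
ca_names = np.compress(mask, names)
ca_residues = np.compress(mask, residue_of_atom)
```

With all profiles stored as arrays, the same index list or mask can be applied to every profile in one vectorised step instead of looping over per-atom dictionaries.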
2. Migration to Numpy
The plethora of bugs in every more (or less) recent version of the Numeric library has forced us to migrate to its new numpy incarnation earlier rather than later. This constitutes a major undertaking as there are several syntax changes and the new arrays are not compatible with the old ones. As a consequence, I also had to provide a new biggles version that is based on numpy rather than Numeric. For a less painful experience, I am using the numpy.oldnumeric migration module in almost all cases. New modules should use numpy directly. By and large, the new numpy is an improvement. It is definitely more stable, in many cases even faster, and gives better support for non-numeric values like arrays of strings.
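As a small illustration of the string support mentioned above, the following sketch works on an array of residue names using numpy directly, as recommended for new modules. The residue names are invented sample data and nothing here depends on Biskit.

```python
import numpy as np

# Made-up residue names stored as a numpy string array.
residue_names = np.array(["ALA", "GLY", "ALA", "SER"])

# Element-wise comparison yields a boolean mask.
is_ala = residue_names == "ALA"

# Positions of all alanines.
ala_positions = np.flatnonzero(is_ala)

# Sorted unique names and how often each one occurs.
unique_names, counts = np.unique(residue_names, return_counts=True)
```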
See also 'biskit/docs/numeric2numpy.txt'
3. Conclusion / Issues
All tests I can currently perform (this excludes PVM, homology modeling and some interfaces to external programs) are running through without problems. But since the changes are very widespread and at the center of the package, I expect that there are some bugs still awaiting discovery. Furthermore, I haven't yet converted the scripts folder to the new numpy and, since we are right now lacking a good test suite for this part of Biskit, many of the biskit/scripts will probably be broken (albeit not difficult to fix). For these reasons, this release is labeled as a beta version.
Recommendation: Users who are mostly interested in reproducing one of the existing workflows (docking or homology modeling scripts) should stick to release 2.0.1 for the moment. However, if you are looking at Biskit for developing your own programs and scripts, I would recommend starting from the CVS version or 2.1.0. The improvements should outweigh potential bugs and you spare yourself a later migration. Please don't hesitate to report bugs and problems!