Source Code for Package bmrblib.pystarlib

__author__    = "$Author: jurgenfd $"
___revision__ = "$Revision: 9 $"
___date__     = "$Date: 2007-01-11 20:40:26 +0100 (Thu, 11 Jan 2007) $"

"""
The goal of these routines is to provide a Python interface for writing,
reading, analyzing, and modifying NMR-STAR and mmCIF files and objects.

NOTES:
* Unsupported STAR features (not used in NMR-STAR and mmCIF files):
    - Nested loops
    - Global block
* Limitations to content:
    - A STAR file should have one and only one data_ tag, and that should
        be the first thing in the file
    - Comments on input are ignored.
* Limitations to the layout (for fast parsing):
    - Save frames should start and end with save_ at the beginning of
        the line
    - Perhaps some unknown ;-(

SPEED ISSUES:
* There was a good Python API written by Jens Linge and Lutz Ehrlich (EMBL).
    It can handle many more STAR features and variations in content
    and layout. The current API was written to handle NMR-STAR files on
    the order of several Mb, for which the EMBL API demanded a lot of
    resources. Parsing a 1 Mb STAR file with a huge table of mostly numeric
    values required a peak of 50 Mb of memory and about 2 hours with StarFormat.
    My guess was that this could be much faster if at least the lowest level
    of the dataNode value (where it is a string or number) used native
    Python objects instead.
    Another issue is that a large text object, when parsed by the
    EMBL API, got copied over and over, resulting in loss of speed and a
    significant increase in memory use.
* This API uses native Python objects for a list of tags (looped or free),
    with user-defined objects above that, where speed and memory are less of an
    issue. It parses a 10 Mb STAR file in 25 seconds with a peak memory
    usage of 45 Mb. The average value in the file is 3 chars long. A Python
    string object has a reference count (4), type pointer (4), malloc overhead
    (4), trailing \0 (1) and the content (rounded up to multiples of 4).
    Ignoring the content rounding we go from 3 bytes to 20 bytes (factor 7)
    in total for the average string in the example file. Considering some
    overhead for the objects on top of the string objects, the 45 Mb doesn't
    look that bad.
* Compare this with the C STARLIB2 from Steve Mading (BMRB), which takes 12
    cpu seconds and 18 Mb peak memory usage. For STARLIBJ (Java) Steve
    got 40 Mb peak memory usage and 57 seconds. Compared with this Python API,
    the Java memory usage is slightly better but the speed is a factor of 2
    slower. This was using the best Java engine we had. Another one we tested
    was a factor of 3 slower.
* Added yet another STAR parser in a Java project: Wattos.Star.STARParser.
    It is optimized to be fast and efficient with memory.
* Summary:

Test on Windows using a single Pentium IV CPU 2 GHz
Language  STAR file size (Mb)  Time (s)  RAM (Mb)  Notes
###############################################################################
C         10                   7.2       18        Using Steve's STARlib2.
Java      10                   57        40        Tested by Steve
JavaNEW   10                   5.2       100       New parser based on SANSj: Wattos.Star.STARParser
Python    10                   25        45        Written at BMRB
Python*   1*                   7200*     50*       Written at EMBL
###############################################################################
Entries are labeled with an asterisk because the test file had to be truncated
and the test was run on an older machine. The EMBL API was developed for small
files (< 100 kb).

* References:
S. R. Hall and A. P. F. Cook. STAR dictionary definition language: initial specification.
    J.Chem.Inf.Comput.Sci. 35:819-825, 1995.
S. R. Hall and N. Spadaccini. The STAR file: detailed specifications.
    J.Chem.Inf.Comput.Sci. 34:505-508, 1994.
J. P. Linge, M. Nilges, and L. Ehrlich. StarDOM: from STAR format to XML.
    J Biomol NMR, 1999.
N. Spadaccini and C. B. Hall. Star_base: accessing STAR file data.
    J.Chem.Inf.Comput.Sci. 34:509-516, 1994.
J. Westbrook and P. E. Bourne. STAR/mmCIF: An ontology for macromolecular structure.
    Bioinformatics. 16 (2):159-168, 2000.
"""

## Public attributes
verbosity               = 2
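
The layout limitation noted in the docstring (save frames start and end with save_ at the beginning of a line) is what makes a cheap line-anchored scan possible. The following is a minimal sketch of that idea only, not the package's actual parser; the helper name `split_saveframes` and the sample text are hypothetical.

```python
import re

# Hypothetical helper, not part of bmrblib.pystarlib: it illustrates why
# requiring save_ at the start of a line allows a simple line-anchored scan.
_SAVE_LINE = re.compile(r"^save_(\S*)[ \t]*$", re.MULTILINE)


def split_saveframes(text):
    """Return a list of (frame_name, body) tuples, one per save frame."""
    frames = []
    name = None
    body_start = None
    for match in _SAVE_LINE.finditer(text):
        label = match.group(1)
        if label:
            # "save_<name>" at the start of a line opens a frame.
            name = label
            body_start = match.end()
        elif name is not None:
            # A bare "save_" at the start of a line closes the open frame.
            frames.append((name, text[body_start:match.start()].strip()))
            name = None
    return frames


sample = """data_example

save_entry_information
   _Entry.ID   1234
save_

save_shifts
   _Shift.Val  4.70
save_
"""

for frame_name, body in split_saveframes(sample):
    print(frame_name, "->", body)
```

Because only lines that begin with save_ are inspected, the frame bodies are never tokenized during the split, which is the kind of shortcut the layout restriction permits.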
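
The per-string overhead estimate in the docstring (a 3-character value costing roughly 20 bytes) describes the Python interpreter of that era. As a rough sketch, the same comparison can be repeated on whichever interpreter is in use with the standard `sys.getsizeof`. The helper name `string_overhead_factor` is hypothetical, the reported size excludes allocator overhead, and the numbers will differ from the docstring's figures.

```python
import sys


def string_overhead_factor(value):
    """Ratio of a str object's reported size in bytes to its character count."""
    return sys.getsizeof(value) / len(value)


value = "4.7"  # a 3-character value, like the average value in the test file
print("object size (bytes):", sys.getsizeof(value))
print("overhead factor:", round(string_overhead_factor(value), 1))
```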