Hi Eldon,

Here are some more fixes to be able to parse the BMRB model-free entries. The problem here is that the correlation time units are not specified. I have guessed that they should be ps. The diff is below.

Cheers,

Edward


[edau@localhost bmr2.1_files_mod2]$ cat diff
diff -ur ./bmr17010.str ../bmr2.1_files/bmr17010.str
--- ./bmr17010.str      2011-01-28 09:05:58.000000000 +0100
+++ ../bmr2.1_files/bmr17010.str        2011-02-03 14:44:39.000000000 +0100
@@ -1466,8 +1466,9 @@
    _Sample_conditions_label      $sample_conditions_1
    _Mol_system_component_name   'Tryptophan apo-repressor, chain 1'
-   _Tau_e_value_units            .
-   _Tau_s_value_units            .
+   _Tau_e_value_units            ps
+   _Tau_f_value_units            ps
+   _Tau_s_value_units            ps
    _Text_data_format             .
    _Text_data                    .
diff -ur ./bmr17012.str ../bmr2.1_files/bmr17012.str
--- ./bmr17012.str      2011-01-28 09:06:04.000000000 +0100
+++ ../bmr2.1_files/bmr17012.str        2011-02-03 14:45:54.000000000 +0100
@@ -1045,8 +1045,9 @@
    _Sample_conditions_label      $sample_conditions_1
    _Mol_system_component_name   'Tryptophan apo-repressor, chain 1'
-   _Tau_e_value_units            .
-   _Tau_s_value_units            .
+   _Tau_e_value_units            ps
+   _Tau_f_value_units            ps
+   _Tau_s_value_units            ps
    _Text_data_format             .
    _Text_data                    .
diff -ur ./bmr6470.str ../bmr2.1_files/bmr6470.str
--- ./bmr6470.str       2011-01-28 06:06:43.000000000 +0100
+++ ../bmr2.1_files/bmr6470.str 2011-02-03 14:27:52.000000000 +0100
@@ -969,6 +969,7 @@
    stop_
    _Sample_conditions_label      $condition_1
+   _Tau_e_value_units            ps
    _Mol_system_component_name    Ubiquitin
    loop_


On 2 February 2011 14:35, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
Hi Eldon,

I don't know if this is the best channel for this information. Is there a BMRB mailing list that would be better suited to it?

Ok, this is how I have found these inconsistencies. I have used relax to read in the BMRB NMR-STAR formatted files. This uses bmrblib, which I wrote (http://gna.org/projects/bmrblib/). This library is pretty close to complete for relaxation data and model-free data, and would be very easy to extend to handle the entirety of the NMR-STAR dictionary. It can both read and write valid NMR-STAR formatted files in versions 2.1, 3.0, and 3.1 (a little debugging might still be required, and expansion to different revisions such as 2.1.1 is also possible). This Python library is an abstraction of the underlying file format. The very low level reading and writing of the STAR format is handled by Jurgen F. Doreleijers' pystarlib (jurgenfd att gmail dott com, http://code.google.com/p/pystarlib/).

To read the entire BMRB model-free data content, I have done the following. I have downloaded all of the files from http://www.bmrb.wisc.edu/search/query_grid/query_1_46.html using the link http://www.bmrb.wisc.edu/ftp/pub/bmrb/compress/query_1_46.tar.gz. These are all in the version 2.1 or 2.1.1 format. Then, using the file names, I have downloaded all of the corresponding v3.1 files from http://www.bmrb.wisc.edu/ftp/pub/bmrb/entry_lists/nmr-star3.1/. It looks like maybe 30% of the old formatted files have been converted to the newer format so far.

I will write two subsequent emails explaining the problems with the version 2.1 files and the 3.1 files separately. In this mail, I would like to describe the general problems.

The first is that pystarlib cannot handle the semi-colon notation in non-free looping tag categories, e.g.:

   loop_
      _Vendor.Name
      _Vendor.Address
      _Vendor.Electronic_address
      _Vendor.Entry_ID
      _Vendor.Software_ID

      'J. Patrick Loria' .
;
http://xbeams.chem.yale.edu/~loria/
patrick.loria@xxxxxxxx
;
      15097 1

This is in the v3.1 file bmr15097.str. The basic pystarlib functionality probably needs to be fixed, assuming this construct is valid STAR format.

The second is that the bmr4970.str entry is not parsable. This file has multiple 15N S2_parameters saveframes:

save_S2_parameters_15N_22C
save_S2_parameters_15N_35C
save_S2_parameters_15N_47C
save_S2_parameters_15N_60C
save_S2_parameters_15N_73C

But these all have:

   loop_
      _Sample_label
      $sample_one
   stop_
   _Sample_conditions_label  $sample_conditions_one

They might all be the same sample, but the sample conditions must differ because the temperature changes between saveframes. By eye this is obvious, but for automatic parsing of the data the file has to be blacklisted and skipped.

Cheers,

Edward
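
The duplicate-label problem described for bmr4970.str could be detected before any parsing is attempted. What follows is only a minimal sketch in plain Python. It is not part of bmrblib or pystarlib, and the function name shared_condition_labels and the regular expressions are hypothetical. It assumes the v2.1 layout in which each saveframe opens with save_<name>, closes with a bare save_, and carries a free _Sample_conditions_label tag.

```python
import re
from collections import defaultdict

# Hypothetical patterns, assuming the NMR-STAR v2.1 text layout:
# a saveframe runs from a "save_<name>" line to a bare "save_" line.
SAVEFRAME_RE = re.compile(
    r"^\s*save_(\S+)\s*$(.*?)^\s*save_\s*$", re.MULTILINE | re.DOTALL
)
LABEL_RE = re.compile(r"^\s*_Sample_conditions_label\s+(\S+)", re.MULTILINE)


def shared_condition_labels(star_text, prefix="S2_parameters"):
    """Return {label: [saveframe names]} for labels used by more than one
    saveframe whose name starts with the given prefix."""
    frames_by_label = defaultdict(list)
    for match in SAVEFRAME_RE.finditer(star_text):
        name, body = match.group(1), match.group(2)
        if not name.startswith(prefix):
            continue
        label = LABEL_RE.search(body)
        if label:
            frames_by_label[label.group(1)].append(name)
    # Keep only the labels that are shared between several saveframes.
    return {
        label: names
        for label, names in frames_by_label.items()
        if len(names) > 1
    }
```

A file for which this returns a non-empty dictionary is a candidate for the blacklist. Whether a shared label is really an error still has to be judged by eye, as with the temperature series in bmr4970.str.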