Re: BMRB NMR-STAR v3.1 file format or STAR format reader/writer (maybe using CCPN?). -- July 31, 2008

On Thu, Jul 31, 2008 at 10:04 AM, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Hi,

Thank you for answering all my questions. I think that clearly covers
most things for me to start thinking about how this can be implemented
(although I'm still unsure about how to input the bond lengths used in
the calculation of the dipolar constants). For adding BMRB NMR-STAR
v3.1 file format reading and writing capabilities, I've now created a
branch of the relax 1.3 development line which is viewable at
http://svn.gna.org/viewcvs/relax/. I think that it would be
beneficial to add, in addition to the creation of STAR files for BMRB,
reading capabilities simultaneously so that data from the BMRB can
easily be read by relax and then a new or extended analysis performed
(relax can also create input for Modelfree4 and Dasha, as well as
control these programs).

So the major difficulty in implementing this, as I see it, is the
support for generic STAR formatted files or the specific NMR-STAR v3.1
file format. I have done extensive searches and, although Python
fully supports XML reading and writing, I haven't been able to
find any Python packages for generic STAR format support. Would
anyone know of a STAR or NMR-STAR 3.1 Dictionary reader/writer for
Python? I could write a STAR format parser and writer, but that would
take a lot of time. It would be easier if a Python package for this
could be found or recycled. However, the major issue with using a
preexisting package would be copyright licensing. Ideally the STAR
format parser and writer would be appropriately licensed, for example
maybe using http://www.python.org/download/releases/2.4.2/license/, to
allow incorporation into the standard Python modules (sitting alongside
the XML reader/writer) so that all NMR programs with a Python interface,
which is quite a few nowadays, could have much easier access to the
BMRB data.

I have found PyCIFRW (http://anbf2.kek.jp/CIF/), which also includes
PySTARRW and could be useful. However, these have licensing issues
which clash with the open source GPL licence of relax, so
unfortunately I can't use these files. The only other Python STAR
reader/writer I've found is the one used in the CCPN data model
(http://www.ccpn.ac.uk/). This has the ability to convert the NMR-STAR
format to the CCPN data model format through the file
'ccpnmr/ccpnmr1.0/python/ccpnmr/format/converters/NmrStarFormat.py'.
The copyright licensing should be ok, but unfortunately this is not a
generic reader/writer but something which is tightly integrated into
CCPN. Hence it would be too difficult to incorporate this file into
relax. I would like to have relax interface with the CCPN data model
(https://mail.gna.org/public/relax-devel/2007-11/msg00037.html), but
this would be far in the future, and model-free analysis may not yet
be fully supported by CCPN
(https://mail.gna.org/public/relax-devel/2007-12/msg00002.html). One
other thing I noticed at CCPN was a comment that a STAR reader/writer
written in Python by Jurgen Doreleijers
(http://tang.bmrb.wisc.edu/~jurgen/) was incorporated into their
software. Do you know anything about this Python module?

Once a usable STAR reader/writer is accessible to relax, creating
and reading BMRB deposition files should be relatively
straightforward.
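To give a rough idea of the writer half of this, here is a minimal sketch of emitting one NMR-STAR saveframe (a block of header tags plus a single `loop_`) from plain Python data. It is not BMRB or CCPN code: the quoting rules are a simplified reading of the STAR syntax (bare words, single or double quotes, `.` for a null value), multi-line text fields are deliberately not handled, and the function names are hypothetical. The category and tag names in the usage lines are taken from this thread; the saveframe name and the numbers are placeholders.

```python
# Minimal NMR-STAR saveframe writer sketch (simplified quoting rules).

RESERVED_PREFIXES = ("data_", "save_", "loop_", "stop_", "global_")


def format_value(value):
    """Render one Python value as a STAR data value."""
    if value is None:
        return "."  # '.' marks a null/inapplicable value
    text = str(value)
    if "\n" in text:
        # Multi-line values need ';'-delimited text fields, not covered here.
        raise ValueError("multi-line values are not supported in this sketch")
    needs_quotes = (
        text in ("", ".", "?")
        or any(c.isspace() for c in text)
        or text[0] in "_#$'\"[];"
        or text.lower().startswith(RESERVED_PREFIXES)
    )
    if not needs_quotes:
        return text
    # Use double quotes when the value itself contains a single quote.
    return '"%s"' % text if "'" in text else "'%s'" % text


def write_saveframe(name, category, header, loop_tags, rows):
    """Return the text of one saveframe with header tags and one loop."""
    lines = ["save_%s" % name]
    for tag, value in header.items():
        lines.append("   _%s.%s  %s" % (category, tag, format_value(value)))
    if loop_tags:
        lines += ["", "   loop_"]
        lines += ["      %s" % tag for tag in loop_tags]
        lines.append("")
        for row in rows:
            if len(row) != len(loop_tags):
                raise ValueError("row length does not match the loop tags")
            lines.append("      " + "  ".join(format_value(v) for v in row))
        lines.append("   stop_")
    lines.append("save_")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values, only to show the layout.
    print(write_saveframe(
        "T1_example", "Heteronucl_T1_list",
        {"T1_val_units": "s-1"},
        ["_T1.Seq_ID", "_T1.Comp_ID", "_T1.Atom_ID", "_T1.Val", "_T1.Val_err"],
        [(1, "GLY", "N", 1.5, 0.02), (2, "ALA", "N", 1.4, None)],
    ))
```

A real writer would also need semicolon text fields and validation against the NMR-STAR dictionary; this only covers the basic saveframe and loop layout.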

Regards,

Edward

On Tue, Jul 29, 2008 at 7:16 PM, Eldon Ulrich <elu@xxxxxxxxxxxxx> wrote:
> Hi,
>
> Thank you for the quick response and feedback. I will try to answer as many
> of your comments and questions as I can below. We are converting all of our data from
> NMR-STAR v2.1 to NMR-STAR v3.1. Examples of the v3.1 files can be found on
> the BMRB ftp site at
>
> ftp://ftp.bmrb.wisc.edu/pub/data/nmr-star-v3/
>
> These are early beta files and may have serious problems.
>
> For the purposes of this discussion, I will be referring to v3.1 tags.
> Descriptions for these tags can be found at this URL:
>
> http://www.bmrb.wisc.edu/formats.html
>
> Files containing a fake NMR-STAR v3.1 file (nmrstar3_fake.txt) and other
> information on the dictionary in its 'working' form are available from the
> BMRB ftp site:
>
> ftp://ftp.bmrb.wisc.edu/pub/data/nmr-star_dict/dictionary_files
>
> We are very open to suggestions from the community on how to model and
> capture relaxation data and are quite excited about this discussion. I am
> sure I have not addressed all of your questions, but I hope this is a start.
>
> Cheers,
> Eldon
>
>
> Edward d'Auvergne wrote:
>>
>> Hi,
>>
>> I've had a look at the fields and have a few questions as to how these
>> should be implemented. I'm assuming that these are the fields for
>> simply depositing R1 relaxation data into the BMRB, is this correct?
>
> The Excel file contains the tags for the fields in the ADIT-NMR deposition
> system that are mandatory. These fields represent for the most part the meta
> information about the molecule, sample, sample conditions, spectrometers,
> etc. The T1 fields were included as an example for one kind of relaxation
> data and the mandatory fields that would need to be entered in ADIT-NMR. The
> actual tables of data would be uploaded at the time of deposition.
>>
>> So the first question I have has to do with Rx versus Tx. Almost all
>> theories for the interpretation of the T1 relaxation times are
>> dependent upon this being in the R1 rate form (with units of
>> rad.s^-1). relax (http://nmr-relax.com), Art Palmer's curvefit
>> (http://cpmcnet.columbia.edu/dept/gsas/biochem/labs/palmer/software.html),
>> David Fushman's RELAXFIT
>> (http://gandalf.umd.edu/FushmanLab/pdsw.html), and almost all other
>> programs calculate the Rx relaxation rate errors and not relaxation
>> time errors via Monte Carlo simulation. Then the programs relax
>> (http://nmr-relax.com), modelfree4
>> (http://cpmcnet.columbia.edu/dept/gsas/biochem/labs/palmer/software.html),
>> dasha (http://www.nmr.ru/dasha.html), DYNAMICS
>> (http://gandalf.umd.edu/FushmanLab/pdsw.html), Tensor2
>> (http://www.ibs.fr/ext/labos/LRMN/softs/welcome.htm), etc. all work
>> with the rates and not the times. So the storage of relaxation times
>> and their errors may not be very useful. Is it possible to deposit
>> rates and their errors rather than the antiquated relaxation times and
>> their errors?
>
> Yes, you can deposit rates and the appropriate error and not the times. The
> T1.Val and T1.Val_err tags can have units appropriate for either times or
> rates (i.e., s or s-1). In the header to the table of T1 values is a tag
> _Heteronucl_T1_list.T1_val_units. The value to this tag defines whether the
> T1 data have been expressed as times or rates.
>
> The terminology used for relaxation studies in NMR has been quite diverse.
> At the time these tags were constructed, the term 'T1' still seemed to be
> the most commonly used. But, we realized capturing the data as rates was
> extremely important and so we allowed for the units for the values to
> actually determine if the values were times or rates.
>
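Since `_Heteronucl_T1_list.T1_val_units` decides whether the T1.Val column holds times or rates, a reader has to normalise on it before handing the data to model-free software. The sketch below is hypothetical and assumes the unit strings `s`, `ms`, `s-1` and `1/s`; the spellings actually allowed by the BMRB dictionary are not given in this thread and should be checked. Times are converted with the first-order estimate err_R = err_T / T^2, which is only an approximation (Edward cautions below that an accurate conversion needs full Monte Carlo simulation).

```python
# Normalise a T1 value/error pair to an R1 rate using the units tag.
# Assumed unit spellings; check them against the NMR-STAR dictionary.
TIME_UNITS = {"s": 1.0, "ms": 1e-3}
RATE_UNITS = {"s-1", "1/s"}


def to_rate(value, error, units):
    """Return (rate, rate_error) in s^-1 from a T1 entry and its units."""
    u = units.strip().lower()
    if u in RATE_UNITS:
        return value, error
    if u in TIME_UNITS:
        t = value * TIME_UNITS[u]
        t_err = None if error is None else error * TIME_UNITS[u]
        rate = 1.0 / t
        # First-order propagation, err_R = err_T / T^2 (approximate only).
        rate_err = None if t_err is None else t_err / t ** 2
        return rate, rate_err
    raise ValueError("unrecognised T1 units: %r" % units)
```

Rates pass through untouched, so data deposited as rates (as Edward prefers) never go through the approximate conversion.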
>> Also, conversion of the Rx relaxation rate errors to the
>> Tx time errors would require full Monte Carlo simulation to be
>> accurate, and I'm not sure if anyone would have done this properly. I
>> could be wrong (anyone on this list who knows otherwise, please
>> correct me), but I don't think there are any programs that use the Tx
>> times or that properly convert Rx errors to Tx errors and vice versa.
>>
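Edward's point about converting errors can be illustrated with two small, hypothetical helpers that push an error through the reciprocal (the same transform turns a time into a rate or a rate into a time): a Monte Carlo estimate next to the first-order estimate err/x^2. The sketch assumes the original error is the standard deviation of a Gaussian. It is not the procedure used by relax, curvefit or RELAXFIT, which, as noted above, estimate the rate errors themselves by Monte Carlo simulation.

```python
import random
import statistics


def reciprocal_linear(x, x_err):
    """First-order estimate: y = 1/x, err_y = err_x / x^2."""
    return 1.0 / x, x_err / x ** 2


def reciprocal_mc(x, x_err, n_sims=10000, seed=None):
    """Monte Carlo estimate of the error of y = 1/x.

    Assumes x_err is the standard deviation of a Gaussian error on x.
    Non-positive samples are discarded, which biases the result when
    x_err is a large fraction of x.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        x_sim = rng.gauss(x, x_err)
        if x_sim > 0:
            sims.append(1.0 / x_sim)
    return 1.0 / x, statistics.stdev(sims)


if __name__ == "__main__":
    # Placeholder T1 of 0.5 s with increasing relative errors.
    for rel in (0.01, 0.1, 0.3):
        t, t_err = 0.5, 0.5 * rel
        print(rel, reciprocal_linear(t, t_err)[1], reciprocal_mc(t, t_err, seed=1)[1])
```

For small relative errors the two estimates agree closely; as the relative error grows they diverge because 1/x is non-linear, which is why a naive conversion between time and rate errors can be misleading.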
>> The second question I have has to do with the integration of relax
>> with the BMRB deposition and automating the process. Can all data for
>> a model-free analysis be deposited at once? For example if relax was
>> to create a STAR formatted file with the ADIT-NMR fields with the R1,
>> R2, and NOE values and errors at multiple fields, with the S2, S2f,
>> S2s, te, ts, tf, and Rex parameters and errors, the selected model
>> information (model name or parameters of the model), parameters such
>> as the CSA value used and bond length, and global parameters such as
>> the diffusion tensor, could this file be accepted? Or will this
>> require multiple small files for multiple deposition?
>>
> All of the data can be uploaded as one file. The NMR-STAR format is modular
> and a single file can contain as many modules (saveframes) of the same or
> different type with a few exceptions. A module or saveframe begins with the
> key term 'save_somestring' and ends with the key term 'save_'. A file can
> contain as many R1, R2, and NOE modules as needed. Within each of the
> modules there is a header tag that takes as a value the field strength of
> the spectrometer used to collect the data in that module as well as the NMR
> experiment. It is important that the experiment used for the data be defined
> uniquely.
>
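Because each module is delimited by `save_<name>` and `save_`, a reader can be built around those keywords. The following is a minimal, hypothetical parser sketch, not a complete STAR implementation: it handles bare, single- and double-quoted values (a quote only closes a value when followed by whitespace), `;`-delimited text fields, `#` comments, header tags and `loop_ ... stop_` blocks, but ignores global blocks, nested loops and the STAR dictionary.

```python
import re

# A quote only terminates a value when followed by whitespace or end of line.
TOKEN_RE = re.compile(r"'(.*?)'(?=\s|$)|\"(.*?)\"(?=\s|$)|(\S+)")


def tokenize(text):
    """Split STAR text into ("word", s) bare tokens and ("value", s) quoted ones."""
    tokens = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(";"):
            # Semicolon text field: runs until the next line starting with ';'.
            field = [line[1:]]
            i += 1
            while i < len(lines) and not lines[i].startswith(";"):
                field.append(lines[i])
                i += 1
            tokens.append(("value", "\n".join(field).strip("\n")))
            i += 1  # skip the closing ';' line (anything after it is ignored)
            continue
        for m in TOKEN_RE.finditer(line):
            if m.group(3) is not None:
                if m.group(3).startswith("#"):
                    break  # rest of the line is a comment
                tokens.append(("word", m.group(3)))
            else:
                quoted = m.group(1) if m.group(1) is not None else m.group(2)
                tokens.append(("value", quoted))
        i += 1
    return tokens


def parse_saveframes(text):
    """Return {name: {"tags": {tag: value}, "loops": [{"tags": [...], "rows": [...]}]}}."""
    tokens = tokenize(text)
    frames, current, i = {}, None, 0
    while i < len(tokens):
        kind, tok = tokens[i]
        low = tok.lower() if kind == "word" else None
        if low == "save_":
            current, i = None, i + 1  # end of the saveframe
        elif low is not None and low.startswith("save_"):
            current = frames[tok[5:]] = {"tags": {}, "loops": []}
            i += 1
        elif current is None:
            i += 1  # data_ header or anything outside a saveframe
        elif low == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i][0] == "word" and tokens[i][1].startswith("_"):
                names.append(tokens[i][1])
                i += 1
            values = []
            # Assumes every loop is closed with stop_, as in NMR-STAR files.
            while i < len(tokens) and not (tokens[i][0] == "word" and tokens[i][1].lower() == "stop_"):
                values.append(tokens[i][1])
                i += 1
            i += 1  # skip stop_
            if not names or len(values) % len(names):
                raise ValueError("malformed loop")
            rows = [values[j:j + len(names)] for j in range(0, len(values), len(names))]
            current["loops"].append({"tags": names, "rows": rows})
        elif kind == "word" and tok.startswith("_"):
            if i + 1 >= len(tokens):
                raise ValueError("tag %s has no value" % tok)
            current["tags"][tok] = tokens[i + 1][1]
            i += 2
        else:
            i += 1
    return frames
```

Values are returned as strings; converting numbers and interpreting `.` as null is left to the caller, which keeps the parser independent of the NMR-STAR dictionary.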
> The following list of tags contains most of the values you mention, S2, S2f,
> S2s, te, ts, Rex all with errors, and type of model fit. It is missing the
> tf, but this can be easily added. The units for te and ts are provided in
> the header tags
> _Order_parameter_list.Tau_e_val_units and
> _Order_parameter_list.Tau_s_val_units. For the order parameter data, it is
> important to include the experiments used to collect the underlying data. In
> this way the order parameters are linked to the R1, R2, etc. data used in
> doing the fitting. It is possible to include in the file a description of
> the software used and the 'method' or parameter file.
>
>
> _Order_param.Order_param_val
> _Order_param.Order_param_val_fit_err
> _Order_param.Tau_e_val
> _Order_param.Tau_e_val_fit_err
> _Order_param.Rex_val
> _Order_param.Rex_val_fit_err
> _Order_param.Model_free_sum_squared_errs
> _Order_param.Model_fit
> _Order_param.Sf2_val
> _Order_param.Sf2_val_fit_err
> _Order_param.Ss2_val
> _Order_param.Ss2_val_fit_err
> _Order_param.Tau_s_val
> _Order_param.Tau_s_val_fit_err
> _Order_param.SH2_val
> _Order_param.SH2_val_fit_err
> _Order_param.SN2_val
> _Order_param.SN2_val_fit_err
>
> The CSA data would be included in a separate module, but the same file.
>
>
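One way to line up model-free results with these tags is a lookup table from parameter names to value/error tag pairs. In the sketch below the parameter keys (`s2`, `s2f`, `s2s`, `te`, `ts`, `rex`), the model name `m2` and the function name are assumed, relax-style placeholders. `tf` is left unmapped because, as noted above, the dictionary has no tag for it yet, so the sketch refuses it rather than dropping it silently. Units for te and ts (the `Tau_e_val_units`/`Tau_s_val_units` header tags) are not handled.

```python
# Hypothetical mapping from relax-style model-free parameter names to the
# _Order_param value/error tags listed above. tf has no tag yet.
ORDER_PARAM_TAGS = {
    "s2": ("_Order_param.Order_param_val", "_Order_param.Order_param_val_fit_err"),
    "s2f": ("_Order_param.Sf2_val", "_Order_param.Sf2_val_fit_err"),
    "s2s": ("_Order_param.Ss2_val", "_Order_param.Ss2_val_fit_err"),
    "te": ("_Order_param.Tau_e_val", "_Order_param.Tau_e_val_fit_err"),
    "ts": ("_Order_param.Tau_s_val", "_Order_param.Tau_s_val_fit_err"),
    "rex": ("_Order_param.Rex_val", "_Order_param.Rex_val_fit_err"),
}


def order_param_row(model, values, errors):
    """Return (loop_tags, row) for one spin; parameters absent from the model are None."""
    unmapped = sorted(set(values) - set(ORDER_PARAM_TAGS))
    if unmapped:
        raise KeyError("no NMR-STAR tag for parameter(s): %s" % ", ".join(unmapped))
    tags, row = ["_Order_param.Model_fit"], [model]
    for param, (val_tag, err_tag) in ORDER_PARAM_TAGS.items():
        tags += [val_tag, err_tag]
        row += [values.get(param), errors.get(param)]
    return tags, row
```

Keeping every column in every row, with None for parameters a model does not use, gives one rectangular loop even when different spins were fitted with different models.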
>> I've also noticed from some of the deposited data (e.g.
>>
>> http://www.bmrb.wisc.edu/data_library/gen_saveframe.php?accNum=6470&saveframe=T1_relaxation
>> ) that all the data is identified by residue number. For supporting
>> analyses using nucleic acids, small biomolecules, or proteins where
>> more than just the backbone NH relaxation has been studied, would it
>> be possible to additionally have an atom or spin numerical code and
>> textual label? If an analysis is done on a molecular complex, is the
>> deposition of data for multiple molecules supported as well?
>
> The header tag of the type '_Heteronucl_T1_list.T1_coherence_type' is
> intended to provide an idea of the coherence being measured. In addition,
> the following set of tags or similar set for other kinds of data are
> provided for every row in a data value table. The values for these tags
> allow an atom within a molecular assembly of almost any complexity
> (including ones that are undergoing chemical or conformational exchange) to
> be defined.
>
> _T1.Entity_assembly_ID
> _T1.Entity_ID
> _T1.Comp_index_ID
> _T1.Seq_ID
> _T1.Comp_ID
> _T1.Atom_ID
> _T1.Atom_type
> _T1.Atom_isotope_number
>
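These per-row identifiers map naturally onto a small immutable record that can key per-atom data, which also covers non-backbone spins and multi-molecule complexes. The class and function names below are hypothetical; only the tag suffixes come from the list above, and the residue numbers in the example are placeholders.

```python
from dataclasses import astuple, dataclass

# Tag suffixes, in the order listed above; the category prefix (e.g. "T1")
# changes with the kind of data.
TAG_SUFFIXES = ("Entity_assembly_ID", "Entity_ID", "Comp_index_ID", "Seq_ID",
                "Comp_ID", "Atom_ID", "Atom_type", "Atom_isotope_number")


@dataclass(frozen=True)  # frozen, so instances can be used as dict keys
class AtomKey:
    entity_assembly_id: int
    entity_id: int
    comp_index_id: int
    seq_id: int
    comp_id: str
    atom_id: str
    atom_type: str
    atom_isotope_number: int

    def row_values(self):
        """Values in the same order as atom_columns()."""
        return list(astuple(self))


def atom_columns(category):
    """Full loop tag names for one data category, e.g. '_T1.Atom_ID'."""
    return ["_%s.%s" % (category, suffix) for suffix in TAG_SUFFIXES]


if __name__ == "__main__":
    # A backbone amide N and a tryptophan side-chain indole N (placeholders).
    backbone = AtomKey(1, 1, 5, 5, "ALA", "N", "N", 15)
    side_chain = AtomKey(1, 1, 30, 30, "TRP", "NE1", "N", 15)
    print(atom_columns("T1"))
    print(backbone.row_values())
    print(side_chain.row_values())
```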
> The data available from BMRB have for the most part been supplied by authors.
> Their quality and how well they are described vary and are in all cases out of
> our control, as authors do not respond to our requests for better
> descriptions and more complete data sets.
>
>>
>> I still have many questions about the fields, their format in the STAR
>> file to deposit, which are compulsory, and which fields do not yet
>> exist for deposition of all model-free data (much of this data can be
>> seen in the relax results file
>>
>> http://svn.gna.org/viewcvs/relax/1.3/test_suite/shared_data/model_free/OMP/final_results_trunc_1.3.bz2
>> ). For example most of the STAR tags in
>>
>> http://www.bmrb.wisc.edu/data_library/gen_saveframe.php?accNum=5841&saveframe=S2_parameters
>> are not in the Excel spreadsheet. And why are order parameters and
>> their errors input using the STAR format tags '_S2_value' and
>> '_S2_error' whereas the T1 fields are called '_T1_value' and
>> '_T1_value_error' and the effective model-free internal correlation
>> time te filed under '_Tau_e_value' and '_Tau_e_value_fit_error'?
>
> When working on a dictionary of almost 5000 tags over many years,
> inconsistencies creep into the tag names. We have tried to eliminate these
> inconsistencies as much as possible in the NMR-STAR v3 dictionary, but I
> would guess there are still at least a few.
>
>> Would you have an example deposition text file formatted correctly
>> using the ADIT-NMR tags in the Excel file? Or is this unmodified, for
>> example is
>> http://www.bmrb.wisc.edu/cgi-bin/explore.cgi?format=raw&bmrbId=5841
>> the same file as the one the authors deposited?
>
> I do not have a full relaxation example file. For example files you should
> look in the directory on the ftp site listed above. We are working to clean
> up these files as quickly as possible.
>
>> And how is the
>> field strength dependent data handled, e.g. in
>> http://www.bmrb.wisc.edu/cgi-bin/explore.cgi?format=raw&bmrbId=4970
>> there are 2 spectrometers declared (600 and 750 MHz), yet there is
>> relaxation data at 500, 600 and 750 present in the file?
>
> As mentioned above, for each module containing data that are field strength
> dependent there should be a tag that takes as a value the field strength of
> the spectrometer used to collect the data. For data like order parameters
> that are derived from different sets of data, currently the experiment list
> is used to trace back to the input data and spectrometer field strength.
>
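Following that rule, a writer would split relaxation data into one saveframe per data type and field strength, and a check against the declared spectrometer list would catch inconsistencies like the one Edward found in entry 4970. The sketch below is hypothetical: data points are plain `(data_type, field_MHz, atom, value, error)` tuples and the saveframe naming scheme is invented.

```python
from collections import defaultdict


def group_into_frames(points):
    """Group (data_type, field_MHz, atom, value, error) tuples per saveframe.

    Returns {(data_type, field_MHz): [(atom, value, error), ...]}.
    """
    frames = defaultdict(list)
    for data_type, field, atom, value, error in points:
        frames[(data_type, field)].append((atom, value, error))
    return dict(sorted(frames.items()))


def frame_name(data_type, field):
    """Invented naming scheme, e.g. 'R1_600MHz'."""
    return "%s_%gMHz" % (data_type, field)


def undeclared_fields(frames, declared_mhz):
    """Field strengths used by the data but missing from the spectrometer list."""
    used = {field for _, field in frames}
    return sorted(used - set(declared_mhz))
```

With R1 data at 500, 600 and 750 MHz but only 600 and 750 MHz spectrometers declared, `undeclared_fields` would report `[500]`, flagging the file before deposition.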
>>
>> Cheers,
>>
>> Edward
>>
>>
>> P.S. For reference, this message will soon appear at
>> https://mail.gna.org/public/relax-devel/.
>>
>>
>>
>> On Mon, Jul 28, 2008 at 6:01 PM, Eldon Ulrich <elu@xxxxxxxxxxxxx> wrote:
>>>
>>> Hi Edward,
>>>
>>> Sorry for the delay in providing a list of the required ADIT-NMR fields.
>>> An Excel file with the information, compiled by one of our students, is
>>> attached. The table provides a fairly complete description of the fields
>>> and, where appropriate, the dependencies on other fields. In terms of the
>>> experimental data, only the fields required for T1 relaxation data were
>>> included. The required fields may vary slightly depending on the kinds of
>>> data being deposited.
>>>
>>> I hope this information helps. If you have any questions or need
>>> additional information, please let me know.
>>>
>>> All the best,
>>> Eldon
>>>
>>> _______________________________________________
>>> relax (http://nmr-relax.com)
>>>
>>> This is the relax-devel mailing list
>>> relax-devel@xxxxxxx
>>>
>>> To unsubscribe from this list, get a password
>>> reminder, or change your subscription options,
>>> visit the list information page at
>>> https://mail.gna.org/listinfo/relax-devel
>>>
>>>
>
>


Hi Ed

Some alternatives:

1. stardom (GPL; ignore what it says on the first web page and just look at the licence) converts STAR files to an XML format: http://www.pasteur.fr/recherche/unites/Binfs/stardom
2. The CCPN format converters come in two parts (I have helped write one for importing data from xplor-marvin). I would have a look at ccpnmr1.0/python/ccp/format/nmrStar, which is a basic STAR file reader framework...
3. I can assist here (my structure calculation work is now [mostly] done, so I am heading back to dynamics) ;-)
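To illustrate the STAR-to-XML route that stardom takes (this is not stardom's code, and its actual XML schema is not reproduced here), the sketch below turns saveframes held as a plain dict, `{name: {"tags": {tag: value}, "loops": [{"tags": [...], "rows": [[...]]}]}}`, into an element tree using Python's standard `xml.etree.ElementTree`. The element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET


def frames_to_xml(frames):
    """Convert {name: {"tags": {...}, "loops": [...]}} into an XML element tree."""
    root = ET.Element("star")
    for name, frame in frames.items():
        sf = ET.SubElement(root, "saveframe", {"name": name})
        for tag, value in frame["tags"].items():
            ET.SubElement(sf, "item", {"tag": tag}).text = value
        for loop in frame["loops"]:
            lp = ET.SubElement(sf, "loop")
            for row in loop["rows"]:
                row_el = ET.SubElement(lp, "row")
                for tag, value in zip(loop["tags"], row):
                    ET.SubElement(row_el, "item", {"tag": tag}).text = value
    return root


if __name__ == "__main__":
    frames = {"T1_600": {"tags": {"_Heteronucl_T1_list.T1_val_units": "s-1"},
                         "loops": [{"tags": ["_T1.Seq_ID", "_T1.Val"],
                                    "rows": [["2", "1.52"]]}]}}
    print(ET.tostring(frames_to_xml(frames), encoding="unicode"))
```

Once the data are in XML, the existing Python XML tools Edward mentions can be used for querying and writing, which is the main attraction of this route.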

regards
gary
