Re: BMRB heteronuclear relaxation records together with multiple spin types.



Posted by Edward d'Auvergne on February 27, 2009 - 11:47:
On Fri, Feb 27, 2009 at 7:39 AM, Eldon Ulrich <elu@xxxxxxxxxxxxx> wrote:
Hi,

I have tried to provide more information below. I am reworking the cross
relaxation portions of the dictionary and will let you know when it is
available. It should not take more than a few days.

No need to rush.  I need to develop the API more, as the NMR-STAR
reading abilities are only in my head at the moment.  And generating
the entity and molecule assembly, and adding all other structural
info might take a while.


I have to apologise for the very basic questions - my understanding of
the NMR-STAR dictionary is very basic at the moment and I'm trying to
learn it as fast as possible!  And converting all data types from
relax into NMR-STAR v3.1 format and then loading it back again into
relax requires a very deep understanding.  My design is a generic API
which interfaces relax to the pystarlib library, the former containing
the data you would like and the latter writing and reading basic STAR
formatted files.  This API will be a package - separable from relax,
so maybe interesting for you for other purposes - which implements
reading and writing of the NMR-STAR dictionary v3.1.  It is designed
to handle different versions of the NMR-STAR dictionary, and the
incomplete code is currently located in the 'bmrblib' directory in the
bmrb relax branch (http://svn.gna.org/viewcvs/relax/).  What I would
like for this API is that the calling program dumps the data to it in
arrays, floats, etc., and the library creates a partial v3.1 saveframe
(including required tags, tags necessary for smooth 2 way data flow,
and any other useful, non-internal tags).  A complementary set of
functions returns the data from an NMR-STAR file.  There may be a lot
of overlap with CCPN here, but as the CCPN data model is not so good
with dynamics data (yet!), I think this is the best way forward at
this point.  More below...
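As a rough illustration of the two-way flow described above, here is a minimal sketch of a writer/reader pair for a single STAR loop. It is not the bmrblib or pystarlib API: the function names write_loop() and read_loop() are hypothetical, values are assumed to contain no whitespace (so no STAR quoting is handled), and everything is read back as strings.

```python
def write_loop(tags, rows):
    """Format rows of values as a STAR loop_ block (no quoting of values)."""
    lines = ["loop_"]
    lines += ["   " + tag for tag in tags]
    lines.append("")
    for row in rows:
        if len(row) != len(tags):
            raise ValueError("row length does not match the number of tags")
        lines.append("   " + " ".join(str(value) for value in row))
    lines.append("stop_")
    return "\n".join(lines)


def read_loop(text):
    """Parse a block produced by write_loop() back into (tags, rows)."""
    tags, values = [], []
    for token in text.split():
        if token in ("loop_", "stop_"):
            continue
        # Tag names come first; once data values start, everything is data.
        if token.startswith("_") and not values:
            tags.append(token)
        else:
            values.append(token)
    if not tags:
        return [], []
    rows = [values[i:i + len(tags)] for i in range(0, len(values), len(tags))]
    return tags, rows
```

The calling program would hand over plain lists and get plain lists back, which is the kind of separation between relax and the file format that the package is meant to provide.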

The work you are doing should be very useful and when complete we would be
pleased if you would allow us to make the API package available from the
BMRB web site.

I'm thinking of doing what I did with another relax module.  Back when
I was working on my PhD, I wrote many different optimisation
algorithms to learn about optimisation and to find the best technique
for model-free analysis (which ended up being the most complex to code
as it requires Hessians, http://en.wikipedia.org/wiki/Hessian_matrix).
 This open source, GPL licenced code is now its own project called
minfx located at https://gna.org/projects/minfx/.  So I was thinking
of spinning off this NMR-STAR dictionary API into its own open source
project.  The reason is that I will only implement what is
necessary for dynamics (relaxation and model-free to start with, other
analyses later), but with the framework it will be easy to add other
saveframes and its tags later.  Then other NMR spectroscopists can
contribute the code they write for their needs back into the project
for the benefit of all.  But as open source, you're more than welcome
to spread the code.

This may compete with the NMR-STAR part of CCPN a little, but as the
data model part of that is also open source, then the cleanest and
easiest API will probably win out in the end, and both relax and CCPN
can use the winner.  I'm just developing this as a separate library
for now because of the reasons above.  And even if relax dumps this
library in the end and switches to CCPN (when it can handle dynamics
better), someone else might find it useful.  And it can forever be
spread through http://www.bmrb.wisc.edu/.


relax is now producing NMR-STAR dictionary files containing relaxation
data (one truncated example is attached), although these are obviously
far from complete.  I have a few questions as this does not work for
all molecule types.  For example if you have a protein or RNA system
and have collected both 15N and 13C R1 relaxation data.  Should these
go into 2 separate heteronucl_T1_relaxation saveframes?

In the past, 15N and 13C relaxation data would be put in separate
saveframes, but if the new 'coherence_type' values are used, all data of
the same coherence type could be included in one table. The atom names are
required.

Which would you prefer I implement right now if people in a few months
time would like, or are asked, to submit dynamics data?  For BMRB
submission, I would assume that relax's NMR-STAR file generation need
not be 100% complete?  Oh, is this new 'coherence_type' part of v3.1
or will it be a future version?

I would prefer that you implement the new version of the coherence type
values. This seems to be a better way to go. Do you agree?

If the ADIT-NMR system will accept it, I'll code it.


And is there more information on how to construct the molecular
entities other than at http://www.bmrb.wisc.edu/deposit/mol_assembly/?

The molecular assembly is made up of the polymer chains and ligands and
cofactors that are tightly bound (hemes, iron sulfur clusters, etc.) or
exchanging between bound and free states. In general, things like buffers,
solvents, and lipids are not considered part of the molecular assembly. Each
component in the assembly is given an ID (_Entity_assembly.ID). Both a
homodimer and a heterodimer assembly would have two _Entity_assembly.ID
values, one for each polymer chain in the assembly. For each unique polymer
or ligand in the assembly, an entity saveframe is required (one entity for a
homodimer assembly, two entities for a heterodimer assembly, for example).
Entities can be either polymers or non-polymers. For polymer entities, the
sequence is given in both a one-letter and three-letter format.
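The ID scheme described above (one _Entity_assembly.ID per component, one entity per unique polymer or ligand) can be sketched as follows. This is only an illustration under the assumption that two components are the same entity when their sequences or names are identical; the function name build_assembly() and the example sequences are hypothetical.

```python
def build_assembly(components):
    """Assign entity and entity-assembly IDs to a list of components.

    components: one sequence string (or ligand name) per chain or ligand.
    Returns (entities, entity_assembly) where entities maps each unique
    component to its entity ID and entity_assembly is a list of
    (entity_assembly_id, entity_id) pairs, one per component.
    """
    entities = {}
    entity_assembly = []
    for assembly_id, component in enumerate(components, start=1):
        # A new entity ID is only created for a component not seen before.
        entity_id = entities.setdefault(component, len(entities) + 1)
        entity_assembly.append((assembly_id, entity_id))
    return entities, entity_assembly


# Homodimer: two assembly IDs, one entity.
print(build_assembly(["MKVL", "MKVL"]))
# Heterodimer: two assembly IDs, two entities.
print(build_assembly(["MKVL", "GSHA"]))
```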

If someone does a dynamics analysis using a PDB structure that has
cofactors, metals, etc. as part of the structure, but is completely
unused in the model-free analysis, do I still need to add all this
stuff into the NMR-STAR file?  I'm sure that neither the relax PDB
reader nor the Scientific Python reader can handle all of the rubbish
in the PDB.  What if someone uses the structure of a single protein
taken from a massive molecular complex like the ribosome?  Does the
entity need to be 100% complete, or can only residues used in the
analysis be included?  Is there the option of specifying which PDB
file was used in the analysis, as this would be very useful for 2 way
data flow between BMRB and relax (or any other dynamics programs)?

Oh, and are the 1 letter codes required, or are these automatically
generated by ADIT-NMR?  It will be easy to add a translation function
for this into relax, like fixing the insanity of MOLMOL's 'LYS+',
etc., but if not necessary, then I won't bother.  What about RNA?
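A translation function of the kind mentioned above might look like the sketch below. It is not relax code: the names one_letter_code() and one_letter_sequence() are hypothetical, only the 20 standard amino acids are covered (RNA is left open, as in the question), and it assumes that MOLMOL-style charged names differ from the standard ones only by a trailing '+' or '-', which is known here only for 'LYS+'.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def one_letter_code(res_name, unknown="X"):
    """Translate one residue name, tolerating case and a trailing charge sign."""
    # Assumption: names such as 'LYS+' only carry an extra '+' or '-' suffix.
    name = res_name.strip().upper().rstrip("+-")
    return THREE_TO_ONE.get(name, unknown)


def one_letter_sequence(res_names):
    """Translate a list of residue names into a one-letter sequence string."""
    return "".join(one_letter_code(name) for name in res_names)


print(one_letter_sequence(["MET", "LYS+", "GLY"]))
```

Anything not in the table, such as a cofactor, comes back as the placeholder 'X' rather than raising an error, so the caller can decide how to treat it.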



More information on required tags and their descriptions is available
here:

http://www.bmrb.wisc.edu/dictionary/3.1html_frame/frame_index.html

I'm using exactly this documentation now to implement this.  I have
noticed a problem though.  The file:

http://www.bmrb.wisc.edu/dictionary/3.1html/TablePage.html

You may not be using the 'frames' version of the dictionary. This version
may provide more detailed information for each tag.

I've used both, just because without frames you can get a better
overview.  Plus when implementing an API to read and write these, you
don't need to jump around too quickly.


does not always fully load when you ask for it.  Is this being
automatically generated?

We will need to look into this problem. It seems to work for me here.

It's very intermittent.  There seems to be no problem at the moment
for me.  But when it first happened, I just assumed that the NMR-STAR
v3.1 documentation was incomplete and started looking elsewhere on the
site ;)


I can try to put together more examples and send them to you this
weekend.

Also, for submission, should relax produce the full set of records for
this data, so for example all of:

The required tags are annotated below.


In the header part of each saveframe there is a tag something like
"_Heteronuclear_T1_list.ID". This tag is required and its values are a
counter for each instance of a saveframe of that category. Your example file
that you attached to this e-mail loaded into the ADIT-NMR deposition system,
but only one saveframe was created for each category.  I think this is
because the '_xxx.ID' tag was missing from the saveframe header.

Ah, yes, I forgot about adding this one for the v3.1 layer.  Easily
fixed, cheers.  I'm sure there's plenty of other tags still missing.
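The per-category counter described above for the '_xxx.ID' header tag could be generated along these lines. This is a sketch only: assign_list_ids() is a hypothetical helper, and the second category name in the example is just a placeholder for any other saveframe category.

```python
from collections import defaultdict


def assign_list_ids(categories):
    """Return one ID per saveframe, counting separately within each category."""
    counters = defaultdict(int)
    ids = []
    for category in categories:
        counters[category] += 1
        ids.append(counters[category])
    return ids


# Two T1 saveframes and one saveframe of another (placeholder) category.
print(assign_list_ids([
    "heteronucl_T1_relaxation",
    "heteronucl_T1_relaxation",
    "some_other_category",
]))
```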


  loop_
      _T1.ID                       should be provided as a counter
      _T1.Assembly_atom_ID         not needed
      _T1.Entity_assembly_ID       not absolutely required, always has a value of 1
      _T1.Entity_ID                required, particularly for molecular assemblies
                                   that have more than one entity (heterodimers,
                                   complexes, etc.)
      _T1.Comp_index_ID            required, this is the residue sequence number
      _T1.Seq_ID                   not needed
      _T1.Comp_ID                  required, residue type
      _T1.Atom_ID                  required, this is the atom name and allows for
                                   things like backbone nitrogen and sidechain
                                   nitrogen data to be listed in the same table
      _T1.Atom_type                not absolutely required, but useful to
                                   distinguish CA from calcium
      _T1.Atom_isotope_number      not required, but useful to distinguish 1H from
                                   2H relaxation
      _T1.Val                      required
      _T1.Val_err                  required in general
      _T1.Resonance_ID             not needed
      _T1.Auth_entity_assembly_ID  not needed
      _T1.Auth_seq_ID              not needed
      _T1.Auth_comp_ID             not needed
      _T1.Auth_atom_ID             not needed
      _T1.Entry_ID                 not needed
      _T1.Heteronucl_T1_list_ID    required, particularly for entries with more
                                   than one T1/R1 table

This will be very useful.  Is this in the individual tag descriptions?
 I cannot find this information in the online documentation.

This information, in a more formal presentation, is available in the
'frames' version of the NMR-STAR v3 dictionary on the web site:

http://www.bmrb.wisc.edu/dictionary/3.1html_frame/frame_index.html

If you select an individual tag, the frame on the right contains a field
"Nulls allowed in database?". The value will be 'no' or 'yes'.

Ah, that is what I should look for.  Cheers.  This is also reachable
through the links on
http://www.bmrb.wisc.edu/dictionary/3.1html/TablePage.html.
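For reference, the required/not needed annotations on the _T1 loop earlier in this thread could be encoded as a simple check before writing a file. The sketch below is an assumption-laden illustration, not the dictionary itself: the REQUIRED_T1_TAGS tuple is taken from the annotations in this message rather than from the "Nulls allowed in database?" field, t1_rows() is a hypothetical helper, and the relaxation values in the example are made up.

```python
# Tags marked as required in the annotated _T1 loop (tag names without '_T1.').
REQUIRED_T1_TAGS = (
    "Entity_ID", "Comp_index_ID", "Comp_ID", "Atom_ID",
    "Val", "Val_err", "Heteronucl_T1_list_ID",
)


def t1_rows(records, list_id=1, entity_id=1):
    """Number the records and report required values that are missing.

    records: list of dicts keyed by tag name without the '_T1.' prefix.
    Returns (rows, problems) where problems is a list of
    (row counter, list of missing tags).
    """
    rows, problems = [], []
    for counter, record in enumerate(records, start=1):
        # _T1.ID is a plain counter; the list and entity IDs are filled in here.
        row = dict(record, ID=counter, Entity_ID=entity_id,
                   Heteronucl_T1_list_ID=list_id)
        missing = [tag for tag in REQUIRED_T1_TAGS if row.get(tag) is None]
        if missing:
            problems.append((counter, missing))
        rows.append(row)
    return rows, problems


# 15N and 13C data in one table, told apart by the atom name (made-up values).
records = [
    {"Comp_index_ID": 2, "Comp_ID": "GLY", "Atom_ID": "N", "Val": 0.75, "Val_err": 0.02},
    {"Comp_index_ID": 2, "Comp_ID": "GLY", "Atom_ID": "CA", "Val": 0.60, "Val_err": None},
]
rows, problems = t1_rows(records)
print(problems)
```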

Regards,

Edward


