Re: [Relax-devel] handling of relaxation data in results files -- January 11, 2006

It might be worth storing these emails on the relax-devel mailing
list.  Hitting reply will, by default, put the email address of the
sender rather than the mailing list.  Your last email (below) didn't
make it to the list.

The fix to the problem shouldn't be too difficult.  I'm quite busy
coding the relaxation curve fitting C modules at the moment but, if
you would like, I could fix the problem for you.

If you want to make a large number of changes to rip out the global
structures, I would recommend working with the 1.1 branch as the
changes are so big that it will be very difficult to migrate the
changes from 1.0 to 1.1.  A private branch can be created in the
repository directory 'branches/' which is a copy of 1.1.  The command
you can type would be something like:

svn cp svn+ssh://macraild@xxxxxxxxxxx/svn/relax/1.1 \
    svn+ssh://macraild@xxxxxxxxxxx/svn/relax/branches/macraild

You can then checkout and hack this branch to bits (committing any
changes you make).  If the global structures can be successfully
removed, your private branch can then be merged back into the main 1.1
development branch.  If the hacking goes nowhere, you can remove the
branch.  Don't worry about damaging the repository; every action in
SVN can be reverted.  The private branch also means that your changes
won't affect anyone else who has checked out the head of the main
line.


On 11 Jan 2006 10:36:51 +0000, Chris MacRaild <c.a.macraild@xxxxxxxxxxx> wrote:
> I'm working from the head of the 1.0 branch. I will make the quick fix
> of data structure creation that you suggest - probably as easy to code
> it from scratch as to go searching in the repository for your old fix. I
> might also look a bit more deeply at the possibility of removing the
> global data structures if I have a chance. Like I said, I think this is
> the best prospect for a robust solution to the problem.
>
>
> On Wed, 2006-01-11 at 02:31, Edward d'Auvergne wrote:
> > I think I know the exact problem.  I thought I had written and properly
> > debugged the code to handle this situation.  The reason for having a
> > global set of data structures as well as residue specific structures
> > relating to the relaxation data is twofold.  The global
> > structures keep track of all data which has been loaded into relax and
> > are used for internal accounting.  The residue specific structures
> > relate the actual data to its field strength and data type (NOE, R1,
> > or R2) independently of which complete data sets were collected.
> >
> > In the results file, I found that recreating the global structures was
> > very difficult when each residue had a different number of data
> > points.  Rare ambiguities could even arise if there are no residues
> > which possess every possible data point.  Therefore I decided that the
> > global structures would be printed in the results file and that any
> > missing data (and error) points would be replaced with None.  In reading
> > the file back in, relax should first recreate the global structures
> > using the data of the first 'selected' residue and then for each
> > residue when 'None' is encountered, recreate the residue specific data
> > structures appropriately.  I.e. drop the corresponding entries in the
> > 'remap_table', 'relax_data', etc.  The 'num_ri' should then be
> > calculated correctly and the contents of 'self.relax.data' for the run
> > should be identical to that prior to saving the results and
> > terminating the program.
> >
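A rough sketch of that reading step, in case it helps: for each row read
from the results file, drop every position whose data value is None and
rebuild the residue specific lists from what is left.  The function name
strip_missing and the ordering of ri_labels are assumptions made for
illustration only; this is not the actual relax code.

```python
def strip_missing(ri_labels, remap_table, relax_data):
    """Drop the positions whose data value is None from one residue's row."""
    # Indices of the data points that are actually present.
    keep = [i for i, value in enumerate(relax_data) if value is not None]
    return {
        "ri_labels": [ri_labels[i] for i in keep],
        "remap_table": [remap_table[i] for i in keep],
        "relax_data": [relax_data[i] for i in keep],
        "num_ri": len(keep),
    }

# Row for residue 2 as it appears in the results file example in this thread.
# The ri_labels ordering is an assumption, not taken from a real file.
ri_labels = ["R1"] * 3 + ["R2"] * 3 + ["NOE"] * 3
remap_table = [0, 1, 2, 0, 1, 2, 0, 1, 2]
relax_data = [None, 0.896, 0.6817, 18.64, 20.48, 21.89, 0.769, 0.858, 0.893]

residue = strip_missing(ri_labels, remap_table, relax_data)
print(residue["num_ri"], residue["remap_table"])
```

The relaxation errors would be filtered with the same index list so that
the data and error lists stay aligned.
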
> > Two options are possible in fixing the bug.  The first would be to
> > destroy and abandon the global data structures.  This may, however,
> > have a few unintended consequences that would require a bit of
> > recoding.  The second option would be to fix the data structure
> > recreation - which would probably be less disruptive.
> >
> > I thought I had fixed this exact same problem quite a while back by
> > changing the recreation of the residue specific structures.  The code
> > could have been lost though.  Before I imported relax into the
> > subversion database, I think I accidentally overwrote a large number
> > of fixes I made to relax with an earlier version.  The code is
> > probably hidden somewhere deep inside the relax SVN database, but I
> > don't know where it is or what exactly was lost.  Keeping track of
> > code is quite error prone if a versioning system is not used!
> >
> > Oh, which code base are you working from?  Are you using one of the
> > stable 1.0.x releases, have you checked out the head of the 1.0
> > branch, or are you working with the brand new 1.1 development branch
> > I've created for the large number of changes I'm currently making?
> > I'll probably try to get the fix merged into both 1.0 and 1.1, and
> > then possibly release stable version 1.0.10 as a branch off the 1.0
> > line.  I've stabilised the 1.1 line so that any problems in the
> > relaxation curve fitting C modules I'm writing should not affect the
> > model-free code.
> >
> >
> > On 10 Jan 2006 17:54:57 +0000, Chris MacRaild <c.a.macraild@xxxxxxxxxxx> wrote:
> > > Hi,
> > >
> > > For a while now I have been using relax with a dataset where not all
> > > > residues have the same number of relaxation values (I have data at
> > > 3 fields, but the quality varies significantly, and some of the
> > > relaxation decay fits I am not happy with, so I have excluded the
> > > corresponding data). This has caused no obvious problems in most cases,
> > > except that results file io fails in some interesting ways. I've had a
> > > bit of a poke in the code to try and fix this, but it has raised a few
> > > issues that I thought I should run past you before I hack too hard.
> > >
> > > Of course when I load this data from the original data files, the
> > > relevant residue data structures all behave correctly -
> > > self.relax.data.res[runName][res].num_ri is correct for each residue, as
> > > are the residue specific remap_table, ri_labels, etc.
> > >
> > > When I do a results.write(), things change in the results file: all
> > > residues now have the same remap_table, etc, and missing Ri values and
> > > errors are given the value None. This in itself makes some sense in the
> > > context of the tabular file format, but is inconsistent with the way
> > > things are handled in the data structures above.
> > >
> > > > Just in case I'm not being clear, an example. Let's say residue 1 has all
> > > 9 data points, but residue 2 is missing the first data point, so only
> > > has 8. On initially loading the data files, I have:
> > > > self.relax.data.res[runName][1].num_ri = 9
> > > > self.relax.data.res[runName][1].remap_table = [0,1,2,0,1,2,0,1,2]
> > > > self.relax.data.res[runName][1].relax_data = [1.214, 0.896, 0.6817, 18.64, 20.48, 21.89, 0.769, 0.858, 0.893]
> > > >
> > > > self.relax.data.res[runName][2].num_ri = 8
> > > > self.relax.data.res[runName][2].remap_table = [1,2,0,1,2,0,1,2]
> > > > self.relax.data.res[runName][2].relax_data = [0.896, 0.6817, 18.64, 20.48, 21.89, 0.769, 0.858, 0.893]
> > >
> > > But the results file looks like:
> > > > Res  ...    remap_table   ...     relax_data
> > > > 1           [0,1,2,0,1,2,0,1,2]   [1.214, 0.896, 0.6817, 18.64, 20.48, 21.89, 0.769, 0.858, 0.893]
> > > > 2           [0,1,2,0,1,2,0,1,2]   [None, 0.896, 0.6817, 18.64, 20.48, 21.89, 0.769, 0.858, 0.893]
> > >
> > >
> > > This inconsistency comes back to haunt us on results.read(). The first
> > > problem is that Ri values and errors get read in as None. This causes
> > > subsequent minimisation to fail (by some wonder of python's implicit
> > > type conversions, None < 0.0 evaluates as True so minimisation fails
> > > with negative errors!). The second problem relates to the fact that
> > > ri_labels, remap_table, etc are defined twice in the relax data
> > > structures, once at self.relax.data.res[runName][res].ri_labels, and
> > > again at self.relax.data.ri_labels[runName].
> > >
> > > read_columnar_relax_data() tries to set data.ri_labels[runName], etc.
> > > from the first line that is read from the results file. Then we have:
> > >
> > >     # Test if the relaxation data is consistent.
> > > >     if self.ri_labels != eval(self.file_line[self.col['ri_labels']]) or \
> > > >        self.remap_table != eval(self.file_line[self.col['remap_table']]) or \
> > > >        self.frq_labels != eval(self.file_line[self.col['frq_labels']]) or \
> > > >        self.frq != eval(self.file_line[self.col['frq']]):
> > > >         raise RelaxError, "The relaxation data is not consistent for all residues."
> > >
> > > This checks that the ri_labels, etc, for each line of the results file
> > > is the same as that from the first line read. This exception is never
> > > raised, because these values are always consistent in the results file,
> > > even if they weren't in the original data, as I've outlined above. It
> > > does, however, seem to be an attempt to stop me using data where not all
> > > > residues have the same number of relaxation values. It might be
> > > that I'm missing something, but I can see no good reason why I should be
> > > stopped, and indeed using this type of data has caused me no other
> > > problems that I can see.
> > >
> > > So, the 'big issues' for your consideration:
> > >  - Is there any good reason for having ri_labels, etc defined at both
> > > the level of the run (at data.ri_labels[runName]) and at the level of
> > > the residue (at data.res[runName][res].ri_labels)? We have seen other
> > > bugs associated with these duplications, and it does seem to be asking
> > > for trouble.
> > >  - A decision needs to be made about which of these parameters are
> > > expected to be constant across all residues in a run, and which are
> > > potentially allowed to vary. Ideally, it seems to me, those which are
> > > constant across the run will be defined only at
> > > data.param_name[runName], and those that might vary from one residue to
> > > another will be at data.res[runName][res].param_name (and only there).
> > > > Again, in my opinion, whatever convention is adopted in the internal
> > > data structures of relax should also be reflected in the way the
> > > parameter values are output to the results file.
> > >
> > > Anyway, let me know what you think.  I'm happy to have a hack at
> > > > resolving these issues whichever way you decide.
> > >
> > >
> > > Chris
> > >
> > >
> > > _______________________________________________
> > > Relax-devel mailing list
> > > Relax-devel@xxxxxxx
> > > https://mail.gna.org/listinfo/relax-devel
> > >
> >
> >
>
>
Re: [Relax-devel] handling of relaxation data in results files (January 11, 2006 - 10:52)