Re: [Relax-devel] handling of relaxation data in results files -- January 11, 2006

I think I know the exact problem.  I thought I had written and properly
debugged the code to handle this situation.  The reason for having a
global set of data structures as well as residue specific structures
relating to the relaxation data is twofold.  The global structures
keep track of all data which has been loaded into relax and are used
for internal accounting.  The residue specific structures relate the
actual data to its field strength and data type (NOE, R1, or R2)
independently of which complete data sets were collected.
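
As a rough illustration of how the residue specific structures tie each
value to a data type and field strength, here is a minimal sketch.  It
is not the actual relax code: the function 'describe_points' and the
field strength labels are hypothetical, and the ordering of 'ri_labels'
(all R1, then all R2, then all NOE) is an assumption chosen to match a
'remap_table' of [0,1,2,0,1,2,0,1,2].

```python
# Global (run level) lists describing every data set loaded.
# The frequency labels are hypothetical placeholders.
frq_labels = ['500', '600', '800']
ri_labels = ['R1'] * 3 + ['R2'] * 3 + ['NOE'] * 3   # assumed ordering
remap_table = [0, 1, 2, 0, 1, 2, 0, 1, 2]


def describe_points(ri_labels, remap_table, frq_labels):
    """Pair each data point with its data type and field strength label.

    Hypothetical helper, not part of relax.
    """
    return [(ri_labels[i], frq_labels[remap_table[i]])
            for i in range(len(ri_labels))]


# A residue with every data point uses the full lists.
all_points = describe_points(ri_labels, remap_table, frq_labels)

# A residue missing its first data point carries shorter lists of its own.
res_points = describe_points(ri_labels[1:], remap_table[1:], frq_labels)

print(len(all_points), all_points[0])   # 9 ('R1', '500')
print(len(res_points), res_points[0])   # 8 ('R1', '600')
```

The global lists still describe every data set that was loaded, while
each residue only needs the entries for the points it actually has.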

In the results file, I found that recreating the global structures was
very difficult when each residue had a different number of data
points.  Rare ambiguities could even arise if there are no residues
which possess every possible data point.  Therefore I decided that the
global structures would be printed in the results file and that any
missing data (and error) points would be replaced with None.  In reading
the file back in, relax should first recreate the global structures
using the data of the first 'selected' residue and then for each
residue when 'None' is encountered, recreate the residue specific data
structures appropriately.  I.e. drop the corresponding entries in the
'remap_table', 'relax_data', etc.  The 'num_ri' should then be
calculated correctly and the contents of 'self.relax.data' for the run
should be identical to that prior to saving the results and
terminating the program.
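
The read-back step described above could look something like the
following sketch.  The function name 'rebuild_residue' is hypothetical
and the error values are placeholders; only the 'remap_table' and
'relax_data' values come from the example in the original report.  The
same filtering would apply to 'ri_labels' and the other per-residue
lists.

```python
def rebuild_residue(global_remap_table, relax_data, relax_error):
    """Recreate residue specific lists by dropping entries stored as None.

    Hypothetical helper, not part of relax.
    """
    # Keep only the positions where both the value and its error are present.
    keep = [i for i, (value, error) in enumerate(zip(relax_data, relax_error))
            if value is not None and error is not None]
    return {
        'num_ri': len(keep),
        'remap_table': [global_remap_table[i] for i in keep],
        'relax_data': [relax_data[i] for i in keep],
        'relax_error': [relax_error[i] for i in keep],
    }


# Residue 2 as it appears in the results file: the first point is missing.
global_remap_table = [0, 1, 2, 0, 1, 2, 0, 1, 2]
relax_data = [None, 0.896, 0.6817, 18.64, 20.48, 21.89, 0.769, 0.858, 0.893]
relax_error = [None] + [0.01] * 8   # placeholder errors, not real data

res2 = rebuild_residue(global_remap_table, relax_data, relax_error)
print(res2['num_ri'])        # 8
print(res2['remap_table'])   # [1, 2, 0, 1, 2, 0, 1, 2]
```

With the None entries removed, 'num_ri' and the residue specific
'remap_table' match what was in memory before the results were saved.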

Two options are possible in fixing the bug.  The first would be to
destroy and abandon the global data structures.  This may, however,
have a few unintended consequences that would require a bit of
recoding.  The second option would be to fix the data structure
recreation - which would probably be less disruptive.

I thought I had fixed this exact same problem quite a while back by
changing the recreation of the residue specific structures.  The code
could have been lost though.  Before I imported relax into the
subversion database, I think I accidentally overwrote a large number
of fixes I made to relax with an earlier version.  The code is
probably hidden somewhere deep inside the relax SVN database, but I
don't know where it is or what exactly was lost.  Keeping track of
code is quite error prone if a versioning system is not used!

Oh, which code base are you working from?  Are you using one of the
stable 1.0.x releases, have you checked out the head of the 1.0
branch, or are you working with the brand new 1.1 development branch
I've created for the large number of changes I'm currently making? 
I'll probably try to get the fix merged into both 1.0 and 1.1, and
then possibly release stable version 1.0.10 as a branch off the 1.0
line.  I've stabilised the 1.1 line so that any problems in the
relaxation curve fitting C modules I'm writing should not affect the
model-free code.


On 10 Jan 2006 17:54:57 +0000, Chris MacRaild <c.a.macraild@xxxxxxxxxxx> wrote:
> Hi,
>
> For a while now I have been using relax with a dataset where not all
> residues have the same number of relaxation values (I have data at
> 3 fields, but the quality varies significantly, and some of the
> relaxation decay fits I am not happy with, so I have excluded the
> corresponding data). This has caused no obvious problems in most cases,
> except that results file io fails in some interesting ways. I've had a
> bit of a poke in the code to try and fix this, but it has raised a few
> issues that I thought I should run past you before I hack too hard.
>
> Of course when I load this data from the original data files, the
> relevant residue data structures all behave correctly -
> self.relax.data.res[runName][res].num_ri is correct for each residue, as
> are the residue specific remap_table, ri_labels, etc.
>
> When I do a results.write(), things change in the results file: all
> residues now have the same remap_table, etc, and missing Ri values and
> errors are given the value None. This in itself makes some sense in the
> context of the tabular file format, but is inconsistent with the way
> things are handled in the data structures above.
>
> Just in case I'm not being clear, an example. Let's say residue 1 has all
> 9 data points, but residue 2 is missing the first data point, so only
> has 8. On initially loading the data files, I have:
> self.relax.data.res[runName][1].num_ri = 9
> self.relax.data.res[runName][1].remap_table = [0,1,2,0,1,2,0,1,2]
> self.relax.data.res[runName][1].relax_data = [1.214, 0.896, 0.6817,
> 18.64, 20.48, 21.89, 0.769, 0.858, 0.893]
>
> self.relax.data.res[runName][2].num_ri = 8
> self.relax.data.res[runName][2].remap_table = [1,2,0,1,2,0,1,2]
> self.relax.data.res[runName][2].relax_data = [0.896, 0.6817, 18.64,
> 20.48, 21.89, 0.769, 0.858, 0.893]
>
> But the results file looks like:
> Res  ...    remap_table   ...     relax_data
> 1           [0,1,2,0,1,2,0,1,2]   [1.214, 0.896, 0.6817, 18.64, 20.48,
> 21.89, 0.769, 0.858, 0.893]
> 2           [0,1,2,0,1,2,0,1,2]   [None, 0.896, 0.6817, 18.64, 20.48,
> 21.89, 0.769, 0.858, 0.893]
>
>
> This inconsistency comes back to haunt us on results.read(). The first
> problem is that Ri values and errors get read in as None. This causes
> subsequent minimisation to fail (by some wonder of python's implicit
> type conversions, None < 0.0 evaluates as True so minimisation fails
> with negative errors!). The second problem relates to the fact that
> ri_labels, remap_table, etc are defined twice in the relax data
> structures, once at self.relax.data.res[runName][res].ri_labels, and
> again at self.relax.data.ri_labels[runName].
>
> read_columnar_relax_data() tries to set data.ri_labels[runName], etc.
> from the first line that is read from the results file. Then we have:
>
>     # Test if the relaxation data is consistent.
>     if (self.ri_labels != eval(self.file_line[self.col['ri_labels']]) or
>             self.remap_table != eval(self.file_line[self.col['remap_table']]) or
>             self.frq_labels != eval(self.file_line[self.col['frq_labels']]) or
>             self.frq != eval(self.file_line[self.col['frq']])):
>         raise RelaxError, "The relaxation data is not consistent for all residues."
>
> This checks that the ri_labels, etc, for each line of the results file
> is the same as that from the first line read. This exception is never
> raised, because these values are always consistent in the results file,
> even if they weren't in the original data, as I've outlined above. It
> does, however, seem to be an attempt to stop me using data where not all
> residues have the same number of relaxation values. It might be
> that I'm missing something, but I can see no good reason why I should be
> stopped, and indeed using this type of data has caused me no other
> problems that I can see.
>
> So, the 'big issues' for your consideration:
>  - Is there any good reason for having ri_labels, etc defined at both
> the level of the run (at data.ri_labels[runName]) and at the level of
> the residue (at data.res[runName][res].ri_labels)? We have seen other
> bugs associated with these duplications, and it does seem to be asking
> for trouble.
>  - A decision needs to be made about which of these parameters are
> expected to be constant across all residues in a run, and which are
> potentially allowed to vary. Ideally, it seems to me, those which are
> constant across the run will be defined only at
> data.param_name[runName], and those that might vary from one residue to
> another will be at data.res[runName][res].param_name (and only there).
> Again, in my opinion, whatever convention is adopted in the internal
> data structures of relax should also be reflected in the way the
> parameter values are output to the results file.
>
> Anyway, let me know what you think.  I'm happy to have a hack at
> resolving these issues whichever way you decide.
>
>
> Chris
>
>
> _______________________________________________
> Relax-devel mailing list
> Relax-devel@xxxxxxx
> https://mail.gna.org/listinfo/relax-devel
>
Re: [Relax-devel] handling of relaxation data in results files (January 11, 2006 - 03:33)