[Relax-devel] handling of relaxation data in results files -- January 10, 2006

Hi,

For a while now I have been using relax with a dataset where not all
residues have the same same number of relaxation values (I have data at
3 fields, but the quality varies significantly, and some of the
relaxation decay fits I am not happy with, so I have excluded the
corresponding data). This has caused no obvious problems in most cases,
except that results file io fails in some interesting ways. I've had a
bit of a poke in the code to try and fix this, but it has raised a few
issues that I thought I should run past you before I hack too hard.

Of course when I load this data from the original data files, the
relevant residue data structures all behave correctly -
self.relax.data.res[runName][res].num_ri is correct for each residue, as
are the residue specific remap_table, ri_labels, etc.

When I do a results.write(), things change in the results file: all
residues now have the same remap_table, etc, and missing Ri values and
errors are given the value None. This in itself makes some sense in the
context of the tabular file format, but is inconsistent with the way
things are handled in the data structures above.

Just in case I'm not being clear, an example. Lets say residue 1 has all
9 data points, but residue 2 is missing the first data point, so only
has 8. On initially loading the data files, I have:
self.relax.data.res[runName][1].num_ri = 9
self.relax.data.res[runName][1].remap_table = [0,1,2,0,1,2,0,1,2]
self.relax.data.res[runName][1].relax_data = [1.214, 0.896, 0.6817,
18.64, 20.48, 21.89, 0.769, 0.858, 0.893]

self.relax.data.res[runName][2].num_ri = 8
self.relax.data.res[runName][2].remap_table = [1,2,0,1,2,0,1,2]
self.relax.data.res[runName][2].relax_data = [0.896, 0.6817, 18.64,
20.48, 21.89, 0.769, 0.858, 0.893]

But the results file looks like:
Res  ...    remap_table   ...     relax_data
1           [0,1,2,0,1,2,0,1,2]   [1.214, 0.896, 0.6817, 18.64, 20.48,
21.89, 0.769, 0.858, 0.893]
2           [0,1,2,0,1,2,0,1,2]   [None, 0.896, 0.6817, 18.64, 20.48,
21.89, 0.769, 0.858, 0.893]


This inconsistency comes back to haunt us on results.read(). The first
problem is that Ri values and errors get read in as None. This causes
subsequent minimisation to fail (by some wonder of python's implicit
type conversions, None < 0.0 evaluates as True so minimisation fails
with negative errors!). The second problem relates to the fact that
ri_labels, remap_table, etc are defined twice in the relax data
structures, once at self.relax.data.res[runName][res].ri_labels, and
again at self.relax.data.ri_labels[runName]. 

read_columnar_relax_data() tries to set data.ri_labels[runName], etc.
from the first line that is read from the results file. Then we have:

    # Test if the relaxation data is consistent.
    if self.ri_labels != eval(self.file_line[self.col['ri_labels']]) or
self.remap_table != eval(self.file_line[self.col['remap_table']]) or
self.frq_labels != eval(self.file_line[self.col['frq_labels']]) or
self.frq != eval(self.file_line[self.col['frq']]):
        raise RelaxError, "The relaxation data is not consistent for all
residues."

This checks that the ri_labels, etc, for each line of the results file
is the same as that from the first line read. This exception is never
raised, because these values are always consistent in the results file,
even if they weren't in the original data, as I've outlined above. It
does, however, seem to be an attempt to stop me using data where not all
residues have the same same number of relaxation values. It might be
that I'm missing something, but I can see no good reason why I should be
stopped, and indeed using this type of data has caused me no other
problems that I can see.

So, the 'big issues' for your consideration:
 - Is there any good reason for having ri_labels, etc defined at both
the level of the run (at data.ri_labels[runName]) and at the level of
the residue (at data.res[runName][res].ri_labels)? We have seen other
bugs associated with these duplications, and it does seem to be asking
for trouble.
 - A decision needs to be made about which of these parameters are
expected to be constant across all residues in a run, and which are
potentially allowed to vary. Ideally, it seems to me, those which are
constant across the run will be defined only at
data.param_name[runName], and those that might vary from one residue to
another will be at data.res[runName][res].param_name (and only there).
Again, in my opinion, what ever convention is adopted in the internal
data structures of relax should also be reflected in the way the
parameter values are output to the results file.

Anyway, let me know what you think.  I'm happy to have a hack at
resolving these issues which ever way you decide.


Chris
[Relax-devel] handling of relaxation data in results files (January 10, 2006 - 17:53)