The handling of relaxation data within relax - a redesign might be necessary! -- February 17, 2009

Hi,

I've been thinking about this issue for a while, but haven't had time
to write about it yet.  The issue is the handling of relaxation data
within relax and the only solution I can see is a redesign.  The
problem is that there is zero consistency between different analysis
types.  For example in model-free analysis and reduced spectral
density mapping, the data is stored as follows.  In the pipe
container, the following structures are created:

    <pipe desc="The contents of a relax data pipe" name="a" type="mf">
        <global desc="Global data located in the top level of the data pipe">
            <frq>
                [600000000.0, 800000000.0]
            </frq>
            <frq_labels>
                ['600', '800']
            </frq_labels>
            <noe_r1_table>
                [None, None, 0, None, None, 3]
            </noe_r1_table>
            <num_frq>
                2
            </num_frq>
            <num_ri>
                6
            </num_ri>
            <remap_table>
                [0, 0, 0, 1, 1, 1]
            </remap_table>
            <ri_labels>
                ['R1', 'R2', 'NOE', 'R1', 'R2', 'NOE']
            </ri_labels>
            <sim_number>
                3
            </sim_number>
            <sim_state>
                False
            </sim_state>
        </global>

This is from an XML results file, and the relaxation data structures are
'frq', 'frq_labels', 'noe_r1_table', 'num_frq', 'num_ri',
'remap_table', and 'ri_labels'.  Most of these are simple Python
lists.  Within each spin container, these are for the most part
duplicated.  So for example:

                    <num_frq desc="Number of spectrometer frequencies">
                        2
                    </num_frq>
                    <frq desc="Frequencies">
                        [600000000.0, 800000000.0]
                    </frq>
                    <frq_labels desc="Frequency labels">
                        ['600', '800']
                    </frq_labels>
                    <num_ri desc="Number of relaxation data sets">
                        6
                    </num_ri>
                    <ri_labels desc="Relaxation data set labels">
                        ['R1', 'R2', 'NOE', 'R1', 'R2', 'NOE']
                    </ri_labels>
                    <remap_table desc="Table mapping frequencies to relaxation data">
                        [0, 0, 0, 1, 1, 1]
                    </remap_table>
                    <noe_r1_table desc="Table mapping the NOE to the corresponding R1">
                        [None, None, 0, None, None, 3]
                    </noe_r1_table>
                    <relax_data desc="The relaxation data">
                        [1.3500000000000001, 9.8499999999999996,
                        0.52300000000000002, 0.89000000000000001,
                        11.273, 0.59799999999999998]
                    </relax_data>
                    <relax_error desc="The relaxation data errors">
                        [0.049000000000000002, 0.64800000000000002,
                        0.041000000000000002, 0.037999999999999999,
                        0.80400000000000005, 0.044999999999999998]
                    </relax_error>
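
As a rough illustration of how these parallel lists fit together, here is a minimal Python sketch.  It assumes that each 'remap_table' entry is an index into 'frq' and 'frq_labels', and that each non-None 'noe_r1_table' entry is the index of the R1 data set recorded at the same frequency as that NOE.  The describe() helper is hypothetical and not part of relax, and the values are rounded copies of those in the XML above.

```python
# Values copied (rounded) from the XML excerpt above.
frq_labels = ['600', '800']
ri_labels = ['R1', 'R2', 'NOE', 'R1', 'R2', 'NOE']
remap_table = [0, 0, 0, 1, 1, 1]
noe_r1_table = [None, None, 0, None, None, 3]
relax_data = [1.35, 9.85, 0.523, 0.89, 11.273, 0.598]


def describe(i):
    """Describe relaxation data set i (hypothetical helper, not relax code)."""
    # remap_table[i] is assumed to be the index of the matching frequency.
    text = "%s at %s MHz = %s" % (ri_labels[i], frq_labels[remap_table[i]],
                                  relax_data[i])
    # noe_r1_table[i] is assumed to point at the R1 paired with an NOE.
    r1_index = noe_r1_table[i]
    if r1_index is not None:
        text += " (paired R1 = %s)" % relax_data[r1_index]
    return text


descriptions = [describe(i) for i in range(len(ri_labels))]
for line in descriptions:
    print(line)
```

Every lookup goes through list positions, so all of the lists must stay the same length and in the same order for the data to remain consistent.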

This organisation is a relic from about the time of relax 0.1 in 2001!
For model-free analysis and reduced spectral density mapping, this
worked well.  It would also be fine for SRLS, if that is eventually
implemented.

But then for the R1 and R2 exponential curve fitting and the NOE
calculation, these structures no longer made sense.  So instead of
coming up with an elegant universal design, I instead used a different
design for these analysis types.  Peak intensities were stored using a
different design, although that is now under the control of the
generic_fns.spectrum module.  The relaxation data in this case were
treated as parameters of the model, hence the spin containers contain
the float or float list variables 'rx', 'i0', 'iinf', 'rx_err',
'i0_err', 'iinf_err', 'rx_sim', 'i0_sim', 'iinf_sim'.  This works ok,
but because it doesn't match the design above, there is no way of
copying data from a 'relax_fit' data pipe to a 'mf' data pipe.  As
part of a redesign, the 'rx' and 'rx_err' structures could possibly
be, at the end of optimisation, packaged into a generic relaxation
data structure identified through an ID tag.
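
A minimal sketch of what this packaging step might look like is given below.  Everything here is an assumption for illustration: the Spin class is a bare stand-in for a spin container, package_rx() is a hypothetical helper, the ID string 'R1_600' is an invented naming scheme, and the dictionary layout is only one possible form of the generic structure.

```python
class Spin:
    """Bare stand-in for a relax spin container (hypothetical)."""
    pass


def package_rx(spin, ri_id):
    """Copy the fitted rate and its error into ID-keyed dictionaries.

    Hypothetical helper:  'relax_data' and 'relax_error' are assumed to be
    dictionaries keyed by the relaxation data ID string.
    """
    if not hasattr(spin, 'relax_data'):
        spin.relax_data = {}
        spin.relax_error = {}
    spin.relax_data[ri_id] = spin.rx
    spin.relax_error[ri_id] = spin.rx_err


# Illustrative values only, as if left behind by the curve-fitting.
spin = Spin()
spin.rx = 1.35
spin.rx_err = 0.049

package_rx(spin, 'R1_600')
print(spin.relax_data, spin.relax_error)
```

The 'rx' and 'rx_err' parameters stay where they are, so the curve-fitting code would be unaffected, while any other analysis type could read the packaged copy.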

Now the problem is with relaxation dispersion.  The peak intensity
handling is no issue as generic_fns.spectrum can handle all of this.
The problem is with R2eff.  The relaxation dispersion profiles consist
of many points of R2 relaxation data.  Then one of the fitted
parameters, the pure R2 without exchange, is also relaxation data.  So
how do we handle all of this?  The relax_data user function class
creates the structures detailed above, but I don't think these are
suitable or can be modified to fit this nicely.  What I would like to
find is a solution where all the relaxation data in all analysis
types, current and future, is handled identically.  So if R2
exponential curves are measured for relaxation dispersion, then the
'relax_fit' data pipe can be used for each dispersion profile point
and then all the relaxation data copied into the 'relax_disp' data
pipe.  Or if this is bypassed, the R2eff values can be calculated and
stored in the same way.  Finally, the fitted parameters for the pure
R2 rate can be packaged up into one of these relaxation data
structures (maybe as well as existing as parameters stored directly in
the spin containers).

I'm not exactly sure how we should design this yet, but my current
ideas follow the design of the peak intensity code.  That is, each
relaxation data set is identified by an ID string (used as the key in
dictionary objects).  There would be global structures in the
data pipe describing things such as spectrometer frequency, frequency
labels (for plotting), relaxation time point, relaxation data type
(R1, R2, NOE, R2eff, etc.), and everything else needed for model-free
analysis, spectral density mapping, NOE calculation, relaxation data
exponential curve-fitting, relaxation dispersion, and any other future
analysis type.  The structures in the spin containers will then be
Python dictionaries, like the global structures, with the keys set to
the ID strings.  Because of these ID strings and the global
structures, the only things needed in the spin container would be the
dictionaries 'relax_data' and 'relax_error' (possibly also 'relax_sim'
for the MC simulation part of the R1 and R2 curve-fitting).  All other
structures needed can be created on the fly using the global data and
the ID tags.
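
To make the idea more concrete, here is a minimal Python sketch of such a layout, together with an on-the-fly reconstruction of the old list-based tables.  All names are assumptions for illustration only: the ID strings, the global dictionaries 'ri_type' and 'frq', and the legacy_tables() helper are hypothetical and do not exist in relax.

```python
# Hypothetical global (data pipe level) structures, keyed by ID string.
ri_type = {'R1_600': 'R1', 'R2_600': 'R2', 'NOE_600': 'NOE',
           'R1_800': 'R1', 'R2_800': 'R2', 'NOE_800': 'NOE'}
frq = {'R1_600': 600000000.0, 'R2_600': 600000000.0, 'NOE_600': 600000000.0,
       'R1_800': 800000000.0, 'R2_800': 800000000.0, 'NOE_800': 800000000.0}

# Hypothetical spin container structures, keyed by the same ID strings.
relax_data = {'R1_600': 1.35, 'R2_600': 9.85, 'NOE_600': 0.523,
              'R1_800': 0.89, 'R2_800': 11.273, 'NOE_800': 0.598}


def legacy_tables(ids):
    """Rebuild the old list-based tables from the ID-keyed dictionaries."""
    frq_list = sorted(set(frq[ri_id] for ri_id in ids))
    ri_labels = [ri_type[ri_id] for ri_id in ids]
    remap_table = [frq_list.index(frq[ri_id]) for ri_id in ids]
    noe_r1_table = []
    for ri_id in ids:
        index = None
        if ri_type[ri_id] == 'NOE':
            # Find the R1 data set recorded at the same frequency, if any.
            for j, other in enumerate(ids):
                if ri_type[other] == 'R1' and frq[other] == frq[ri_id]:
                    index = j
                    break
        noe_r1_table.append(index)
    return frq_list, ri_labels, remap_table, noe_r1_table


ids = ['R1_600', 'R2_600', 'NOE_600', 'R1_800', 'R2_800', 'NOE_800']
tables = legacy_tables(ids)
data_list = [relax_data[ri_id] for ri_id in ids]
print(tables)
print(data_list)
```

With this ordering of IDs, the helper reproduces the 'frq', 'ri_labels', 'remap_table', and 'noe_r1_table' lists shown in the XML above, so the model-free and spectral density mapping code could in principle keep using list-based structures generated on demand.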

I still need to think about these ideas for a while.  Can anyone see
any issues with this type of design?  These changes will obviously
need their own branch, separate from Seb's relax_disp branch, as they
will break most parts of relax.  But implementing the changes will not
be too hard or take too long once a good design has been decided upon.

Regards,

Edward