auto_select branch created -- March 16, 2006

I have created a new branch for work on the auto_select modification.
Feel free to revert the 1.2 branch.

I'll try to commit some time over the next few days to fixing and more
thorough testing.

Chris


On Fri, 2006-03-17 at 01:09 +1100, Edward d'Auvergne wrote:

Actually, I have done a bit of testing and I think there could be some
issues with the fix.  The number of breakages is quite high, so if
you could create a branch for these modifications, Chris, it would be
appreciated.  If you make the branch, I'll revert the changes
tomorrow.  Once the fix is stabilised and tested, we can merge it back
into the main line.

Breakages occur, for example, when running the relaxation
curve-fitting on the data attached to bug #5473.  A number of issues
arise in that test.  Possibly the most critical is that relax results
files cannot be read (or rather, no data is extracted).  Also, as
selection is dependent on the relaxation data being there, all
residues are unselected until the relaxation data is loaded.  This
will have implications in a number of areas.  I still have to test
what happens during model selection.

Another approach might be to implement the deselection when n < k
(fewer data points than parameters) within the model-free specific code only
when model selection occurs.  A function such as
'self.overfit_deselect()' either in the specific model-free code or
the generic selection code could be used.  What do you think?
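
As a rough illustration of the 'overfit_deselect()' idea, here is a
minimal sketch.  The Residue class, its attribute names and the return
value are assumptions made for this example only and are not the relax
data structures.

```python
# Hypothetical sketch: the Residue class and its attributes are
# assumptions for illustration, not relax's real data structures.

class Residue:
    def __init__(self, num, relax_data, params):
        self.num = num
        self.relax_data = relax_data   # measured values, n = len(relax_data)
        self.params = params           # model parameters, k = len(params)
        self.select = 1


def overfit_deselect(residues):
    """Deselect residues with fewer data points than parameters (n < k).

    Returns the numbers of the residues deselected here, so the caller
    can report them to the user.
    """
    deselected = []
    for res in residues:
        # Leave residues the user has already deselected untouched.
        if not res.select:
            continue
        n = len(res.relax_data)
        k = len(res.params)
        if n < k:
            res.select = 0
            deselected.append(res.num)
    return deselected


residues = [
    Residue(1, [1.2, 0.8, 0.7], ['S2', 'te']),   # n = 3, k = 2: kept
    Residue(2, [1.1], ['S2', 'te']),             # n = 1, k = 2: deselected
    Residue(3, [], ['S2']),                      # n = 0, k = 1: deselected
]
skipped = overfit_deselect(residues)
```

Returning the list of deselected residues is one possible way of
telling the user what has happened; calling the function only at the
model selection stage leaves the selection untouched while the data is
being loaded.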

Edward


On 3/16/06, Edward d'Auvergne <edward.dauvergne@xxxxxxxxx> wrote:

Thank you very much for the fix.  I like the idea of the user_select
and auto_select and the automatic generation of the 'select' object.
However, we will need to modify a few other functions and also work
out how to communicate to a user what will happen.  The concept of
explicitly preventing over-fitting and catching the lack of data is
very good.  There are also a few coding conventions I haven't
mentioned in the development chapter but which would keep the code
more consistent.


# Possible bugs.

Firstly, the following functions explicitly change the selection state
by using the assignment 'select = [0 or 1]':

'generic_fns/eliminate.py'
'specific_fns/model_free.py'
'specific_fns/relax_data.py'
'specific_fns/relax_fit.py'

These will likely cause problems.  The first file I think will be
handled appropriately by your changes.  The last three will probably
need to have changes made to them.  For example
'specific_fns/relax_fit.py' deselects the residue if there are no peak
intensities loaded into relax for that residue.  Grepping for
'select', which is what I did to find these files, should locate any
other possible future problems.

I think there should also be a test of the run type.  Relaxation
curve-fitting, the NOE calculation, reduced spectral density mapping
(RSDM), and any future run types will not have the residue specific
structure 'relax_data'.  With the current changes, I believe that
these run types will not be available.  I tested it on the relaxation
curve-fitting data attached to bug #5473 and the code fails.  It
thinks that the total number of residues is zero and then tries to
calculate the mean by dividing by zero.  In this case, run type
'relax_fit', the 'auto_select' function should test against the
structure 'intensities' instead.  A default behaviour for all other
runs could be that auto_select returns 1.  I think RSDM also uses the
'relax_data' structure, so maybe that should be tested.
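
The run type test could look something like the following sketch.  The
dictionary-based residue, the 'n_params' argument and the run type
label 'mf' for model-free are assumptions made for this example; only
'relax_fit', 'relax_data' and 'intensities' are taken from the text
above.

```python
# Hypothetical sketch: the dict-based residue layout, the 'n_params'
# argument and the 'mf' label are assumptions for illustration.

def auto_select(run_type, res, n_params=0):
    """Return 1 if the residue should stay selected for this run type."""
    if run_type == 'mf':
        # Model-free: deselect when parameters outnumber data points.
        n = len(res.get('relax_data', []))
        return 0 if n_params > n else 1
    if run_type == 'relax_fit':
        # Curve-fitting: test the peak intensities, not 'relax_data'.
        return 1 if res.get('intensities') else 0
    # Default for all other run types (whether RSDM should also test
    # 'relax_data' is left open here).
    return 1


with_data = {'relax_data': [1.2, 0.8, 0.7]}
with_peaks = {'intensities': [100.0, 80.0, 65.0]}

auto_select('mf', with_data, n_params=2)      # 1
auto_select('mf', with_data, n_params=5)      # 0
auto_select('relax_fit', with_peaks)          # 1
auto_select('relax_fit', with_data)           # 0, no intensities loaded
auto_select('noe', {})                        # 1, default behaviour
```

With a default of 1, run types without the 'relax_data' structure no
longer end up with zero selected residues.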


# The end user.

The automatic prevention of over-fitting is good, except how do we
communicate that to a user?  And what should happen if there is a
residue which only has one data point (n = 1)?  Should we allow the
user to blindly minimise the model-free models m0 and m1 and leave it
to the user to appropriately create an 'unselected' file and include
that residue?  Or should we set a minimum of maybe n > 2?  If the user
is not wary, the use of residues with low n will significantly skew
the diffusion tensor.


# Coding convention.

These points are minor but affect the appearance of the code.  Chris,
if you subscribe to the relax-commits list, I can fix the formatting
and you'll get an email with the diff of the changes.  The first point
is relax's use of camel case vs. all lower case with underscores.  For
classes, relax uses a mix of camel case (eg. RelaxError) and
underscores (eg Model_free).  The first letter is, however, always
capitalised.  For function names, lower case with underscores between
words is always used.  This is for readability as the convention is
much more fluent than camel case.  A description of what was done to
fix the bug would also have been useful in the commit message, eg the
description in your email would be perfect.

Thanks again Chris, your idea is inventive and very powerful.  Bye,

Edward


On 3/16/06, Chris MacRaild <c.a.macraild@xxxxxxxxxxx> wrote:

A fix has been committed to the relax svn repository.

This implements two internal select parameters. The first, userSelect,
is controlled by the user through select.res() and related commands. The
second, autoSelect, is automatically set to 0 if the number of
parameters exceeds the number of data points for that residue, and 1
otherwise. The overall selection,
self.relax.data.res[run][index].select, is given by (userSelect AND
autoSelect).

Because autoSelect varies in complex ways as the program state changes,
it is evaluated on-the-fly whenever the selection state of the residue
is queried.

Any attempt to bind self.relax.data.res[run][index].select or
self.relax.data.res[run][index].autoSelect will raise an AttributeError
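
The behaviour described above can be sketched with read-only
properties.  This is only an illustration and not the committed code:
the Residue class and the 'relax_data' and 'params' attributes are
assumptions, while userSelect, autoSelect and select follow the names
used in this message.

```python
# Hypothetical sketch: the class layout is an assumption for
# illustration and is not the code committed to the repository.

class Residue:
    def __init__(self):
        self.userSelect = 1    # set by the user via the select commands
        self.relax_data = []   # data points for this residue
        self.params = []       # parameters of the current model

    @property
    def autoSelect(self):
        # Evaluated on the fly: 0 if parameters outnumber data points.
        return 0 if len(self.params) > len(self.relax_data) else 1

    @property
    def select(self):
        # Overall selection: userSelect AND autoSelect.
        return int(bool(self.userSelect) and bool(self.autoSelect))


res = Residue()
res.params = ['S2', 'te']
res.relax_data = [0.8]
before = res.select            # 0, two parameters but one data point

res.relax_data.append(1.3)
after = res.select             # 1, the state follows the loaded data

try:
    res.select = 1             # no setter is defined for the property
except AttributeError:
    bind_failed = True
```

As neither property defines a setter, any assignment to select or
autoSelect fails with an AttributeError, while userSelect remains an
ordinary writable attribute.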


Chris



On Thu, 2006-03-16 at 00:08 +1100, Edward d'Auvergne wrote:

The deselection bug described in #5501 was originally designed as a
feature specifically to prevent residues from having fewer relaxation
data points than parameters in the model-free models.  If the
behaviour is changed, we will need to work out how to handle fewer data
points than parameters.  For example, what should relax do if a residue
with only one data point is encountered?  If 6 data points are collected
but two are missing, what should happen with model m8?  How can the
final behaviour be made so that it is obvious to the end user what
will actually happen in any data vs. parameter combination?

Edward



On 3/15/06, Chris MacRaild <NO-REPLY.INVALID-ADDRESS@xxxxxxx> wrote:


URL:
  <http://gna.org/bugs/?func=detailitem&item_id=5501>

                 Summary: residue deselection problem on relax_data.read()
                 Project: relax
            Submitted by: macraild
            Submitted on: Wednesday 03/15/2006 at 09:48
                Category: None
                Priority: 5 - Normal
                Severity: 3 - Normal
                  Status: In Progress
                 Privacy: Public
             Assigned to: macraild
        Originator Email:
             Open/Closed: Open

    _______________________________________________________

Details:

Currently, relax_data.read() tests all residues for whether or not they
contain data after the data file has been read. Those without data are
deselected.

This causes all residues which lack data in the first loaded file to be
deselected even after loading new data for that residue.

An immediate work-around is to explicitly select residues as necessary.
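
The reported behaviour can be reproduced with a small stand-alone
sketch.  The read_data() function and the dictionary layout are
assumptions made for illustration and are not the relax_data.read()
source.

```python
# Hypothetical sketch: read_data() and the dict layout are assumptions
# for illustration, not the actual relax_data.read() code.

def read_data(residues, new_data):
    """Load one data file, then deselect every residue without data."""
    for num, value in new_data.items():
        residues[num]['relax_data'].append(value)
    for res in residues.values():
        if not res['relax_data']:
            res['select'] = 0   # never set back to 1 on later reads


residues = {
    1: {'relax_data': [], 'select': 1},
    2: {'relax_data': [], 'select': 1},
}

read_data(residues, {1: 1.25})            # first file lacks residue 2
read_data(residues, {1: 0.81, 2: 0.79})   # second file has residue 2

state = residues[2]['select']             # still 0 despite the new data

# The work-around: explicitly reselect the residue.
residues[2]['select'] = 1
```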






    _______________________________________________________

Reply to this item at:

  <http://gna.org/bugs/?func=detailitem&item_id=5501>

_______________________________________________
  Message sent via/by Gna!
  http://gna.org/


_______________________________________________
Relax-devel mailing list
Relax-devel@xxxxxxx

To unsubscribe from this list, get a password reminder, or change 
your subscription options, visit the list information page at 
https://mail.gna.org/listinfo/relax-devel


