Re: [bug #6503] Uncaught nan in xh_vect



Posted by Chris MacRaild on August 08, 2006 - 18:38:
On Tue, 2006-08-08 at 20:36 +1000, Edward d'Auvergne wrote:

> For 1) I would prefer the NaN catching to be outside of the
> 'minimise/' directory.  It should be safe to assume that that code
> will soon not be part of relax.  As for handling NaNs within the
> minimisation code, I know of no other minimisation package that does
> this - if the user sends garbage to it, then returning garbage should
> be expected.  The sender and receiver code should do the cleanup.  I
> do, however, think that testing for NaN during optimisation (in the
> 'maths_fns' code) is too computationally expensive.  If optimisation
> terminates in a reasonable time, then I don't think we should test for
> NaNs during the number crunching phase.


I agree with all of this. NaN handling is the job of relax proper, not
the optimisation code. The only nuance I would put on it is that if a
grid search returns a NaN, we should catch it at that point and take the
appropriate action, rather than proceeding to the next step of the
minimisation, which would necessarily entail a lot of iterations waiting
for the impossible.
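
For illustration, a minimal sketch of such a guard (checked_grid_search()
and the grid_search callable are hypothetical stand-ins, not relax's real
API):

    def checked_grid_search(grid_search, model):
        # Hypothetical wrapper: run the grid search, then refuse to hand
        # a NaN chi-squared value on to the iterative minimiser.
        chi2, params = grid_search(model)
        if chi2 != chi2:    # IEEE 754: NaN is the only value unequal to itself
            raise ValueError("Grid search returned NaN for model %r" % (model,))
        return chi2, params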

> For 2) and 3) the NaN value comes from the chi2 variable, which is just
> a standard Python floating point number rather than a Numeric
> construct.  Will the shift to Numpy actually change the behaviour of
> the default Python floats?  Or will it just change the behaviour of
> vectors, matrices, and other linear algebra?  Or is there a function
> similar to the fpconst.py function isNaN() which can be used to catch
> it?  Anyway, the 1.3 line is probably the best place to test the shift
> from Numeric to Numpy - although in a private branch first.
>
> I'll look into this option further and let you know.




> I just tested it, and in Python 2.1 NaN is apparently less than all
> other numbers and is hence selected.  In 2.4 it's greater than all
> other numbers and hence is never selected.  Therefore the model
> selection code should try to catch the NaN.  But then what should we
> do?  Throw a RelaxError?  Or skip these models, which brings the
> notion of 'no selected model' into play and hence will require a large
> rework of the code base to handle missing models?


Presumably no selected model is a possible outcome of model elimination?
Is it really not handled now? It seems to me that the way to handle it
is to deselect the affected residues and continue. I'm not quite sure
why it entails such a big change.
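
As an aside, the version-dependent ordering described above is a symptom
of NaN being unordered under IEEE 754, which is exactly why min()/max()
based model selection is undefined once a NaN appears. A quick sketch,
assuming a Python build where float('nan') parses:

    nan = float('nan')
    # Under IEEE 754 every ordered comparison against NaN is False, so
    # whether NaN "wins" a min()/max() scan depends purely on the order
    # in which the selection loop happens to compare values.
    print(nan < 1.0, nan > 1.0, nan == nan)     # False False False
    # A criterion-based selection therefore has to filter NaN out first:
    values = [0.3, nan, 0.1]
    best = min(v for v in values if v == v)     # 0.1; v == v drops the NaN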


> I believe, though, that throwing a RelaxError when NaNs occur is the
> best option.  That is because NaN should NEVER occur.  Even though it
> may cause a week-long calculation to die at the very end, so that the
> optimisation was for nothing, an error should still be thrown (it's
> much more likely that a week-long optimisation will die at the very
> start).  The reason for throwing a RelaxError and killing the
> calculation is simple.  Despite the calculation running until the end,
> it will need to be rerun.  If the NaN only occurs for a single
> residue, the entire protein (the diffusion tensor) is nevertheless
> affected.  This is because of the strong link between the diffusion
> tensor parameters and the model-free parameters.  The values of one
> influence the optimisation of the other, and vice versa.  Therefore
> the continuation of the calculation will, without doubt, produce
> incorrect model-free results.

I disagree here. There are many examples I can think of where the NaN
shouldn't mix with the diffusion tensor calculation. Just one example:
if only one MF model returns NaN, then it should not be selected and
will not influence the diffusion tensor. The other point is that the
propagation behaviour of NaNs is such that if a NaN were to influence
the diffusion tensor in any way, then the affected diffusion tensor
values will themselves be NaN (clearly this is unrecoverable, and is an
appropriate place to throw an exception).
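
That absorbing behaviour is trivial to verify:

    nan = float('nan')
    # NaN propagates through all arithmetic, so a contaminated tensor
    # element is itself NaN, and a single isnan() sweep will expose it.
    print(nan + 1.0, nan * 0.0, nan ** 0.5)     # nan nan nan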

Although I'd love to be able to agree with you that NaN should never
occur, floating point maths just isn't that cooperative. Even for
'correct' inputs, it is quite possible for a minimisation to drive a
value so small that a floating point underflow occurs; division or log
of that value will then result in NaN (or INF, the two being equivalent
for the purposes of this discussion). I've never seen this with your
minimisation code, but I've certainly seen it in others (probably a
tribute to the robustness of your algorithms, but not grounds for too
much complacency).
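
A concrete illustration of that failure mode (plain Python raises
ZeroDivisionError or ValueError here, so this sketch uses numpy scalars,
whose IEEE 754 semantics match the C-level number crunching; the
RuntimeWarnings noted are how a modern numpy reports them):

    import numpy as np

    tiny = np.float64(1e-200)
    zero = tiny * tiny                  # 1e-400 underflows to 0.0
    print(zero)                         # 0.0
    print(np.float64(1.0) / zero)       # inf  (plus a RuntimeWarning)
    print(np.log(zero))                 # -inf (plus a RuntimeWarning)
    print(zero / zero)                  # nan  (plus a RuntimeWarning)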



> To summarise my opinions:
>
> To catch the NaN:  I think this is useful, though not necessary.
> Avoiding fpconst.py as a dependency would be best.  If Numpy has a
> function to catch Python native floating point values of NaN, then
> migrating to Numpy is worth a go.  Otherwise, migrating to Numpy isn't
> an issue for this problem.

I believe catching NaN is necessary for model selection to behave
predictably, and useful to avoid wasting an awful lot of time minimising
a NaN. I think Numpy will be useful here.
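
For what it's worth, numpy.isnan() accepts plain Python floats as well as
arrays, so it would cover the chi2 case without the fpconst.py
dependency; and the self-inequality test works with no dependency at all:

    import numpy as np

    chi2 = float('nan')
    print(np.isnan(chi2))    # True: isnan() works on a plain Python float
    print(chi2 != chi2)      # True: pure-Python fallback, no imports needed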


> What to do when NaNs occur:  RelaxError!

RelaxError is appropriate when the NaN signals an unrecoverable state,
e.g. if the diffusion tensor contains NaN. On the other hand, an isolated
NaN should result in the relevant model/residue being deselected and a
warning to highlight the fact. Obviously this more context-dependent
response involves more work, but I don't think it needs to be fully
implemented all at once; as you rightly point out, this is a rare
occurrence.
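
A rough sketch of that context-dependent dispatch (every name here, from
the residue attributes to the exception class, is a hypothetical stand-in
for whatever relax actually uses):

    import warnings

    def triage_nans(diff_tensor_params, residues):
        # Hypothetical triage: a NaN anywhere in the diffusion tensor is
        # unrecoverable, so raise; an isolated per-residue NaN is
        # survivable, so deselect that residue and warn.
        if any(p != p for p in diff_tensor_params):   # p != p is True only for NaN
            raise RuntimeError("NaN in the diffusion tensor; aborting the run.")
        for res in residues:
            if res.chi2 != res.chi2:
                res.selected = False                  # hypothetical deselect flag
                warnings.warn("NaN chi2 for residue %s; deselected." % res.num)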

Of course relax is yours, and I'm happy to recognise your benevolent
dictatorship.

Chris


> Prevention vs. cure:  Well, a mix.  Catch the NaN as a cure, then throw
> a RelaxError.  The output can then be used to create prevention
> measures.
>
> Edward




