Re: [Fwd: Re: [bug #6503] Uncaught nan in xh_vect]



Posted by Edward d'Auvergne on August 09, 2006 - 06:53:
> For 1) I would prefer the NaN catching to be outside of the
> 'minimise/' directory.  It should be safe to assume that that code
> will soon not be part of relax.  As for handling NaNs within the
> minimisation code I know of no other minimisation package that does
> this - if the user sends garbage to it then returning garbage should
> be expected.  The sender and receiver code should do the cleanup.  I
> do however think that testing for NaN during optimisation (in the
> 'maths_fns' code) is too computationally expensive.  If optimisation
> terminates in a reasonable time then I don't think we should test for
> NaNs during the number crunching phase.
>
We should check what the overhead is before we say it is too expensive.

The number of times the family of functions 'self.func()' within the file 'maths_fns/mf.py' is called to generate the chi-squared value during optimisation is huge. When running relax, this function is the most called function in the entire code base. Putting the test for NaN within this function, or in the 'minimise/' code straight after this function is called, will be the most computationally expensive solution possible. A much more efficient design would be to catch the NaN just after optimisation has terminated as the test will only need to be done once per optimisation rather than thousands of times per optimisation.
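
To make that concrete, here is a minimal sketch of the post-optimisation check (the minimiser signature, its return values, and the error class are just placeholders, not relax's real API):

    class RelaxNaNError(Exception):
        """Placeholder stand-in for a relax NaN error."""

    def minimise_and_check(minimiser, func, x0):
        """Run the minimiser, then test the final chi-squared value once."""
        params, chi2, iter_count = minimiser(func, x0)

        # A single test per optimisation rather than one per chi-squared
        # evaluation.  Only NaN compares unequal to itself.
        if chi2 != chi2:
            raise RelaxNaNError("NaN chi-squared value at the end of optimisation.")

        return params, chi2, iter_count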


> For 2) and 3) the NaN value comes from the chi2 variable which is just
> a standard Python floating point number rather than a Numeric
> construct.  Will the shift to Numpy actually change the behaviour of
> the default Python floats?  Or will it just change the behaviour of
> vectors, matrices, and other linear algebra?  Or is there a function
> similar to the fpconst.py function isNaN() which can be used to catch
> it?  Anyway, the 1.3 line is probably the best place to test the shift
> from Numeric to Numpy - although in a private branch first.
>
My understanding is that numpy propagates NaNs in general, and that pure floating point maths also propagates them. The only place there used to be problems is in the ufuncs, which in Numeric would raise exceptions rather than propagate NaNs. There is a function similar to isNaN, called isnan (along with isinf), in scipy... In general we could grep for the use of isnan and isinf in the numpy/scipy codebase to see whether NaNs are caught much or just propagated. A quick look shows only a very few uses of isnan in numpy or scipy.

I just had a look at scipy and the isnan function is defined in 'Lib/special/cephes/isnan.c'. They catch it based on the bit pattern, as you suggested previously, though the test depends on whether the platform is 'IBMPC', 'DEC', or 'MIEEE'. It should be pretty easy to implement a similar solution in relax in relatively few lines of Python.
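
For instance, a pure Python test of the IEEE 754 bit pattern (all exponent bits set, non-zero mantissa) could be sketched roughly as follows - this is only a sketch for the IEEE case, not a port of the cephes IBMPC/DEC/MIEEE handling:

    import struct

    def isNaN(x):
        """Return True if the float x is an IEEE 754 NaN.

        A NaN has all eleven exponent bits set and a non-zero mantissa.
        """
        # Pack the double big-endian and pull the bits back out as an integer.
        bits = struct.unpack('>Q', struct.pack('>d', x))[0]
        exponent = (bits >> 52) & 0x7ff
        mantissa = bits & ((1 << 52) - 1)
        return exponent == 0x7ff and mantissa != 0

The even simpler test 'x != x' (only NaN compares unequal to itself) should also work for standard Python floats, although relying on that behaviour across platforms and Numeric versions would need checking.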


> As for the test suite, the optimisation code is completely untested.
> It's where the major breakages occur, although the code in
> 'maths_fns/' is problematic as well.  A shift to Numpy will require
> changes to both 'maths_fns/' and 'minimise/'.  To catch problems the
> four optimisation classes will need to be tested - standard single
> residue, diffusion tensor, all parameters (model-free + diffusion
> params), and the residue specific local tm models.  It shouldn't be
> too hard to code a number of tests for this as they can all use the
> same data.  Then all the optimisation algorithms in ALL combinations
> need to be tested - that is quite a few.  However as these minimisers
> will be separated from relax, this won't be so easy.
>
I don't quite follow why this won't be easy. The combinatorial aspect is of course a problem, but I guess the likely combinations are the first target.

It would be easy. I can use the data of Schurr et al. (1994), which I have reanalysed for a paper in preparation, for the test-suite. The tests should then be as simple as writing a number of relax user scripts, run from within the test-suite. However the tests for the optimisation algorithms themselves probably shouldn't go into the relax test-suite - that code will eventually be removed from relax. Still, for our purposes it might be good to have a second set of tests within the test-suite which exercise the model-free code and the minimisation code simultaneously.
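
A rough, purely hypothetical skeleton of what the four test classes could look like in unittest style (the script names and the optimise() stub are placeholders, not relax's real interfaces):

    import unittest

    class ModelFreeOptimisationTests(unittest.TestCase):
        """Skeleton for the four optimisation tests, all sharing one data set."""

        def optimise(self, script):
            # Placeholder: run a relax user script against the shared data set
            # and return the final chi-squared value.
            return 0.0

        def assert_finite(self, chi2):
            # Only NaN compares unequal to itself.
            self.assertTrue(chi2 == chi2, "NaN chi-squared value")

        def test_single_residue(self):
            self.assert_finite(self.optimise('opt_single_residue.py'))

        def test_diffusion_tensor(self):
            self.assert_finite(self.optimise('opt_diffusion_tensor.py'))

        def test_all_parameters(self):
            self.assert_finite(self.optimise('opt_all_parameters.py'))

        def test_local_tm(self):
            self.assert_finite(self.optimise('opt_local_tm.py'))

    if __name__ == '__main__':
        unittest.main()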


> I believe though that throwing a RelaxError when NaNs occur is the
> best option.  That is because NaN should NEVER occur.  Even though it
> may cause a week long calculation to die at the very end, hence the
> optimisation was for nothing, an error should still be thrown (it's
> much more likely that a week long optimisation will die at the very
> start).  The reason for throwing a RelaxError and killing the
> calculation is simple.  Despite the calculation running until the end
> - it will need to be rerun.  If the NaN only occurs for a single
> residue the entire protein (the diffusion tensor) is nevertheless
> affected.


Surely not if you skip data with NaN values?

Do you really want to do this? The NaN value is a sign that something is fatally wrong.


I have to say that at the end of a week long calculation I would like to
see the result. For example (this came from Chris), if a grid search
failed I would personally like the residue to be deselected (and maybe a
warning generated on the console) and then the calculation should go on.
In general I feel exceptions are a blunt tool for this sort of problem,
as you lose the program state and get no results for anything other than
the faulty data.
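
Something along these lines would do (just a sketch - the residue attributes and the grid_search callable are made up, not relax's real data structures):

    import warnings

    def grid_search_all(residues, grid_search):
        """Run the grid search over every residue, deselecting any that fail."""
        for residue in residues:
            chi2 = grid_search(residue)

            # Only NaN compares unequal to itself.
            if chi2 != chi2:
                residue.select = 0
                warnings.warn("Grid search failed for residue %s, deselecting it."
                              % residue.num)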

Sorry, my example of the week long calculation failing at the very end was hypothetical and probably impossible. The NaN value within model-free analysis is guaranteed to be caused by garbage input data, hence the RelaxError will be thrown before the calculations really get under way. The example of the week long calculation assumed that the new model-free optimisation protocol implemented in the sample script 'full_analysis.py' is run. That involves ~15 rounds of the iterative full optimisation of the system for each of the spherical diffusion tensor, prolate spheroid, oblate spheroid, and ellipsoid. For each of these iterations many results files are generated. So even if the NaN and subsequent RelaxError occurs at the very end of the analysis, the results up to that point will be easily accessible, and optimisation can even continue from a point just before the error occurred. The amount of program state and computation time that is lost is relatively small.


Also, for example, if one MC calculation produced a NaN and killed it all,
that would be annoying in the extreme.

By construction of Monte Carlo simulations I can't see this as being possible. If the NaN occurs in the MC simulation, it must have previously occurred in the original optimisation.


I can think of counter examples: a NaN while calculating a tensor, for
example, should bring things to a close. But it would be nice to have a
default error handler that saved the state to some meaningful place.

That would be useful - just difficult to code. If code is written which dumps the saved state to the current directory just prior to throwing the RelaxError (hint: see the RelaxError base class BaseError in 'errors.py'), it will certainly be accepted into relax.
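
A rough sketch of what that could look like (the state object, its save() method, and the file name are hypothetical - the real code would hook into whatever relax uses to dump its data store):

    import time

    class BaseError(Exception):
        """Sketch of a relax base error that saves the program state before dying."""

        def save_state(self, state):
            # Dump the state to the current directory; never let a failure here
            # mask the original error.
            file_name = 'relax_state_%s' % time.strftime('%Y%m%d_%H%M%S')
            try:
                state.save(file_name)    # hypothetical state-dumping call
            except Exception:
                pass

    class RelaxNaNError(BaseError):
        def __init__(self, state):
            self.save_state(state)
            self.text = "NaN (not a number) value encountered."

        def __str__(self):
            return self.text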

Edward


