Re: [Fwd: Re: [bug #6503] Uncaught nan in xh_vect]



Posted by Edward d'Auvergne on August 09, 2006 - 06:53:
> For 1) I would prefer the NaN catching to be outside of the
> 'minimise/' directory.  It should be safe to assume that that code
> will soon not be part of relax.  As for handling NaNs within the
> minimisation code I know of no other minimisation package that does
> this - if the user sends garbage to it then returning garbage should
> be expected.  The sender and receiver code should do the cleanup.  I
> do however think that testing for NaN during optimisation (in the
> 'maths_fns' code) is too computationally expensive.  If optimisation
> terminates in a reasonable time then I don't think we should test for
> NaNs during the number crunching phase.
>
We should check what the overhead is before we say it is too expensive.

The number of times the family of functions 'self.func()' within the file 'maths_fns/mf.py' is called to generate the chi-squared value during optimisation is huge. When running relax, this function is the most called function in the entire code base. Putting the test for NaN within this function, or in the 'minimise/' code straight after this function is called, will be the most computationally expensive solution possible. A much more efficient design would be to catch the NaN just after optimisation has terminated as the test will only need to be done once per optimisation rather than thousands of times per optimisation.
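
To make that concrete, here is a minimal sketch of the post-optimisation check (the minimiser signature, its return values, and the error class are just placeholders, not relax's real API):

    class RelaxNaNError(Exception):
        """Placeholder stand-in for a relax NaN error."""

    def minimise_and_check(minimiser, func, x0):
        """Run the minimiser, then test the final chi-squared value once."""
        params, chi2, iter_count = minimiser(func, x0)

        # A single test per optimisation rather than one per chi-squared
        # evaluation.  Only NaN compares unequal to itself.
        if chi2 != chi2:
            raise RelaxNaNError("NaN chi-squared value at the end of optimisation.")

        return params, chi2, iter_count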


> For 2) and 3) the NaN value comes from the chi2 variable which is just
> a standard Python floating point number rather than a Numeric
> construct.  Will the shift to Numpy actually change the behaviour of
> the default Python floats?  Or will it just change the behaviour of
> vectors, matrices, and other linear algebra?  Or is there a function
> similar to the fpconst.py function isNaN() which can be used to catch
> it?  Anyway, the 1.3 line is probably the best place to test the shift
> from Numeric to Numpy - although in a private branch first.
>
My understanding is that numpy propagates NaNs in general, and that pure floating point maths also propagates them. The only place there used to be problems is in the ufuncs, which in Numeric would raise exceptions rather than propagate NaNs. There is a function similar to isNaN, called isnan (along with isinf), in scipy... In general we could grep for the use of isnan and isinf in the numpy/scipy codebase to see whether NaNs are caught much or just propagated. A quick look shows only a very few uses of isnan in numpy or scipy.

I just had a look at scipy and the isnan function is defined in 'Lib/special/cephes/isnan.c'. They catch it based on the bit pattern, as you suggested previously, though the test depends on whether the platform is 'IBMPC', 'DEC', or 'MIEEE'. It should be pretty easy to implement a similar solution in relax in relatively few lines of Python.
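
For instance, a pure Python test of the IEEE 754 bit pattern (all exponent bits set, non-zero mantissa) could be sketched roughly as follows - this is only a sketch for the IEEE case, not a port of the cephes IBMPC/DEC/MIEEE handling:

    import struct

    def isNaN(x):
        """Return True if the float x is an IEEE 754 NaN.

        A NaN has all eleven exponent bits set and a non-zero mantissa.
        """
        # Pack the double big-endian and pull the bits back out as an integer.
        bits = struct.unpack('>Q', struct.pack('>d', x))[0]
        exponent = (bits >> 52) & 0x7ff
        mantissa = bits & ((1 << 52) - 1)
        return exponent == 0x7ff and mantissa != 0

The even simpler test 'x != x' (only NaN compares unequal to itself) should also work for standard Python floats, although relying on that behaviour across platforms and Numeric versions would need checking.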


> As for the test suite, the optimisation code is completely untested.
> It's where the major breakages occur, although the code in
> 'maths_fns/' is problematic as well.  A shift to Numpy will require
> changes to both 'maths_fns/' and 'minimise/'.  To catch problems the
> four optimisation classes will need to be tested - standard single
> residue, diffusion tensor, all parameters (model-free + diffusion
> params), and the residue specific local tm models.  It shouldn't be
> too hard to code a number of tests for this as they can all use the
> same data.  Then all the optimisation algorithms in ALL combinations
> need to be tested - that is quite a few.  However as these minimisers
> will be separated from relax, this won't be so easy.
>
I don't quite follow why this won't be easy. The combinatorial aspect is of course a problem, but I guess the likely combinations are the first target.

It would be easy. I can use the data of Schurr et al. (1994), which I have reanalysed for a paper in preparation, for the test-suite. The tests should then be as simple as writing a number of relax user scripts, run from within the test-suite. However the tests for the optimisation algorithms themselves probably shouldn't go into the relax test-suite - that code will eventually be removed from relax. Still, for our purposes it might be good to have a second set of tests within the test-suite which exercise the model-free code and the minimisation code simultaneously.
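
A rough, purely hypothetical skeleton of what the four test classes could look like in unittest style (the script names and the optimise() stub are placeholders, not relax's real interfaces):

    import unittest

    class ModelFreeOptimisationTests(unittest.TestCase):
        """Skeleton for the four optimisation tests, all sharing one data set."""

        def optimise(self, script):
            # Placeholder: run a relax user script against the shared data set
            # and return the final chi-squared value.
            return 0.0

        def assert_finite(self, chi2):
            # Only NaN compares unequal to itself.
            self.assertTrue(chi2 == chi2, "NaN chi-squared value")

        def test_single_residue(self):
            self.assert_finite(self.optimise('opt_single_residue.py'))

        def test_diffusion_tensor(self):
            self.assert_finite(self.optimise('opt_diffusion_tensor.py'))

        def test_all_parameters(self):
            self.assert_finite(self.optimise('opt_all_parameters.py'))

        def test_local_tm(self):
            self.assert_finite(self.optimise('opt_local_tm.py'))

    if __name__ == '__main__':
        unittest.main()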


> I believe though that throwing a RelaxError when NaNs occur is the
> best option.  That is because NaN should NEVER occur.  Even though it
> may cause a week long calculation to die at the very end, hence the
> optimisation was for nothing, an error should still be thrown (it's
> much more likely that a week long optimisation will die at the very
> start).  The reason for throwing a RelaxError and killing the
> calculation is simple.  Despite the calculation running until the end
> - it will need to be rerun.  If the NaN only occurs for a single
> residue the entire protein (the diffusion tensor) is nevertheless
> affected.


Surely not if you skip data with NaN values?

Do you really want to do this? The NaN value is a sign that something is fatally wrong.


I have to say that at the end of a week long calculation I would like to
see the result. For example (this came from Chris), if a grid search
failed I would personally like the residue to be deselected (and maybe a
warning generated on the console) and then the calculation should go on.
In general I feel exceptions are a blunt tool for this sort of problem,
as you lose the program state and get no results for anything other than
the faulty data.
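
Something along these lines would do (just a sketch - the residue attributes and the grid_search callable are made up, not relax's real data structures):

    import warnings

    def grid_search_all(residues, grid_search):
        """Run the grid search over every residue, deselecting any that fail."""
        for residue in residues:
            chi2 = grid_search(residue)

            # Only NaN compares unequal to itself.
            if chi2 != chi2:
                residue.select = 0
                warnings.warn("Grid search failed for residue %s, deselecting it."
                              % residue.num)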

Sorry, my example of the week long calculation failing at the very end was hypothetical and probably impossible. The NaN value within model-free analysis is guaranteed to be caused by garbage input data, hence the RelaxError will be thrown before the calculations really get under way. The example of the week long calculation assumed that the new model-free optimisation protocol implemented in the sample script 'full_analysis.py' is run. That involves ~15 rounds of the iterative full optimisation of the system for each of the spherical diffusion tensor, prolate spheroid, oblate spheroid, and ellipsoid. For each of these iterations many results files are generated. So even if the NaN and subsequent RelaxError occurs at the very end of the analysis, the results up to that point will be easily accessible, and optimisation can even continue from a point just before the error occurred. The amount of program state and computation time that is lost is relatively small.


Also, for example, if one MC calculation produced a NaN and killed it all,
that would be annoying in the extreme.

By construction of Monte Carlo simulations I can't see this as being possible. If the NaN occurs in the MC simulation, it must have previously occurred in the original optimisation.


I can think of counter examples: a NaN while calculating a tensor, for
example, should bring things to a close. But it would be nice to have a
default error handler that saved the state to some meaningful place.

That would be useful - just difficult to code. If code is written which dumps the saved state to the current directory just prior to throwing the RelaxError (hint: see the RelaxError base class BaseError in 'errors.py'), it will certainly be accepted into relax.
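
A rough sketch of what that could look like (the state object, its save() method, and the file name are hypothetical - the real code would hook into whatever relax uses to dump its data store):

    import time

    class BaseError(Exception):
        """Sketch of a relax base error that saves the program state before dying."""

        def save_state(self, state):
            # Dump the state to the current directory; never let a failure here
            # mask the original error.
            file_name = 'relax_state_%s' % time.strftime('%Y%m%d_%H%M%S')
            try:
                state.save(file_name)    # hypothetical state-dumping call
            except Exception:
                pass

    class RelaxNaNError(BaseError):
        def __init__(self, state):
            self.save_state(state)
            self.text = "NaN (not a number) value encountered."

        def __str__(self):
            return self.text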

Edward


