Re: [bug #6503] Uncaught nan in xh_vect



Posted by Chris MacRaild on August 04, 2006 - 17:47:
On Fri, 2006-08-04 at 23:46 +1000, Edward d'Auvergne wrote:
There is an alternative 3, which would be to back-port the
functionality from numpy in the expectation that 'one of these days'
Numeric will be replaced by numpy (is Numeric still maintained?)

It would also be possible to check whether NaNs etc. produce the
expected result at the start of the script and warn the user if they
had a non-compliant result (a sketch of such a check follows below).

A final note is that I believe scipy propagates NaNs, infs, etc.
properly, whereas Numeric doesn't for ufuncs and some other cases...
http://cens.ioc.ee/~pearu/scipy/tutorial.pdf
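
A minimal sketch of such a startup check (the check_ieee754 name and
the warning messages are illustrative, not part of relax):

    import warnings

    def check_ieee754():
        # Warn if this Python build lacks IEEE-754 NaN/inf semantics.
        try:
            inf = 1e300 * 1e300    # overflows to inf on IEEE platforms
            nan = inf - inf        # inf - inf is undefined, hence NaN
        except (OverflowError, FloatingPointError):
            warnings.warn("inf/NaN cannot be created on this platform")
            return
        # A compliant NaN compares unequal to everything, itself included.
        if not nan != nan:
            warnings.warn("NaN comparisons are non-compliant here")

    check_ieee754()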


As I understand it, Numeric is no longer maintained, and Numpy is its
appointed successor, so in that sense porting to Numpy will be
necessary some time soon anyway. In principle it should be a fairly
trivial process - changing the import statements being the major job.
What complicates matters, however, is that Scientific is still reliant
on Numeric and not 'yet' compatible with Numpy. So if we do move over
to Numpy, an alternative PDB parser will be required (and another MPI
interface?).

Yep, Numeric is essentially dead and they do recommend switching
everything to Numpy: "users should transisition [sic] to NumPy as
quickly as possible".  However, Numpy is so incredibly broken it
shouldn't even be called alpha software.  I tried it out in relax and
everything broke.  This was especially evident in the minimisation
code, where many linear algebra function calls are made.  They should
at least have made a functional product before telling everyone to
switch to it.  They may have stabilised things by now, but the fact
that you have to pay to read the manual is telling.  I have a feeling
there will be a significant speed hit as well.  A good indication that
NumPy is worth switching to might be when Scientific Python switches
to it.  However, feel free to branch the 1.3 line if you want to give
it a go; it was a few years ago that I tried and failed.

The whole Numeric/Numarray/Numpy thing has been a huge mess for a
while, but I get the feeling that things are settling around Numpy
now. It might well be time to give it another go (though, as you
suggest, waiting for Scientific to move might be a good option too).


The issue with Numeric's handling of NaNs and INFs is that often it
will raise a FloatingPointError of one flavour or another, rather than
returning NaN or INF. Divide by zero is the classic example, but there
are many others. This is also true of Python's math functions, and
again is fairly platform-dependent. The propagation of NaNs is OK
though, I think. That is, NaN is always the result of any maths
operation on NaN.
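
Plain Python shows the same tendency to raise rather than return INF
or NaN (the exact exception messages vary by version and platform):

    >>> 1.0 / 0.0               # IEEE 754 would return inf
    Traceback (most recent call last):
      ...
    ZeroDivisionError: float division
    >>> import math
    >>> math.log(0.0)           # IEEE 754 would return -inf
    Traceback (most recent call last):
      ...
    ValueError: math domain error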

Those examples you gave before, Chris, were a bit crazy.  I don't
think it will be possible to ever catch NaNs without restricting relax
to one very specific version of Python - that is, until they fix their
floating-point operations.  Maths operations on NaN - surely, by
definition, that must be an error?  As NaN is not properly handled in
Python (and is handled differently in different versions), catching
all the places in the program where NaNs arise and preventing them in
the first place would be good.  I have a feeling that the initial
chi-squared value I arbitrarily set to 1e300 might be the source.
Maybe using inf in that spot will fix the bug, or maybe another method
would be better.
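
A hedged sketch of that idea (the chi2 name is illustrative, not
necessarily what relax calls it):

    # float('inf') is not portable before Python 2.6, but multiplying
    # two large floats overflows to inf on IEEE platforms.
    inf = 1e300 * 1e300

    # Hypothetical: seed the minimisation with inf rather than the
    # arbitrary 1e300 sentinel, so any real chi-squared value always
    # compares as an improvement.
    chi2 = inf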

Maths operations on NaNs shouldn't be an error, but they should
*always* return NaN (Python gets this much right, but only because all
the platforms it runs on get it right, and Python blindly inherits
this behaviour). Comparison operations on NaN also shouldn't be an
error, but they should *always* return False (this Python often gets
wrong, because the platforms it runs on get it wrong, and Python
blindly inherits...).

To understand the 'correct' behaviour, it's worth understanding why
NaN exists: consider a big calculation involving lots of input data,
giving many different results (a model-free calculation, for example).
Suppose one or two of the inputs are wrong (I should be so lucky).
Suppose that, because of the bad inputs, somewhere in the calculation
an undefined mathematical operation occurs (say divide by zero, or
log(0)). There are two options: (1) raise an exception and bring the
whole calculation to a halt, or (2) indicate that the result of the
bad operation is undefined, but continue the calculation on the rest
of the data.

Clearly, if I have spent hours calculating good results from good
data, I don't want to throw all of that away just because one
operation is bad. What I do want is for all the results that are
contaminated by the bad operation to be clearly identifiable. This is
the purpose of NaN - it marks out the result of an undefined
floating-point operation, and propagates itself into all results that
depend on it. It should do this without throwing exceptions or
otherwise halting whatever other calculations might be going on
independently within the same process.
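
A sketch of option (2) in Python as it stands, where the undefined
operation has to be caught by hand (safe_log and the sample data are
illustrative):

    import math

    inf = 1e300 * 1e300
    nan = inf - inf

    def safe_log(x):
        # Return NaN for undefined inputs instead of raising, so one
        # bad data point doesn't abort the whole calculation.
        try:
            return math.log(x)
        except (ValueError, OverflowError):
            return nan

    data = [2.0, 7.4, 0.0, 1.3]           # one bad input
    results = [safe_log(x) for x in data]
    # results[2] is NaN; the other three results are still good.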

The most important bit of all of this is that NaNs always propagate,
i.e. f(NaN) = NaN for all maths functions. Python gets this much
right. Python doesn't get NaN creation right - in many instances when
an undefined operation occurs, Python throws one of its bizarre array
of FloatingPointErrors, rather than just returning NaN or INF. Python
also makes testing for NaNs very difficult, because all the comparison
behaviour is undefined. Here I have been a bit harsh on Python in my
previous posts - as of Python 2.4, comparisons that use the x==y or
x!=y syntax behave as they should, at least on Windows and gcc-based
platforms. As I demonstrated, though, using cmp() leads to all manner
of chaos!
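
Concretely, a compliant Python 2.4 session looks roughly like this
(the printed 'nan' and the cmp() result vary between builds, so treat
the output as illustrative):

    >>> inf = 1e300 * 1e300     # overflow to inf
    >>> nan = inf - inf         # undefined, hence NaN
    >>> import math
    >>> math.sin(nan)           # NaN propagates through maths functions
    nan
    >>> nan == nan              # comparisons with NaN are always False...
    False
    >>> nan != nan              # ...so x != x is the standard NaN test
    True
    >>> cmp(nan, nan)           # but cmp() claims they're equal - chaos
    0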


I will also look into placing an upper iteration limit on the
backtracking line search to prevent it from hanging.  

The other possibility is a test to ensure the line search is actually
going somewhere, but that could be expensive in terms of performance.
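
A minimal sketch of such a capped backtracking loop (the names and
default parameters are illustrative, not relax's actual minimisation
code):

    def backtrack(func, x, d, f0, grad_dot_d, rho=0.5, c=1e-4,
                  max_iter=50):
        # Backtracking line search with an upper iteration limit.
        # Returns a step length alpha satisfying the Armijo condition,
        # or None if max_iter is exhausted.  Note that if func returns
        # NaN, the Armijo test below is always False, so without the
        # cap the loop would shrink alpha forever - the hang seen here.
        alpha = 1.0
        for i in range(max_iter):
            f_new = func(x + alpha * d)
            # Armijo (sufficient decrease) condition.
            if f_new <= f0 + c * alpha * grad_dot_d:
                return alpha
            alpha = rho * alpha       # shrink the step and try again
        return None                   # give up rather than loop forever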

Chris, if you set the 'print_flag' value to, say, 5, does relax just
sit there doing nothing or does the line search continue forever?  If
it hangs without printing anything, it could be a bug in Numeric.  In
version 24.x the eigenvalue function freezes and ctrl-c does nothing
(try the eigenvalue Hessian modification together with Newton
minimisation to see the result).  You have to manually 'kill' relax.

This is really bizarre - if I do the minimisation with print_flag<=3,
the minimisation gets stuck in a loop, but ctrl-c works. If
print_flag>3, ctrl-c doesn't work, and I have to kill relax. Ctrl-c
not working possibly indicates that we are stuck somewhere in the C
world (in Numeric, for example), and so the signal never gets out to
the Python world. But why does the print_flag level make a difference?

Chris



