On Tue, 2006-08-08 at 20:36 +1000, Edward d'Auvergne wrote:
For 1) I would prefer the NaN catching to be outside of the 'minimise/' directory. It should be safe to assume that that code will soon not be part of relax. As for handling NaNs within the minimisation code, I know of no other minimisation package that does this - if the user sends garbage to it, then returning garbage should be expected. The sender and receiver code should do the cleanup. I do, however, think that testing for NaN during optimisation (in the 'maths_fns' code) is too computationally expensive. If optimisation terminates in a reasonable time, then I don't think we should test for NaNs during the number crunching phase.
I agree with all of this. NaN handling is the job of relax proper - not the optimisation code. The only nuance I would put on it is that if a grid search returns a NaN, we should catch it then and take the appropriate action, rather than proceed to the next step of the minimisation which will necessarily entail a lot of iterations waiting for the impossible.
For 2) and 3) the NaN value comes from the chi2 variable which is just a standard Python floating point number rather than a Numeric construct. Will the shift to Numpy actually change the behaviour of the default Python floats? Or will it just change the behaviour of vectors, matrices, and other linear algebra? Or is there a function similar to the fpconst.py function isNaN() which can be used to catch it? Anyway, the 1.3 line is probably the best place to test the shift from Numeric to Numpy - although in a private branch first.
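A NaN test for a plain Python float doesn't actually need fpconst: by IEEE 754, NaN is the only value that compares unequal to itself. A minimal sketch on a modern Python (the `is_nan` name and the `chi2` variable are illustrative, not relax code):

```python
def is_nan(x):
    # IEEE 754: NaN is the only floating point value unequal to itself.
    return x != x

# Produce a NaN without relying on float('nan') string parsing,
# which older Python/platform combinations did not support:
chi2 = float('inf') - float('inf')
assert is_nan(chi2)
assert not is_nan(42.0)
```

Note the caveat raised further down the thread: since comparison behaviour with NaN has differed between interpreter versions, the self-inequality test is the one property that has stayed reliable.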
I'll look into this option further and let you know.
I just tested it and in Python 2.1 NaN is apparently less than all other numbers and is hence selected. In 2.4 it's greater than all other numbers and hence is never selected. Therefore the model selection code should try to catch the NaN. But then what should we do? Throw a RelaxError? Or skip these models, which brings the notion of 'no selected model' into play and hence will require a large rework of the code base to handle missing models?
Presumably no selected model is a possible outcome of model elimination? Is it really not handled now? It seems to me that the way to handle it is to deselect the affected residues and continue. I'm not quite sure why it entails such a big change.
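The version-dependent min/max behaviour can be sidestepped by filtering NaN values out before selection. A sketch, assuming the chi-squared values sit in a plain list (the names are illustrative, not the relax API):

```python
# One of the candidate models returned a NaN chi2:
chi2_values = [12.3, float('inf') - float('inf'), 4.7, 8.1]

# Every comparison with NaN is False, so min() over the raw list is
# order- and version-dependent. Filter first; v == v is False only for NaN.
finite = [v for v in chi2_values if v == v]
best = min(finite)
assert best == 4.7

# An empty 'finite' list is exactly the 'no selected model' case,
# and would need the deselect-and-continue handling discussed above.
```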
I believe, though, that throwing a RelaxError when NaNs occur is the best option. That is because NaN should NEVER occur. Even though it may cause a week long calculation to die at the very end, hence the optimisation was for nothing, an error should still be thrown (it's much more likely that a week long optimisation will die at the very start). The reason for throwing a RelaxError and killing the calculation is simple: despite the calculation running until the end, it will need to be rerun. If the NaN only occurs for a single residue, the entire protein (the diffusion tensor) is nevertheless affected. This is because of the strong link between the diffusion tensor parameters and the model-free parameters. The values of one influence the optimisation of the other and vice versa. Therefore the continuation of the calculation will, without doubt, produce incorrect model-free results.
I disagree here. There are many examples I can think of where the NaN shouldn't mix with the diffusion tensor calculation. Just one example - if only one MF model returns NaN, then it should not be selected and will not influence the diffusion tensor. The other point is that the propagation behaviour of NaNs is such that if a NaN were to influence the diffusion tensor in any way, then the affected diffusion tensor values will be NaN (clearly this is unrecoverable, and is an appropriate place to throw an exception). Although I'd love to be able to agree with you that NaN should never occur, floating point maths just isn't that cooperative. Even for 'correct' inputs, it is quite possible for a minimisation to drive a value so small that a floating point underflow occurs; division or log of that value will then result in NaN (or INF - the two are equivalent for the purposes of this discussion). I've never seen this with your minimisation code, but I've certainly seen it in others (probably a tribute to the robustness of your algorithms, but not grounds for too much complacency).
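The underflow scenario is easy to reproduce in plain Python; this sketch only illustrates IEEE 754 behaviour, not anything in relax itself (the NaN is produced here via inf - inf, since bare 0.0/0.0 raises ZeroDivisionError in Python rather than returning NaN):

```python
tiny = 5e-324                 # smallest subnormal double
underflowed = tiny * tiny
assert underflowed == 0.0     # silent underflow straight to zero

inf = float('inf')
nan = inf - inf               # inf - inf yields NaN under IEEE 754
assert nan != nan             # NaN never equals anything, itself included

# Once produced, NaN propagates through every subsequent operation:
result = (nan + 1.0) * 2.0
assert result != result
```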
To summarise my opinions: To catch the NaN: I think this is useful, though not necessary. Avoiding fpconst.py as a dependency would be best. If Numpy has a function to catch Python native floating point values of NaN - then migrating to Numpy is worth a go. Otherwise migrating to Numpy isn't an issue for this problem.
I believe catching NaN is necessary for defined performance of model selection, and useful to avoid wasting an awful lot of time minimising NaN. I think Numpy will be useful here.
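On the Numpy question: numpy.isnan accepts plain Python floats as well as arrays, so no fpconst dependency would be needed after the migration. A quick check (assuming numpy is importable; the variable names are illustrative):

```python
import numpy

chi2 = float('inf') - float('inf')   # a NaN held in an ordinary Python float
assert numpy.isnan(chi2)             # works on native floats, not just arrays
assert not numpy.isnan(3.14)

# It also works elementwise, e.g. over a vector of chi2 values:
values = numpy.array([1.0, chi2, 2.0])
assert numpy.isnan(values).tolist() == [False, True, False]
```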
What to do when NaNs occur: RelaxError!
RelaxError is appropriate when the NaN signals an unrecoverable state, e.g. if the diffusion tensor contains NaN. On the other hand, an isolated NaN should result in the relevant model/residue being deselected and a warning to highlight the fact. Obviously this more context-dependent response involves more work, but I don't think it needs to be fully implemented all at once - as you rightly point out, this is a rare occurrence. Of course relax is yours, and I'm happy to recognise your benevolent dictatorship.

Chris
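The context-dependent response sketched above might look like the following. RelaxError, the residue dicts, and handle_nans are all hypothetical stand-ins for illustration, not the actual relax code:

```python
import warnings

class RelaxError(Exception):
    """Stand-in for relax's fatal error type."""

def handle_nans(residues, tensor_params):
    # Unrecoverable: NaN has reached the diffusion tensor parameters.
    if any(v != v for v in tensor_params):
        raise RelaxError("NaN in the diffusion tensor parameters.")
    # Isolated NaN: deselect the residue, warn, and continue.
    for res in residues:
        if res["chi2"] != res["chi2"]:
            res["selected"] = False
            warnings.warn("Residue %s deselected (chi2 is NaN)." % res["name"])

residues = [{"name": "GLY4", "chi2": 3.2, "selected": True},
            {"name": "LEU5", "chi2": float('inf') - float('inf'), "selected": True}]
handle_nans(residues, tensor_params=[0.1, 0.2])
assert [r["selected"] for r in residues] == [True, False]
```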
Prevention vs. cure: Well, a mix. Catch the NaN as a cure, then throw a RelaxError. The output can then be used to create prevention measures.

Edward