> Because what I wrote above, it would probably have to be listed as a
> dependency. However I don't really like that idea as dependencies are
> a pain. Do you have the link to its homepage? All the links I've
> tried that look like it would be its homepage (for example
> http://www.warnes.net/rwndown) are still down. Having it in-built
> into relax would be better.
I agree that having an additional dependency is a bad idea. The options then are:

1) Do nothing, and accept Python's default NaN handling. Although I agree that NaNs are a rare occurrence, I think this option is just putting off the inevitable - as more users push relax into more places, it's inevitable that NaNs will appear again and give us more headaches, probably in ways we can't anticipate.

2) Code our own version of fpconst for relax. Then we could catch NaNs as and when we need to.

3) Make the move from Numeric to Numpy. As I understand it, this should give us both the ability to catch NaNs and reliable NaN comparisons (i.e. NaN will compare False with everything). I've done some very preliminary testing towards this, and at first glance it seems easier than I expected. I have run the convertcode script that is distributed with Numpy on all modules in the maths directory of the relax 1.3 branch. All tests pass except the OpenDX tests, which also failed in 1.3 before the conversion. Obviously this will require a lot of care and more thorough testing than the current test suite allows, but it at least seems feasible. As we have discussed before, the shift to Numpy will be necessary soon enough anyway, so perhaps now is the time to give it a shot?
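For option 2, a hand-rolled replacement for fpconst could be tiny. A minimal sketch, relying on the IEEE 754 rule that NaN is the only value which does not compare equal to itself (this holds for CPython floats on IEEE platforms, but we would need to verify it on all the platforms relax targets):

    def isNaN(x):
        """Return True if x is an IEEE 754 NaN value."""
        # NaN is the only floating point value for which x != x.
        return x != x

This avoids the dependency entirely, at the cost of leaning on comparison behaviour that the Python language itself does not guarantee.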
For 1) I would prefer the NaN catching to be outside of the 'minimise/' directory, as it should be safe to assume that that code will soon no longer be part of relax. As for handling NaNs within the minimisation code, I know of no other minimisation package that does this - if the user sends garbage to it, then garbage coming back should be expected. The sender and receiver code should do the cleanup. I do however think that testing for NaN during optimisation (in the 'maths_fns' code) is too computationally expensive. If optimisation terminates in a reasonable time then I don't think we should test for NaNs during the number crunching phase.
For 2) and 3), the NaN value comes from the chi2 variable, which is just a standard Python floating point number rather than a Numeric construct. Will the shift to Numpy actually change the behaviour of the default Python floats? Or will it just change the behaviour of vectors, matrices, and other linear algebra? Or is there a function similar to fpconst.py's isNaN() which can be used to catch it? Anyway, the 1.3 line is probably the best place to test the shift from Numeric to Numpy - although in a private branch first.
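To partially answer my own question: if I'm reading the Numpy docs correctly, numpy.isnan() accepts plain Python floats as well as arrays, so the chi2 value could be tested directly without converting anything. A quick sketch (untested by me, so treat with suspicion):

    import numpy

    chi2 = numpy.nan             # numpy.nan is itself a plain Python float
    print numpy.isnan(chi2)      # True
    print numpy.isnan(1.5)       # False

If that works as advertised, the shift to Numpy would solve the catching problem even though the chi2 value never touches an array.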
As for the test suite, the optimisation code is completely untested. That is where the major breakages occur, although the code in 'maths_fns/' is problematic as well. A shift to Numpy will require changes to both 'maths_fns/' and 'minimise/'. To catch problems, the four optimisation classes will need to be tested - standard single residue, diffusion tensor, all parameters (model-free + diffusion parameters), and the residue-specific local tm models. It shouldn't be too hard to code a number of tests for this as they can all use the same data. Then all the optimisation algorithms need to be tested in ALL combinations - that is quite a few. However, as these minimisers will be separated from relax, this won't be so easy.
> However I still question whether the floating point value of NaN
> actually needs to be caught at all - it would be useful, but is
> unnecessary. It's a rare occurrence to begin with and I can easily
> modify the optimisation component of relax to break the infinite
> loops. Therefore does anything else need to be done? It will be
> glaringly obvious in the final results file if NaNs get through the
> analysis. And if there is an issue in relax's handling of NaN then
> that should be classified as a relax bug.
>
> [snip]
> NaN should propagate - however I do see one point at which they will
> disappear and that is during AIC model selection. The AIC value will
> be NaN (NaN + 2k = NaN) and any model with a non-NaN AIC value will be
> selected. Unless of course all model-free models for that residue are
> affected. Yet if they do propagate the results for that residue will
> make no sense.
Agreed, and that is one reason why we need to catch NaNs (or have properly defined comparisons on NaNs). In the current situation, if we do model selection on a mixture of NaN and finite values, the behaviour is undefined. The options are to use a system that has defined NaN comparisons (Numpy seems to fit the bill) or to remove NaN values before model selection (another test in the model elimination, perhaps?). Obviously this last option requires that we are able to catch NaNs in a defined way.
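To make the failure mode concrete, here is a rough sketch of a naive AIC selection being poisoned by a NaN (the model names and values are invented for illustration; the NaN is synthesised via inf - inf, which works on IEEE platforms):

    # AIC = chi2 + 2k, so a NaN chi2 gives a NaN AIC value.
    def aic(chi2, k):
        return chi2 + 2.0 * k

    inf = 1e300 * 1e300                  # overflows to infinity
    NaN = inf - inf                      # inf - inf is NaN

    models = [('m1', 25.0, 1), ('m2', NaN, 2), ('m3', 20.0, 4)]

    best_name, best_aic = None, None
    for name, chi2, k in models:
        value = aic(chi2, k)
        # This comparison is undefined when value is NaN, so whether
        # m2 'wins' depends on the Python version and platform.
        if best_aic is None or value < best_aic:
            best_name, best_aic = name, value

    print best_name

With the behaviour I describe below, this prints 'm2' under Python 2.1 and 'm1' under 2.4.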
I just tested it and in Python 2.1 NaN is apparently less than all other numbers and is hence selected. In 2.4 it's greater than all other numbers and hence is never selected. Therefore the model selection code should try to catch the NaN. But then what should we do? Throw a RelaxError? Or skip these models, which brings the notion of 'no selected model' into play and hence will require a large rework of the code base to handle missing models? Whether the test is in the model elimination or model selection shouldn't matter, although the actual model selection methodology used may dictate what should happen with a NaN chi-squared value. And there may be analysis techniques added in the future which don't use model elimination (and end users may not have even heard about model elimination).
I believe though that throwing a RelaxError when NaNs occur is the best option, because NaN should NEVER occur. Even though it may cause a week-long calculation to die at the very end, rendering the optimisation worthless, an error should still be thrown (it's much more likely that a week-long optimisation will die at the very start). The reason for throwing a RelaxError and killing the calculation is simple: despite the calculation running until the end, it will need to be rerun. If the NaN only occurs for a single residue, the entire protein (the diffusion tensor) is nevertheless affected. This is because of the strong link between the diffusion tensor parameters and the model-free parameters: the values of one influence the optimisation of the other, and vice versa. Therefore the continuation of the calculation will, without doubt, produce incorrect model-free results.
Hence throwing the RelaxError will help the user to fix the issue. The user can either trace the error back themselves, email the problem, or report a bug. If an email or bug report is sent then we can modify relax to prevent the user error in the first place. If, although I don't know how, the NaN is generated by relax itself rather than by user error, then the user can send a bug report and we can fix the issue. In this situation I would insist that the bug is fixed before the user generates any results, as the results could be complete garbage - there's just no way of knowing until we know how the NaN arose in the first place.
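As a sketch of what the cure could look like at the code level - RelaxNaNError here is a made-up stand-in (in practice it would be a new subclass in relax's existing RelaxError hierarchy), and the NaN test is the isNaN() idea from above:

    class RelaxNaNError(Exception):
        """Stand-in for a proper subclass of relax's RelaxError."""
        def __init__(self, res_num):
            Exception.__init__(self)
            self.text = "NaN chi-squared value encountered for residue %s." % res_num
        def __str__(self):
            return self.text

    def check_chi2(chi2, res_num):
        """Kill the calculation the moment a NaN chi2 appears."""
        if chi2 != chi2:                 # IEEE NaN test, as in isNaN() above
            raise RelaxNaNError(res_num)

Calling check_chi2() once per residue after optimisation, rather than inside the number crunching loops, would also keep the cost negligible - consistent with my point above about not testing during optimisation.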
To summarise my opinions:
To catch the NaN: I think this is useful, though not necessary. Avoiding fpconst.py as a dependency would be best. If Numpy has a function which can catch NaN in native Python floating point values, then migrating to Numpy is worth a go. Otherwise, migrating to Numpy is irrelevant to this problem.
What to do when NaNs occur: RelaxError!
Prevention vs. cure: Well, a mix. Catch the NaN as a cure, then throw a RelaxError. The output can then be used to create prevention measures.
Edward