Hi, The problem is likely to be due to a circular looping around very similar solutions close to the universal solution, as I have defined in: d'Auvergne EJ, Gooley PR. Set theory formulation of the model-free problem and the diffusion seeded model-free paradigm. Mol Biosyst. 2007 Jul;3(7):483-94. If you can't get the paper, have a look at Chapter 5 of my PhD thesis at http://eprints.infodiv.unimelb.edu.au/archive/00002799/. The problem is the interlinked mathematical optimisation and statistical model selection where we are trying to minimise different quantities. For mathematical optimisation this is the chi-squared value. For model selection this is the quantity known as the descrepancy. These can not be optimised together as mathematical optimisation works in a single fixed-dimension space or universe whereas model selection operates across multiple spaces with different dimensions. See the paper for a more comprehensive description of the issue. You should be able to see this if you look at the end of iterations. If you have 160 iterations, look after iteration 20 (or maybe even 30 or further). Until then, you will not have reached the circular loop. After that point you will be able to exactly quatify this circular loop. You'll be able to determine its periodicity, which residues are involved (probably only 1), and whether the diffusion tensor changes as model selection changes. I mentioned all of this already in my post at https://mail.gna.org/public/relax-users/2007-07/msg00001.html (Message-id: <7f080ed10707090104r60453839of0b9df58022501d2@xxxxxxxxxxxxxx>) in response to your original post (https://mail.gna.org/public/relax-users/2007-06/msg00004.html, Message-id: <2ED509BE-FA07-4ED0-96F0-E4E612405DC7@xxxxxxxxx>). I have a few more points about the tests you have done but to work out what is happening with the printouts, it would be very useful to have your modified 'full_analysis.py' script attached. On 9/17/07, Douglas Kojetin <douglas.kojetin@xxxxxxxxx> wrote:
Hi, I'm unsure if this is a bug in full_analysis.py, in the internal relax code, or user error. The optimization of the 'sphere' model will not converge, now after 160+ rounds. The chi-squared test has converged (long, long ago):
See above.
"" from output Chi-squared test: chi2 (k-1): 100.77647517006251 chi2 (k): 100.77647517006251 The chi-squared value has converged. "" However, the identical model-free models test does has not converged: "" from output Identical model-free models test: The model-free models have not converged. Identical parameter test: The model-free models haven't converged hence the parameters haven't converged. "" Something that confuses me is that the output files in the round_??/ aic directory suggest that, for example, the round_160 and round_161 AIC model selections are equivalent. Here are the models for the first few residues:
Between these 2 rounds, are you sure that all models for all residues are identical? From your data that you posted at https://mail.gna.org/public/relax-users/2007-06/msg00017.html (Message-id: <B52B2F07-8B2C-4FAE-B270-902C2A1B3619@xxxxxxxxx>), I would guess that this is not the case and one or two residues actually do change in their model selections.
"" 1 None None 2 None None 3 None None 4 m2 m2 5 m2 m2 6 m2 m2 7 m2 m2 8 m2 m2 9 m4 m4 10 m1 m1 11 None None 12 m2 m2 13 m2 m2 14 m1 m1 15 m2 m2 16 m3 m3 17 m3 m3 18 None None "" However, I modified the full_analysis.py protocol to print the differences in the model selection, within the 'Identical model-free model test' section of the 'convergence' definition. Here is the beginning of the output (which only contains differences between the previous and current rounds): "" residue 1: prev=None curr=m2 residue 2: prev=None curr=m2 residue 3: prev=None curr=m2 residue 6: prev=m2 curr=m4 residue 7: prev=m2 curr=m1 residue 9: prev=m4 curr=m2 residue 11: prev=None curr=m2 residue 12: prev=m2 curr=m3 residue 13: prev=m2 curr=m3 residue 15: prev=m2 curr=m1 residue 16: prev=m3 curr=m2 residue 17: prev=m3 curr=m1 residue 18: prev=None curr=m3 ""
This output is quite strange. I would need to see the full_analysis.py script to do more with this.
There should be no data for residues 1-3, 11 and 18 (None), however the 'Identical model-free model test' seems as if it ignores residues for which 'None' was selected in the curr_model call in the following code: "" # Create a string representation of the model-free models of the previous run. prev_models = '' for i in xrange(len(self.relax.data.res['previous'])): if hasattr(self.relax.data.res['previous'][i], 'model'): #prev_models = prev_models + self.relax.data.res ['previous'][i].model prev_models = prev_models + ' ' + self.relax.data.res ['previous'][i].model # Create a string representation of the model-free models of the current run. curr_models = '' for i in xrange(len(self.relax.data.res[run])): if hasattr(self.relax.data.res[run][i], 'model'): #curr_models = curr_models + self.relax.data.res[run] [i].model curr_models = curr_models + ' ' + self.relax.data.res [run][i].model ""
As residues 1-3, 11 and 18 are deselected, then they will not have the attribute 'model' and hence will not be placed in the prev_models or curr_models string (which are then compared).
For what it's worth, I have residues 1,2,3,11 and 18 in the file 'unresolved' which is read by the full_analysis.py protocol. I created a separate sequence file (variable = SEQUENCE) that contains all residues (those with data and those without), instead of using a data file (noe data, in the default full_analysis.py file). However, these residues are not specified in the data (r1, r2 and noe) files, as I did not have data for them. Should I add them but place 'None' in the data and error columns? Could that be causing the problems? Or should I create a bug report for this?
I'm not sure what you mean by the file '(variable = SEQUENCE)' statement. I would need to see the full_analysis.py script to understand this. I would assume a file called the value in the variable 'SEQUENCE'. In which case, this should not be a problem. As the data is missing from the files containing the relaxation data, these residues will not be used in model-free analysis. They will be automatically deselected. There is not need to add empty data for these spin systems into the relaxation data files. As I said before at the top of this message and at https://mail.gna.org/public/relax-users/2007-07/msg00001.html (Message-id: <7f080ed10707090104r60453839of0b9df58022501d2@xxxxxxxxxxxxxx>), the problem is almost guaranteed to be a circular loop of equivalent solutions circling around the universal solution - the solution defined within the universal set (the union of all global models (diffusion tensor + all model-free models of all residues)). If you have hit this circular problem, then I suggest you stop running relax. What I would then do is identify the spin system (or systems) causing the loop, and what the differences are between all members of the loop. This you can do by plotting the data and maybe using the program diff on uncompressed versions of the results files. It's likely that the differences are small and inconsequential. I hope this helps. Regards, Edward