Hi,

Sorry for the delay, I am flat out preparing for the ENC conference at the moment. Something is a little strange, but unfortunately I don't know where the source of the problem is. It is worth looking at the diffusion tensor parameters and the final results for each of the analyses - the model and optimisation differences may not be statistically significant. Maybe the differences are simply from the truncation of coordinates in a PDB file. If you can track down the source of the difference (whether in relax or elsewhere), would you be able to report this information? It will be very useful.

Sorry I can't be of more help.

Thanks,

Edward

On Feb 8, 2008 8:07 PM, Douglas Kojetin <douglas.kojetin@xxxxxxxxx> wrote:
Hi Edward,

For the first run listed below (tensor, # runs to convergence):

  sphere,    4
  prolate,   7
  oblate,    9
  ellipsoid, 9

For the second run listed below (tensor, # runs to convergence):

  sphere,    4
  prolate,   5
  oblate,    9
  ellipsoid, 9

Doug

On Feb 7, 2008, at 4:03 AM, Edward d'Auvergne wrote:

Hi,

Something strange is happening in your analysis. Unfortunately, with the limited information in your post, I really cannot start to track it down. The fact that the local_tm and sphere runs are the same in both is a good sign. Do you have the number of iterations required for the convergence of each tensor?

The full_analysis.py script should be insensitive to the input structure, but there is one point at which this might not be exactly true. The grid search for the initial tensor parameter values prior to minimisation is orientation dependent. You can think of the grid search as a cage which stays fixed in space while the molecule spins around inside it. But the subsequent Newton optimisation should easily recover from the small differences of the increments between grid points. That might be a place to start looking though. The only point in relax where a random number generator is utilised is in the Monte Carlo simulations.

It might also be worth looking at the Dr value in the ellipsoid optimisations. If this value is close to zero, then the results from the prolate and ellipsoid diffusion tensors may actually be almost the same, in which case the differences don't matter. For understanding molecular motions, the model itself is of no interest - this is, of course, a simple exercise in the field of mathematical modelling (a mathematics library will show you the extent of this field). Note that it is what the model says about the dynamics which is of interest, not the details of the model itself. So two completely different models may actually say the same thing, but maybe with small, statistically insignificant differences. That being said, this problem should not occur.
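[Editor's note: the grid-search "cage" described above can be illustrated with a toy sketch. The 11-point grid and the use of a single theta angle are purely hypothetical simplifications, not relax's actual grid search; the two theta values are taken from the results reported later in this thread.]

```python
import math

# Toy illustration (not relax's actual grid search) of why the initial
# grid search can depend on the PDB orientation: the grid is a fixed
# "cage" of tensor orientation angles, while rotating the molecule
# shifts the true tensor orientation relative to that cage.

# Hypothetical 11-point grid over theta in [0, pi].
theta_grid = [i * math.pi / 10 for i in range(11)]

def nearest_grid_point(value):
    """The starting point handed to the minimiser: the closest grid node."""
    return min(theta_grid, key=lambda g: abs(g - value))

# The two theta values reported later in this thread (degrees -> radians).
theta_original = math.radians(11.127323614211441)
theta_rotated = math.radians(8.4006873071400197)

# The two orientations fall nearest to different grid nodes, so the
# minimiser starts from different points in the two runs.
print(nearest_grid_point(theta_original))
print(nearest_grid_point(theta_rotated))
```

The subsequent Newton minimisation should normally erase this difference, which is why Edward describes it only as a place to start looking.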
But as I cannot do anything with the limited information, you may need to hunt down where the problem lies yourself (whether in relax, the operation of relax, the use of the quadric_diffusion program, or elsewhere).

Regards,

Edward

On Jan 30, 2008 3:24 PM, Douglas Kojetin <douglas.kojetin@xxxxxxxxx> wrote:

Hi Edward,

As a follow-up to this, I performed two relax runs using six datasets (r1, r2 and noe at two fields) with two identical structures, but one had been rotated/translated using the quadric_diffusion program provided by the Palmer lab. For one structure a prolate tensor is chosen, whereas an ellipsoid tensor is chosen for the rotated/translated structure:

## ORIGINAL PDB

Run        Chi2       Criterion
local_tm   102.67810  870.67810
sphere     177.96407  807.96407
prolate    152.70721  796.70721
oblate     178.61058  810.61058
ellipsoid  155.78475  801.78475

The model from the run 'prolate' has been selected.

## ROTATED/TRANSLATED PDB

Run        Chi2       Criterion
local_tm   102.67810  870.67810
sphere     177.96407  807.96407
prolate    175.13432  803.13432
oblate     178.61979  810.61979
ellipsoid  155.82168  801.82168

The model from the run 'ellipsoid' has been selected.

There are no differences in the models selected for two of the three structure-dependent runs (oblate and ellipsoid tensor runs), but there are a handful of differences in the models selected for the prolate tensor runs. Is the full_analysis protocol sensitive to the orientation of the input structure, or could this be a result of different runs using something equivalent to different random number seeds?

Doug

On Jan 10, 2008, at 2:36 PM, Edward d'Auvergne wrote:

Yes, with 4 data sets you could remove tm6 to tm8. You would also need to remove m8. But in this situation, you will be significantly biasing the initial position (the starting universe will be further away from that of the universal solution). I don't know how well this new protocol will perform with 4 data sets, i.e. this is untested, but I would be highly reluctant to trust it.
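[Editor's note: in the two tables above, the diffusion model is selected simply as the run with the lowest criterion value (an AIC-style criterion: chi-squared plus a parameter-count penalty). A minimal check of the two selections:]

```python
# Model selection from the two tables in this thread: pick the run
# with the lowest criterion value.
original = {
    'local_tm': 870.67810, 'sphere': 807.96407, 'prolate': 796.70721,
    'oblate': 810.61058, 'ellipsoid': 801.78475,
}
rotated = {
    'local_tm': 870.67810, 'sphere': 807.96407, 'prolate': 803.13432,
    'oblate': 810.61979, 'ellipsoid': 801.82168,
}

print(min(original, key=original.get))  # prolate
print(min(rotated, key=rotated.get))    # ellipsoid
```

Note how close the competing criterion values are in the rotated run (803.1 vs 801.8), which is why Edward suggests the model difference may not be statistically significant.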
The relaxation data type and field strength will be very important. I would even be wary of using 5 data sets, especially if the missing data set is the higher-field NOE. So I would never recommend using 4 data sets.

Regards,

Edward

On Jan 10, 2008 8:12 PM, Douglas Kojetin <douglas.kojetin@xxxxxxxxx> wrote:

Hi Edward,

Thanks for the response. So, with 5 relaxation data sets, only tm8 should be removed -- no need to remove m8 as well? Also, if only 4 relaxation data sets were available, could {tm6-8 and m8} be removed to use the full_analysis.py protocol?

Thanks,

Doug

On Jan 10, 2008, at 1:31 PM, Edward d'Auvergne wrote:

Hi,

If you have 5 relaxation data sets, you can use the full_analysis.py script but you will need to remove model tm8. This is the only model with 6 parameters, and doing the analysis without it might just work (the other tm0 to tm9 models may compensate adequately).

I've looked at the script and it seems fine. I think the issue is that the model-free problem is not simply an optimisation issue. It is the simultaneous combination of global optimisation (mathematics) with model selection (statistics). You are not searching for the global minimum in one space, as in a normal optimisation problem, but for the global minimum across an enormous number of spaces simultaneously. I formulated the totality of this problem using set theory here http://www.rsc.org/Publishing/Journals/MB/article.asp?doi=b702202f or in my PhD thesis at http://eprints.infodiv.unimelb.edu.au/archive/00002799/.

In your script, the CONV_LOOP flag allows you to automatically loop over many global optimisations. Each iteration of the loop is the mathematical optimisation part, but the entire loop itself allows for the sliding between these different spaces. Note that this is a very, very complex problem involving huge numbers of spaces or universes, each of which consists of a large number of dimensions.
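[Editor's note: the CONV_LOOP idea described above can be sketched very roughly as follows. `optimise` and `model_select` are placeholder callables, not the real relax API, and reducing the diffusion tensor to a single scalar is a deliberate simplification.]

```python
# Rough sketch of the CONV_LOOP idea: alternate model selection
# (statistics) with tensor optimisation (mathematics) until the
# diffusion tensor stops changing. Placeholder functions only --
# this is NOT the relax API.

def converge(tensor, optimise, model_select, tol=1e-12, max_iter=20):
    """Iterate until the change in the tensor parameter falls below tol."""
    for i in range(1, max_iter + 1):
        models = model_select(tensor)          # pick model-free models
        new_tensor = optimise(tensor, models)  # minimise chi2 given them
        if abs(new_tensor - tensor) < tol:     # converged in this universe
            return new_tensor, i
        tensor = new_tensor
    return tensor, max_iter

# Toy usage: an "optimiser" that pulls tm halfway towards 6 ns each pass.
tm, n_iter = converge(
    1.0e-8,
    optimise=lambda t, m: (t + 6.0e-9) / 2.0,
    model_select=lambda t: None,
)
print(tm, n_iter)
```

Each pass through the loop is one global optimisation; the outer iteration is what lets the analysis slide between the spaces Edward describes.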
There was a mistake in my Molecular BioSystems paper in that the number of spaces is really equal to n*m^l, where n is the number of diffusion models, m is the number of model-free models (10 if you use m0 to m9), and l is the number of spin systems. So if you have 200 residues, the number of spaces is on the order of 10 to the power of 200. The number of dimensions for this system is on the order of 10^2 to 10^3. So the problem is to find the 'best' minimum in 10^200 spaces, each consisting of 10^2 to 10^3 dimensions (the universal solution, or the solution in the universal set). The problem is just a little more complex than most people think!!!

So, my opinion of the problem is that the starting position of one of the 2 solutions is not good. In one (or maybe both) you are stuck in the wrong universe (out of billions of billions of billions of billions....), and you can't slide out of that universe using the looping procedure in your script. That's why I designed the new model-free analysis protocol used by the full_analysis.py script (http://www.springerlink.com/content/u170k174t805r344/?p=23cf5337c42e457abe3e5a1aeb38c520&pi=3 or the thesis again). The aim of this new protocol is that you start in a universe much closer to the one with the universal solution than you can ever get with the initial diffusion tensor estimate. Then you can easily slide, in less than 20 iterations, to the universal solution using the looping procedure. For a published example of this type of failure, see the section titled "Failure of the diffusion seeded paradigm" in the previous link to the "Optimisation of NMR dynamic models II" paper.

Does this description make sense? Does it answer all your questions?

Regards,

Edward

On Jan 10, 2008 5:49 PM, Douglas Kojetin <douglas.kojetin@xxxxxxxxx> wrote:

Hi All,

I am working with five relaxation data sets (r1, r2 and noe at 400 MHz; r1 and r2 at 600 MHz), and therefore cannot use the full_analysis.py protocol.
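[Editor's note: a quick order-of-magnitude check of the n*m^l space count quoted above, using n = 5 as an example value matching the five runs in this thread.]

```python
import math

# Number of spaces = n * m**l: n diffusion models, m model-free models
# per spin, l spin systems (values match the text's 200-residue example).
n = 5     # e.g. local_tm, sphere, prolate, oblate, ellipsoid
m = 10    # model-free models m0 to m9
l = 200   # residues

# Work in log10 since m**l overflows ordinary floats.
log10_spaces = math.log10(n) + l * math.log10(m)
print(round(log10_spaces, 1))  # 200.7, i.e. ~10**200 spaces
```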
I have obtained estimates for tm, Dratio, theta and phi using Art Palmer's quadric_diffusion program. I modified the full_analysis.py protocol to optimize a prolate tensor using these estimates (attached file: mod.py). I have performed the optimization of the prolate tensor using either (1) my original structure or (2) the same structure rotated and translated by the quadric_diffusion program. It seems that relax does not converge to a single global optimum, as different values of tm, Da, theta and phi are reported.

Using my original structure:

#tm = 6.00721299718e-09
#Da = 14256303.3975
#theta = 11.127323614211441
#phi = 62.250251959733312

Using the structure rotated/translated by the quadric_diffusion program:

#tm = 5.84350638161e-09
#Da = 11626835.475
#theta = 8.4006873071400197
#phi = 113.6068898953142

The only difference between the two calculations is the orientation of the input PDB structure file. For another set of five rates (different protein), there is a >0.3 ns difference in the converged tm values. Is my modified protocol (in mod.py) set up properly? Or is this a more complex issue in the global optimization? In previous attempts, I've also noticed that separate runs with differences in the estimates for Dratio, theta and phi also converge to different optimized diffusion tensor variables.

Doug

_______________________________________________
relax (http://nmr-relax.com)

This is the relax-users mailing list
relax-users@xxxxxxx

To unsubscribe from this list, get a password reminder, or change your subscription options, visit the list information page at https://mail.gna.org/listinfo/relax-users
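[Editor's note: one quick way to compare the two converged prolate tensors reported above is via Dratio = Dpar/Dper, the quantity quadric_diffusion works with. This sketch assumes the usual spheroid conventions Diso = 1/(6*tm) and Da = Dpar - Dper; verify these against relax's own parameter definitions before relying on them.]

```python
# Convert the (tm, Da) pairs reported above into Dratio = Dpar/Dper,
# assuming Diso = 1/(6*tm) and Da = Dpar - Dper (an assumption --
# check relax's spheroid parameter definitions).

def dratio(tm, Da):
    Diso = 1.0 / (6.0 * tm)       # isotropic part of the tensor
    Dpar = Diso + 2.0 * Da / 3.0  # component parallel to the unique axis
    Dper = Diso - Da / 3.0        # perpendicular component
    return Dpar / Dper

r_original = dratio(6.00721299718e-09, 14256303.3975)
r_rotated = dratio(5.84350638161e-09, 11626835.475)
print(r_original, r_rotated)  # roughly 1.62 vs 1.47
```

Under these conventions the two runs converge to noticeably different anisotropies, consistent with Doug's observation that the two minimisations end in different optima.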