Re: Weird performance of grid search and order of spectrum.error_analysis -- April 22, 2014

Hi Troels,

This second thread at
http://thread.gmane.org/gmane.science.nmr.relax.devel/5302 will cover
the grid search.

It seems I am having trouble using grid_search(lower=None,
upper=None, inc=GRID_INC, constraints=True, verbosity=1),
and I find that the order of spectrum.error_analysis is important.

I am trying to re-run an analysis of the dataset published in:

Kaare Teilum, Melanie H. Smith, Eike Schulz, Lea C. Christensen, Gleb
Solomentsev, Mikael Oliveberg, and Mikael Akke (2009)
"SOD-WT" CPMG data to the CR72 dispersion model at 25 degrees.
http://dx.doi.org/10.1073/pnas.0907387106

This is CPMG data with a fixed relaxation time period at 500 and 600 MHz.
The data were fitted globally for 64 residues.
kex was previously fitted to 2200 /s with pA=0.993 (Fig. S2).

I did my initial analysis with a grid search of 11 increments on each
residue, followed by a minimization.

I then defined the 64 residues for a global fit, read in the results from
above, and minimized.
The end result was weird:
kex = 6.62 and pA = 0.67, and the calculation was extremely slow.

I guess that the very long computation time is largely due to a
bad start from the grid_search.

I then looked into the individually fitted residues.
These results showed that pA was either close to 0.5 or close to 0.99.


pA values of 0.5 and 0.99 mean different things.  A value of 0.5 shows
that optimisation was unable to find the minimum.  There is the
constraint that pA > 0.5 in relax to simplify the optimisation space.
By convention the first state has the highest probability, and for most
models (which don't require that pA be greater than pB), having a pA value
less than 0.5 is the same as swapping states A and B.  Anyway, a value
of 0.5 is not good!

The value of >0.99 means that there is no statistically significant
dispersion found.  Either this is true, or optimisation has
again failed to find the minimum.

gnuplot> set yrange [0.4:1.1]
gnuplot> set term dumb
Terminal type set to 'dumb'
Options are 'feed  size 79, 24'
gnuplot> plot "compare_128_FT_R2eff_CR72" using 2:6


  1.1 ++-------+-------+--------+--------+-------+--------+-------+-------++
      +        +       +      "compare_128_FT_R2eff_CR72" using 2:6   A    +
      |                                                                    |
    1 ++AAA A A    AAAAAA AAAAAAA A AAAAAA  AA A AA  A A    AAA AAA    AA ++
      |A        A                                A                         |
  0.9 ++       A                                                          ++
      |        A                                                           |
      |                                                                    |
  0.8 ++              A          A   A                                    ++
      |      A                                                             |
      |                                                                    |
  0.7 ++          A             A             A                 A         ++
      | A        A     A   A    A  A                                    A  |
      |    A                                                               |
  0.6 ++                  A                            A           A      ++
      |                                                                    |
  0.5 +A AAAAA AAA A A  AA  AAAAAAAAAA AAA AA AAA AAAAA AAAA AAA  AAAAAA  ++
      |                                                                    |
      +        +       +        +        +       +        +       +        +
  0.4 ++-------+-------+--------+--------+-------+--------+-------+-------++
      0        20      40       60       80     100      120     140      160


That made me believe that the grid search is not performing so well.


That looks rather disappointing!

I have made a system test to investigate:
relax -s Relax_disp.test_kteilum_mhsmith_eschulz_lcchristensen_gsolomentsev_moliveberg_makke_sod1wt_t25_to_cr72 -d

I notice that it becomes horribly "expensive" to increase the number of
grid search increments:


Grid nodes = X^5;  elapsed time (s) = 3.84E-4*X^4.94

GRID_INC (X)   Nodes (X^5)   Elapsed time (s)
 3                   243        0.0873595167
 5                  3125        1.0895396658
 7                 16807        5.7426922356
 9                 59049       19.8741800961
11                161051       53.5563645469
13                371293      122.239449318
15                759375      247.8689026545
17               1419857      459.9905354953
19               2476099      796.8452957302
21               4084101     1306.4552557758

With 21 grid increments, the search will take more than 21 minutes per spin.


This is to be expected for a grid search - for the CR72 model at 2
fields, you have 5 parameters.
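
The scaling in the table can be reproduced with a few lines of Python.
This is only a sketch: the function names are made up for illustration,
and the timing model is simply the power law quoted above
(3.84E-4*X^4.94 seconds), which applies to one spin on one machine.

```python
def grid_nodes(inc, n_params=5):
    """Number of grid points for `inc` increments in each of `n_params` dimensions."""
    return inc ** n_params


def estimated_time(inc, prefactor=3.84e-4, exponent=4.94):
    """Estimated grid search time in seconds, using the power law quoted above."""
    return prefactor * inc ** exponent


# Print increments, node count and estimated time for a few grid sizes.
for inc in (3, 11, 21):
    print(inc, grid_nodes(inc), round(estimated_time(inc), 1))
```

Going from 11 to 21 increments multiplies the node count by (21/11)^5,
roughly 25, which is why the cost grows so quickly.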

But the problem is that I don't see kex and pA move.
Going up to GRID_INC = 13, I see that neither kex nor pA moves; only
dw, r2500 and r2600 do.

########################## GRID INC 3 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(3, 1.0, 0.5, 5.0, 20.5, 20.5, None, 10, 'G', ':10@N', 0.089359045028686523)
########################## GRID INC 5 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(5, 1.0, 0.5, 5.0, 20.5, 20.5, None, 10, 'G', ':10@N', 1.060028076171875)
########################## GRID INC 7 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(7, 1.0, 0.5, 3.3333333333333335, 20.5, 20.5, None, 10, 'G', ':10@N',
5.6306891441345215)
########################## GRID INC 9 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(9, 1.0, 0.5, 3.75, 20.5, 20.5, None, 10, 'G', ':10@N', 19.880398035049438)
########################## GRID INC 11 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(11, 1.0, 0.5, 4.0, 20.5, 24.399999999999999, None, 10, 'G', ':10@N',
54.264302015304565)
########################## GRID INC 13 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(13, 1.0, 0.5, 4.166666666666667, 20.5, 23.75, None, 10, 'G', ':10@N',
122.64782810211182)

If I then set kex to 2200 before the grid search, I see that pA moves.

########################## GRID INC 3 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(3, 2200.0, 0.5, 0.0, 20.5, 20.5, None, 10, 'G', ':10@N', 
0.043489933013916016)
########################## GRID INC 5 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(5, 2200.0, 0.5, 0.0, 20.5, 20.5, None, 10, 'G', ':10@N', 
0.29647397994995117)
########################## GRID INC 7 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(7, 2200.0, 0.5, 0.0, 20.5, 20.5, None, 10, 'G', ':10@N', 1.045961856842041)
########################## GRID INC 9 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(9, 2200.0, 0.5, 0.0, 20.5, 20.5, None, 10, 'G', ':10@N', 2.841001033782959)
########################## GRID INC 11 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(11, 2200.0, 0.80000000000000004, 1.0, 16.600000000000001,
16.600000000000001, None, 10, 'G', ':10@N', 6.2603960037231445)
########################## GRID INC 13 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(13, 2200.0, 0.75, 0.83333333333333337, 17.25, 20.5, None, 10, 'G',
':10@N', 11.912189960479736)
########################## GRID INC 19 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(19, 2200.0, 0.91666666666666674, 1.1111111111111112,
18.333333333333336, 20.500000000000004, None, 10, 'G', ':10@N',
56.463220119476318)
########################## GRID INC 21 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(21, 2200.0, 0.90000000000000002, 1.0, 18.550000000000001, 20.5, None,
10, 'G', ':10@N', 83.805304050445557)

Is this to be expected?


If you have a look at the grid_search_setup() function in the
specific_analyses.relax_disp.optimisation module, you may notice some
inefficiencies.  I wrote this code very quickly to get the relaxation
dispersion analysis up and running as fast as possible.  The main
problem was the random guesses I made at the time, without much
thought, as to what the lower and upper grid search values should be.
You can see for kex that the lower value is 1 and the upper value is
1e5.  You should play around with these defaults and see how that
changes the grid search performance.  I would probably now use 0 to
1e4 for kex instead, as the higher kex values should be rare and easy
to optimise from the lower values anyway.  It could be that, because
most of the kex grid points are at very high values, most of the
grid search is not doing anything.
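
To see why a uniform grid from 1 to 1e5 leaves kex pinned at 1.0 in the
output above, the grid points can be listed directly.  This is a minimal
sketch that assumes the points are spaced uniformly as
lower + i*(upper - lower)/(inc - 1); the helper name is hypothetical and
this is not the relax or minfx code.

```python
def uniform_grid(lower, upper, inc):
    """Return `inc` evenly spaced points from `lower` to `upper` inclusive."""
    step = (upper - lower) / (inc - 1)
    return [lower + i * step for i in range(inc)]


# Default kex bounds mentioned above (1 to 1e5), 11 increments.
default_kex = uniform_grid(1.0, 1e5, 11)
# Suggested alternative bounds (0 to 1e4), same number of increments.
narrow_kex = uniform_grid(0.0, 1e4, 11)

# Count how many grid points lie at or below 1e4 /s in each case.
n_default = sum(1 for k in default_kex if k <= 1e4)
n_narrow = sum(1 for k in narrow_kex if k <= 1e4)
print(default_kex[:3], n_default)
print(narrow_kex[:3], n_narrow)
```

With the default bounds only the first point (kex = 1) lies below
1e4 /s, so a value near the published 2200 /s falls between the first
two grid points.  With 0 to 1e4 every point is in that range.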

Anyway, see what happens when you play with this, and feel free to
change the default values.  Oh, just so you know why I chose 1 to 1e5
for kex - I was originally planning on not using the current uniform
grid search but a custom one where the kex dimension is on the log
scale.  This can be done with minfx (see
http://home.gna.org/minfx/minfx.grid-module.html#grid_point_array), as
is used in the frame order analysis and model-free analysis to
parallelise the diffusion tensor grid search.  But this was never
implemented - it was partly implemented but then I deleted it as it
was taking too much time.
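
For comparison, a log-scaled kex dimension of the kind described above
could be generated as follows.  This is an illustrative sketch only: the
helper is hypothetical, it is not the deleted relax implementation, and
handing the resulting points to minfx is left out because the exact call
is not shown in this thread.

```python
import math


def log_grid(lower, upper, inc):
    """Return `inc` points from `lower` to `upper`, evenly spaced in log10."""
    lo, hi = math.log10(lower), math.log10(upper)
    step = (hi - lo) / (inc - 1)
    return [10.0 ** (lo + i * step) for i in range(inc)]


# kex from 1 to 1e5 with 11 increments, on the log scale.
kex_points = log_grid(1.0, 1e5, 11)
print([round(k, 1) for k in kex_points])
```

Here the points are half a decade apart, so 9 of the 11 lie at or below
1e4 /s, whereas a uniform grid over the same bounds places only one
there.  A log grid needs a lower bound above zero, which is consistent
with the choice of 1 rather than 0 for the lower kex value.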

Regards,

Edward
