Weird performance of grid search and order of spectrum.error_analysis




Posted by Troels Emtekær Linnet on April 21, 2014 - 21:24:
Dear Edward.

I am having trouble with grid_search(lower=None, upper=None,
inc=GRID_INC, constraints=True, verbosity=1), and I also find that the
order of the spectrum.error_analysis calls changes the result.

I am trying to re-run an analysis of the dataset published in:

Kaare Teilum, Melanie H. Smith, Eike Schulz, Lea C. Christensen, Gleb
Solomentsev, Mikael Oliveberg, and Mikael Akke (2009)
http://dx.doi.org/10.1073/pnas.0907387106

Specifically, I am fitting the "SOD-WT" CPMG data at 25 degrees to the
CR72 dispersion model.

This is CPMG data with a fixed relaxation time period, recorded at 500
and 600 MHz.
The data is fitted globally to 64 residues.
kex was previously fitted to 2200 /s with pA = 0.993 (Fig. S2).

I did my initial analysis with a grid search of 11 increments for each
residue, followed by a minimisation.

I then selected the 64 residues for the global fit, read in the results
from above, and minimised.
The end result was weird:
kex = 6.62 and pA = 0.67, and the calculation was extremely slow.

I guess that the very long computation time is largely due to a bad
starting point from the grid_search.

I then looked at the individually fitted residues.
These showed that pA was close to either 0.5 or 0.99.

gnuplot> set yrange [0.4:1.1]
gnuplot> set term dumb
Terminal type set to 'dumb'
Options are 'feed  size 79, 24'
gnuplot> plot "compare_128_FT_R2eff_CR72" using 2:6


  1.1 ++-------+-------+--------+--------+-------+--------+-------+-------++
      +        +       +      "compare_128_FT_R2eff_CR72" using 2:6   A    +
      |                                                                    |
    1 ++AAA A A    AAAAAA AAAAAAA A AAAAAA  AA A AA  A A    AAA AAA    AA ++
      |A        A                                A                         |
  0.9 ++       A                                                          ++
      |        A                                                           |
      |                                                                    |
  0.8 ++              A          A   A                                    ++
      |      A                                                             |
      |                                                                    |
  0.7 ++          A             A             A                 A         ++
      | A        A     A   A    A  A                                    A  |
      |    A                                                               |
  0.6 ++                  A                            A           A      ++
      |                                                                    |
  0.5 +A AAAAA AAA A A  AA  AAAAAAAAAA AAA AA AAA AAAAA AAAA AAA  AAAAAA  ++
      |                                                                    |
      +        +       +        +        +       +        +       +        +
  0.4 ++-------+-------+--------+--------+-------+--------+-------+-------++
      0        20      40       60       80     100      120     140      160


That made me believe that the grid search is not performing well.

I have made a system test to investigate:
relax -s Relax_disp.test_kteilum_mhsmith_eschulz_lcchristensen_gsolomentsev_moliveberg_makke_sod1wt_t25_to_cr72 -d

I notice that it becomes horribly "expensive" to increase the number of
grid search increments:


With X = GRID_INC:  nodes = X^5;  elapsed time (s) = 3.84E-4*X^4.94

GRID_INC  Nodes    Elapsed time (s)
3         243      0.0873595167
5         3125     1.0895396658
7         16807    5.7426922356
9         59049    19.8741800961
11        161051   53.5563645469
13        371293   122.239449318
15        759375   247.8689026545
17        1419857  459.9905354953
19        2476099  796.8452957302
21        4084101  1306.4552557758

With 21 grid increments, that is about 22 minutes (1306 s) per spin.
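
To make the scaling concrete, here is a minimal sketch that evaluates the node count and the fitted timing formula quoted above. It assumes 5 gridded parameters per spin (kex, pA, dw, r2500, r2600); the function names are hypothetical and are not part of relax.

```python
# Sketch of the grid search cost, using only the numbers quoted above.
# Assumption: each spin has 5 gridded parameters (kex, pA, dw, r2500, r2600).

def grid_nodes(inc, n_params=5):
    """Number of grid nodes for `inc` increments in each of `n_params` dimensions."""
    return inc ** n_params

def estimated_seconds(inc, a=3.84e-4, b=4.94):
    """Elapsed time per spin from the fitted power law a * inc**b."""
    return a * inc ** b

for inc in (3, 11, 21):
    seconds = estimated_seconds(inc)
    print("inc=%2d  nodes=%8d  est. time=%8.1f s (%.1f min)"
          % (inc, grid_nodes(inc), seconds, seconds / 60.0))

# Fixing one parameter (for example kex) removes one dimension from the grid.
print("inc=11, 4 parameters:", grid_nodes(11, n_params=4), "nodes")
```

Fixing kex before the grid search reduces the grid from X^5 to X^4 nodes, which is consistent with the shorter elapsed times in the second set of results further down.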

But the problem is that I don't see kex and pA move.
Going up to GRID_INC = 13, neither kex nor pA moves; only dw, r2500
and r2600 change.

########################## GRID INC 3 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(3, 1.0, 0.5, 5.0, 20.5, 20.5, None, 10, 'G', ':10@N', 0.089359045028686523)
########################## GRID INC 5 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(5, 1.0, 0.5, 5.0, 20.5, 20.5, None, 10, 'G', ':10@N', 1.060028076171875)
########################## GRID INC 7 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(7, 1.0, 0.5, 3.3333333333333335, 20.5, 20.5, None, 10, 'G', ':10@N',
5.6306891441345215)
########################## GRID INC 9 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(9, 1.0, 0.5, 3.75, 20.5, 20.5, None, 10, 'G', ':10@N', 19.880398035049438)
########################## GRID INC 11 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(11, 1.0, 0.5, 4.0, 20.5, 24.399999999999999, None, 10, 'G', ':10@N',
54.264302015304565)
########################## GRID INC 13 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(13, 1.0, 0.5, 4.166666666666667, 20.5, 23.75, None, 10, 'G', ':10@N',
122.64782810211182)

If I then set kex to 2200 before the grid search, I see that pA does
move, from GRID_INC = 11 upwards.

########################## GRID INC 3 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(3, 2200.0, 0.5, 0.0, 20.5, 20.5, None, 10, 'G', ':10@N', 
0.043489933013916016)
########################## GRID INC 5 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(5, 2200.0, 0.5, 0.0, 20.5, 20.5, None, 10, 'G', ':10@N', 0.29647397994995117)
########################## GRID INC 7 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(7, 2200.0, 0.5, 0.0, 20.5, 20.5, None, 10, 'G', ':10@N', 1.045961856842041)
########################## GRID INC 9 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(9, 2200.0, 0.5, 0.0, 20.5, 20.5, None, 10, 'G', ':10@N', 2.841001033782959)
########################## GRID INC 11 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(11, 2200.0, 0.80000000000000004, 1.0, 16.600000000000001,
16.600000000000001, None, 10, 'G', ':10@N', 6.2603960037231445)
########################## GRID INC 13 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(13, 2200.0, 0.75, 0.83333333333333337, 17.25, 20.5, None, 10, 'G',
':10@N', 11.912189960479736)
########################## GRID INC 19 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(19, 2200.0, 0.91666666666666674, 1.1111111111111112,
18.333333333333336, 20.500000000000004, None, 10, 'G', ':10@N',
56.463220119476318)
########################## GRID INC 21 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(21, 2200.0, 0.90000000000000002, 1.0, 18.550000000000001, 20.5, None,
10, 'G', ':10@N', 83.805304050445557)

Is this to be expected?
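
As a cross-check, the values printed above all fall on evenly spaced grids if the bounds are 0.5 to 1.0 for pA, 0 to 10 for dw, and 1 to 40 for r2500/r2600. These bounds are inferred from the output, not taken from the relax source, and the helper names below are hypothetical. Under these assumed bounds, pA = 0.5 is simply the first point of the pA grid.

```python
# Sketch: check that reported grid search values lie on linear grids.
# The bounds are an assumption inferred from the printed output above.
BOUNDS = {"pA": (0.5, 1.0), "dw": (0.0, 10.0), "r2": (1.0, 40.0)}

def grid_points(lower, upper, inc):
    """Return `inc` evenly spaced points from lower to upper, inclusive."""
    step = (upper - lower) / (inc - 1)
    return [lower + i * step for i in range(inc)]

def on_grid(value, lower, upper, inc, tol=1e-9):
    """True if `value` coincides with one of the grid points."""
    return any(abs(value - p) < tol for p in grid_points(lower, upper, inc))

# (GRID_INC, parameter, value) triples copied from the output above.
reported = [
    (7, "dw", 3.3333333333333335),
    (9, "dw", 3.75),
    (11, "r2", 24.4),
    (13, "pA", 0.75),
    (19, "pA", 0.91666666666666674),
    (21, "r2", 18.55),
]

for inc, name, value in reported:
    lower, upper = BOUNDS[name]
    print(inc, name, value, on_grid(value, lower, upper, inc))
```

A practical consequence is that the grid can only return pA values at these fixed points. With 11 increments the closest point to the published pA = 0.993 is 1.0, and the next one down is 0.95.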


Another thing that worries me is that the order in which the error
analysis is performed matters.
If I run:
********************************************
relax> spectrum.replicated(spectrum_ids=['Z_A1', 'Z_A15'])
relax> spectrum.replicated(spectrum_ids=['Z_B1', 'Z_B18'])
relax> spectrum.error_analysis(subset=['Z_A0', 'Z_A1', 'Z_A2', 'Z_A3',
'Z_A4', 'Z_A5', 'Z_A6', 'Z_A7', 'Z_A8', 'Z_A9', 'Z_A10', 'Z_A11',
'Z_A12', 'Z_A13', 'Z_A14', 'Z_A15', 'Z_A16'])
Intensity measure:  Peak heights.
Replicated spectra:  Yes.
All spectra replicated:  No.

Replicated spectra:  ['Z_A1', 'Z_A15']
Standard deviation:  1900.52054868

Variance averaging over all spectra.
Standard deviation for all spins:  1900.5205486798784

relax> spectrum.error_analysis(subset=['Z_B0', 'Z_B1', 'Z_B2', 'Z_B3',
'Z_B4', 'Z_B5', 'Z_B6', 'Z_B7', 'Z_B8', 'Z_B9', 'Z_B10', 'Z_B11',
'Z_B12', 'Z_B13', 'Z_B14', 'Z_B15', 'Z_B16', 'Z_B17', 'Z_B18'])
Intensity measure:  Peak heights.
Replicated spectra:  Yes.
All spectra replicated:  No.

Replicated spectra:  ['Z_B1', 'Z_B18']
Standard deviation:  5629.56857527

Variance averaging over all spectra.
Standard deviation for all spins:  2562.7669744222535
*************************
relax> spectrum.replicated(spectrum_ids=['Z_A1', 'Z_A15'])
relax> spectrum.replicated(spectrum_ids=['Z_B1', 'Z_B18'])

relax> spectrum.error_analysis(subset=['Z_B0', 'Z_B1', 'Z_B2', 'Z_B3',
'Z_B4', 'Z_B5', 'Z_B6', 'Z_B7', 'Z_B8', 'Z_B9', 'Z_B10', 'Z_B11',
'Z_B12', 'Z_B13', 'Z_B14', 'Z_B15', 'Z_B16', 'Z_B17', 'Z_B18'])
Intensity measure:  Peak heights.
Replicated spectra:  Yes.
All spectra replicated:  No.

Replicated spectra:  ['Z_B1', 'Z_B18']
Standard deviation:  5629.56857527

Variance averaging over all spectra.
Standard deviation for all spins:  5629.5685752716654

relax> spectrum.error_analysis(subset=['Z_A0', 'Z_A1', 'Z_A2', 'Z_A3',
'Z_A4', 'Z_A5', 'Z_A6', 'Z_A7', 'Z_A8', 'Z_A9', 'Z_A10', 'Z_A11',
'Z_A12', 'Z_A13', 'Z_A14', 'Z_A15', 'Z_A16'])
Intensity measure:  Peak heights.
Replicated spectra:  Yes.
All spectra replicated:  No.

Replicated spectra:  ['Z_A1', 'Z_A15']
Standard deviation:  1900.52054868

Variance averaging over all spectra.
Standard deviation for all spins:  5386.8126508475143
*************************
The difference is:
Standard deviation for all spins:  2562.7669744222535
or
Standard deviation for all spins:  5386.8126508475143
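
Both values can be reproduced arithmetically if the final "variance averaging over all spectra" runs over every spectrum that already carries a variance: all spectra of the subset analysed first, plus the two replicated spectra of the second subset. That weighting is an inference from the printed numbers and has not been checked against the relax source. The function name is hypothetical.

```python
from math import sqrt

# Standard deviations of the replicated spectra, copied from the output above.
SD_A = 1900.5205486798784   # replicates Z_A1, Z_A15 (subset A has 17 spectra)
SD_B = 5629.5685752716654   # replicates Z_B1, Z_B18 (subset B has 19 spectra)

def averaged_sd(entries):
    """Square root of the count-weighted mean variance.

    entries: list of (number_of_spectra, standard_deviation) pairs.
    """
    total = sum(count for count, _ in entries)
    mean_var = sum(count * sd ** 2 for count, sd in entries) / total
    return sqrt(mean_var)

# Assumed weighting: A analysed first, so 17 A spectra plus the 2 B replicates.
a_then_b = averaged_sd([(17, SD_A), (2, SD_B)])

# Assumed weighting: B analysed first, so 19 B spectra plus the 2 A replicates.
b_then_a = averaged_sd([(19, SD_B), (2, SD_A)])

print("A then B:", a_then_b)   # compare with 2562.7669744222535
print("B then A:", b_then_a)   # compare with 5386.8126508475143
```

If this reading is right, the second error_analysis call mixes the variance of the first subset into the average for the second, so the result depends on which subset is analysed first.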

Best
Troels


