Re: Weird performance of grid search and order of spectrum.error_analysis



Posted by Edward d'Auvergne on April 22, 2014 - 09:25:
Hi Troels,

There are a number of issues here, so I'll address them in a couple of
different emails.  That will make the threads easier to follow.
Firstly, for the system test, I would suggest shortening the name.
Instead of typing:

$ relax -s Relax_disp.test_kteilum_mhsmith_eschulz_lcchristensen_gsolomentsev_moliveberg_makke_sod1wt_t25_to_cr72

It would be easier to type:

$ relax -s Relax_disp.test_sod1wt_t25_to_cr72

There's no need to list all the authors in the name of the test.  You
have them listed in the docstring of the system test, together with the
DOI link (though you should move them into the main text rather than
the intro line, to keep that line under 100 characters for the API
documentation at http://www.nmr-relax.com/api/3.1/).
Also for the system test directory
'test_suite/shared_data/dispersion/KTeilum_MHsmith_ESchulz_LCchristensen_GSolomentsev_MOliveberg_MAkke_2009/',
this could be renamed to maybe
'test_suite/shared_data/dispersion/sod1wt_t25/' or some other more
appropriate name.  Though very unlikely, someone may use an operating
system where such long directory names are not supported.  You could
also maybe put a README file into this directory giving the reference
and DOI link?

Cheers,

Edward



On 21 April 2014 21:23, Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx> wrote:
Dear Edward.

It seems I am having trouble with grid_search(lower=None,
upper=None, inc=GRID_INC, constraints=True, verbosity=1),
and I also find that the order of the spectrum.error_analysis calls is
important.

I am trying to re-run an analysis of the dataset published in:

Kaare Teilum, Melanie H. Smith, Eike Schulz, Lea C. Christensen, Gleb
Solomentsev, Mikael Oliveberg, and Mikael Akke, 2009
"SOD-WT" CPMG data to the CR72 dispersion model at 25 degrees.
http://dx.doi.org/10.1073/pnas.0907387106

This is CPMG data with a fixed relaxation time period at 500 and 600 MHz.
It was globally fitted to 64 residues.
kex was previously fitted to 2200 /s with pA = 0.993 (Fig. S2).

I did my initial analysis with a grid search of 11 increments on each
residue.
Then I did a minimization.

I then defined the 64 residues for a global fit, read in the results from
above, and minimized.
The end result was weird:
kex = 6.62 and pA = 0.67, and the calculation was extremely slow.

I guess that the very long computation time is largely due to a
bad start from the grid_search.

I then looked into the individually fitted residues.
These results showed that pA was close to either 0.5 or 0.99.

gnuplot> set yrange [0.4:1.1]
gnuplot> set term dumb
Terminal type set to 'dumb'
Options are 'feed  size 79, 24'
gnuplot> plot "compare_128_FT_R2eff_CR72" using 2:6


  1.1 ++-------+-------+--------+--------+-------+--------+-------+-------++
      +        +       +      "compare_128_FT_R2eff_CR72" using 2:6   A    +
      |                                                                    |
    1 ++AAA A A    AAAAAA AAAAAAA A AAAAAA  AA A AA  A A    AAA AAA    AA ++
      |A        A                                A                         |
  0.9 ++       A                                                          ++
      |        A                                                           |
      |                                                                    |
  0.8 ++              A          A   A                                    ++
      |      A                                                             |
      |                                                                    |
  0.7 ++          A             A             A                 A         ++
      | A        A     A   A    A  A                                    A  |
      |    A                                                               |
  0.6 ++                  A                            A           A      ++
      |                                                                    |
  0.5 +A AAAAA AAA A A  AA  AAAAAAAAAA AAA AA AAA AAAAA AAAA AAA  AAAAAA  ++
      |                                                                    |
      +        +       +        +        +       +        +       +        +
  0.4 ++-------+-------+--------+--------+-------+--------+-------+-------++
      0        20      40       60       80     100      120     140      160


That made me believe that the grid search is not performing so well.

I have made a system test to investigate:
relax -s Relax_disp.test_kteilum_mhsmith_eschulz_lcchristensen_gsolomentsev_moliveberg_makke_sod1wt_t25_to_cr72 -d

I notice that it becomes horribly "expensive" to increase the number of
grid search increments:


Grid nodes = X^5;  elapsed time (s) = 3.84E-4*X^4.94

 X    Nodes      Elapsed time (s)
 3    243        0.0873595167
 5    3125       1.0895396658
 7    16807      5.7426922356
 9    59049      19.8741800961
11    161051     53.5563645469
13    371293     122.239449318
15    759375     247.8689026545
17    1419857    459.9905354953
19    2476099    796.8452957302
21    4084101    1306.4552557758

For 21 grid points, it will take 21 minutes per spin.
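
As a rough illustration of the scaling above, here is a minimal Python
sketch that evaluates the two expressions quoted in the table (nodes = X^5
and elapsed time = 3.84E-4*X^4.94 seconds).  The function name grid_cost
and its argument names are made up for this example and are not part of
relax; the coefficient and exponent are simply the fitted values quoted
above.

```python
def grid_cost(inc, n_params=5, coeff=3.84e-4, exponent=4.94):
    """Return (number of grid nodes, estimated seconds) for one spin.

    n_params=5 corresponds to kex, pA, dw, r2500 and r2600.
    coeff and exponent are the fitted values quoted in the email.
    """
    nodes = inc ** n_params
    seconds = coeff * inc ** exponent
    return nodes, seconds


for inc in (3, 11, 21):
    nodes, seconds = grid_cost(inc)
    print(f"{inc:3d} {nodes:9d} {seconds:10.2f} s ({seconds / 60:.1f} min)")
```

Because the cost grows as roughly the fifth power of the increment count,
doubling the increments multiplies the time by about 2^5 = 32.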

But the problem is that I don't see kex and pA move.
Going up to GRID_INC = 13, I see that neither kex nor pA moves; only
dw, r2500 and r2600 change.

########################## GRID INC 3 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(3, 1.0, 0.5, 5.0, 20.5, 20.5, None, 10, 'G', ':10@N', 0.089359045028686523)
########################## GRID INC 5 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(5, 1.0, 0.5, 5.0, 20.5, 20.5, None, 10, 'G', ':10@N', 1.060028076171875)
########################## GRID INC 7 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(7, 1.0, 0.5, 3.3333333333333335, 20.5, 20.5, None, 10, 'G', ':10@N',
5.6306891441345215)
########################## GRID INC 9 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(9, 1.0, 0.5, 3.75, 20.5, 20.5, None, 10, 'G', ':10@N', 19.880398035049438)
########################## GRID INC 11 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(11, 1.0, 0.5, 4.0, 20.5, 24.399999999999999, None, 10, 'G', ':10@N',
54.264302015304565)
########################## GRID INC 13 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(13, 1.0, 0.5, 4.166666666666667, 20.5, 23.75, None, 10, 'G', ':10@N',
122.64782810211182)

If I then set kex to 2200 before the Grid search, I see that pA is moving.

########################## GRID INC 3 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(3, 2200.0, 0.5, 0.0, 20.5, 20.5, None, 10, 'G', ':10@N', 0.043489933013916016)
########################## GRID INC 5 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(5, 2200.0, 0.5, 0.0, 20.5, 20.5, None, 10, 'G', ':10@N', 0.29647397994995117)
########################## GRID INC 7 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(7, 2200.0, 0.5, 0.0, 20.5, 20.5, None, 10, 'G', ':10@N', 1.045961856842041)
########################## GRID INC 9 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(9, 2200.0, 0.5, 0.0, 20.5, 20.5, None, 10, 'G', ':10@N', 2.841001033782959)
########################## GRID INC 11 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(11, 2200.0, 0.80000000000000004, 1.0, 16.600000000000001,
16.600000000000001, None, 10, 'G', ':10@N', 6.2603960037231445)
########################## GRID INC 13 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(13, 2200.0, 0.75, 0.83333333333333337, 17.25, 20.5, None, 10, 'G',
':10@N', 11.912189960479736)
########################## GRID INC 19 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(19, 2200.0, 0.91666666666666674, 1.1111111111111112,
18.333333333333336, 20.500000000000004, None, 10, 'G', ':10@N',
56.463220119476318)
########################## GRID INC 21 ##########################
GRID, kex, pA, dw, r2500, r2600, mol, resi, resn, spin_id, elapsed
(21, 2200.0, 0.90000000000000002, 1.0, 18.550000000000001, 20.5, None,
10, 'G', ':10@N', 83.805304050445557)

Is this to be expected?
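
To help read the output above, here is a small Python sketch that checks
whether the printed parameter values sit on evenly spaced grids.  The
bounds used (dw from 0 to 10, pA from 0.5 to 1.0, R2 from 1 to 40) are an
assumption inferred from the printed values, not taken from the relax
source, and the helper names grid_points and on_grid are hypothetical.

```python
def grid_points(lower, upper, inc):
    """Return inc evenly spaced values from lower to upper inclusive."""
    step = (upper - lower) / (inc - 1)
    return [lower + i * step for i in range(inc)]


def on_grid(value, lower, upper, inc, tol=1e-9):
    """True if value coincides with one of the grid points."""
    return any(abs(value - p) < tol for p in grid_points(lower, upper, inc))


# Bounds inferred from the printed results (assumption, see text).
DW = (0.0, 10.0)
PA = (0.5, 1.0)
R2 = (1.0, 40.0)

print(on_grid(3.75, *DW, 9))    # dw reported for GRID INC 9
print(on_grid(24.4, *R2, 11))   # r2600 reported for GRID INC 11
print(on_grid(0.75, *PA, 13))   # pA reported for GRID INC 13, kex fixed
```

Under these assumed bounds, every reported dw, pA and R2 value is a grid
point, and pA = 0.5 and kex = 1.0 look like the lower edges of their
ranges.  The upper bound of kex cannot be read from the output.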


Another thing that worries me is that the order in which you perform the
error analysis matters.
If I set:
********************************************
relax> spectrum.replicated(spectrum_ids=['Z_A1', 'Z_A15'])
relax> spectrum.replicated(spectrum_ids=['Z_B1', 'Z_B18'])
relax> spectrum.error_analysis(subset=['Z_A0', 'Z_A1', 'Z_A2', 'Z_A3',
'Z_A4', 'Z_A5', 'Z_A6', 'Z_A7', 'Z_A8', 'Z_A9', 'Z_A10', 'Z_A11',
'Z_A12', 'Z_A13', 'Z_A14', 'Z_A15', 'Z_A16'])
Intensity measure:  Peak heights.
Replicated spectra:  Yes.
All spectra replicated:  No.

Replicated spectra:  ['Z_A1', 'Z_A15']
Standard deviation:  1900.52054868

Variance averaging over all spectra.
Standard deviation for all spins:  1900.5205486798784

relax> spectrum.error_analysis(subset=['Z_B0', 'Z_B1', 'Z_B2', 'Z_B3',
'Z_B4', 'Z_B5', 'Z_B6', 'Z_B7', 'Z_B8', 'Z_B9', 'Z_B10', 'Z_B11',
'Z_B12', 'Z_B13', 'Z_B14', 'Z_B15', 'Z_B16', 'Z_B17', 'Z_B18'])
Intensity measure:  Peak heights.
Replicated spectra:  Yes.
All spectra replicated:  No.

Replicated spectra:  ['Z_B1', 'Z_B18']
Standard deviation:  5629.56857527

Variance averaging over all spectra.
Standard deviation for all spins:  2562.7669744222535
*************************
relax> spectrum.replicated(spectrum_ids=['Z_A1', 'Z_A15'])
relax> spectrum.replicated(spectrum_ids=['Z_B1', 'Z_B18'])

relax> spectrum.error_analysis(subset=['Z_B0', 'Z_B1', 'Z_B2', 'Z_B3',
'Z_B4', 'Z_B5', 'Z_B6', 'Z_B7', 'Z_B8', 'Z_B9', 'Z_B10', 'Z_B11',
'Z_B12', 'Z_B13', 'Z_B14', 'Z_B15', 'Z_B16', 'Z_B17', 'Z_B18'])
Intensity measure:  Peak heights.
Replicated spectra:  Yes.
All spectra replicated:  No.

Replicated spectra:  ['Z_B1', 'Z_B18']
Standard deviation:  5629.56857527

Variance averaging over all spectra.
Standard deviation for all spins:  5629.5685752716654

relax> spectrum.error_analysis(subset=['Z_A0', 'Z_A1', 'Z_A2', 'Z_A3',
'Z_A4', 'Z_A5', 'Z_A6', 'Z_A7', 'Z_A8', 'Z_A9', 'Z_A10', 'Z_A11',
'Z_A12', 'Z_A13', 'Z_A14', 'Z_A15', 'Z_A16'])
Intensity measure:  Peak heights.
Replicated spectra:  Yes.
All spectra replicated:  No.

Replicated spectra:  ['Z_A1', 'Z_A15']
Standard deviation:  1900.52054868

Variance averaging over all spectra.
Standard deviation for all spins:  5386.8126508475143
*************************
The difference is:
Standard deviation for all spins:  2562.7669744222535
or
Standard deviation for all spins:  5386.8126508475143
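
One reading that reproduces both numbers, offered as an assumption rather
than a statement about the relax source: the final value is the square
root of an averaged variance in which the set analysed first contributes
one term per spectrum (17 for Z_A, 19 for Z_B), while the set analysed
second contributes only its 2 replicated spectra.  The Python sketch below
checks that arithmetic; the function name pooled_sd and the weights are
hypothetical.

```python
from math import sqrt

# Values printed by spectrum.error_analysis in the output above.
SD_A, N_A = 1900.5205486798784, 17   # Z_A0 .. Z_A16
SD_B, N_B = 5629.5685752716654, 19   # Z_B0 .. Z_B18
N_REPLICATED = 2                     # replicated spectra per set


def pooled_sd(sd_first, n_first, sd_second, n_second=N_REPLICATED):
    """Square root of the weighted average of the two variances.

    The weights (n_first and n_second) are a guess that happens to match
    the printed values; they are not taken from the relax code.
    """
    total = n_first * sd_first ** 2 + n_second * sd_second ** 2
    return sqrt(total / (n_first + n_second))


a_then_b = pooled_sd(SD_A, N_A, SD_B)
b_then_a = pooled_sd(SD_B, N_B, SD_A)
print("A then B:", a_then_b)
print("B then A:", b_then_a)
```

With these weights the two orders come out close to 2562.77 and 5386.81,
which would be consistent with the second call averaging over variances
left behind by the first call.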

Best
Troels

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel


