Re: The best way to handle a cluster analysis -- October 08, 2013

Hi,

The problem you see is that of failed models.  I'm guessing that
residues 4, 5, 6, 7, and 9 are reasonable and 3 and 8 are not.  Have
you looked closely at the data?  Is it of high quality?  There are two
things which can be done here.  The first is the most basic - perform
an analysis deselecting the spin systems with 'bad' data or that have
failed.  This is what would normally be done by a user.  The
model_eliminate user function performs this for the model-free and
other analyses, but no dispersion rules are set up for this.  I also
do not think that we can have automatic rules for determining this as
these high kex values might be reasonable in another molecular system.
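
As an aid for this manual inspection, a sketch like the following could
list the spins whose fitted value lies far above the cluster median.
This is not a relax feature: the flag_outliers() helper and the factor
of 100 are arbitrary assumptions made for illustration, and any flagged
spin would still need to be checked against the raw data, as a high
value may be genuine in another system.

```python
from statistics import median


def flag_outliers(values, factor=100.0):
    """Return the spin IDs whose value exceeds factor times the median.

    'values' maps spin ID strings to fitted parameter values.  The
    threshold is an arbitrary assumption, not a relax rule.
    """
    centre = median(values.values())
    return [spin_id for spin_id, val in values.items()
            if val > factor * centre]


# kAB values taken from the TSMFK01/kAB.out listing in the original message.
k_AB = {
    ':3@N': 1060772.86091342,
    ':4@N': 3.20195659408987,
    ':5@N': 2.15902319384803,
    ':6@N': 2.7190458578948,
    ':7@N': 2.37325031348072,
    ':8@N': 828438.809437978,
    ':9@N': 2.93391757645135,
}

flagged = flag_outliers(k_AB)
print(flagged)
```

The flagged spin IDs are only candidates.  After checking the data, they
could be deselected by hand before repeating the analysis, for example
with the deselect.spin user function.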

The second option is to modify the parameter averaging in the
clustering.  This is in the
specific_analyses.relax_disp.parameters.copy() function.  You will see
here that the code averages the parameters.  A better measure might be
the median rather than the mean, as this should handle severe outliers
caused by failed models much better.  The numpy.median() function could
be used for this.  Instead of summing the parameters, a list could be
created for each parameter and then sent to numpy.median().  That is,
initialise kex as [], add each value with kex.append(spin_from.kex),
and then store the result as spin_to.kex = numpy.median(kex).  With
the values from your kAB.out file:

relax> import numpy
relax> kex = [1060772.86091342, 3.20195659408987, 2.15902319384803,
2.7190458578948, 2.37325031348072, 828438.809437978, 2.93391757645135]
relax> numpy.median(kex)
2.9339175764513499
relax>
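
A minimal sketch of how the averaging could be changed, assuming a
simplified stand-in for the spin containers.  The Spin class and the
copy_cluster_median() helper are hypothetical and are not the actual
code of specific_analyses.relax_disp.parameters.copy(); only the
pattern (build a list, append each value, take the median) comes from
the suggestion above.  The standard library statistics.median() is
swapped in here to keep the sketch self-contained, and numpy.median()
gives the same result for this input.

```python
from statistics import median


class Spin:
    """Hypothetical stand-in for a relax spin container."""

    def __init__(self, kex=None):
        self.kex = kex


def copy_cluster_median(spins_from, spins_to):
    """Set kex on every target spin to the median of the source values."""
    # Collect the values in a list instead of summing them.
    kex = []
    for spin_from in spins_from:
        kex.append(spin_from.kex)

    # The median is insensitive to a few failed fits.
    kex_median = median(kex)
    for spin_to in spins_to:
        spin_to.kex = kex_median
    return kex_median


values = [1060772.86091342, 3.20195659408987, 2.15902319384803,
          2.7190458578948, 2.37325031348072, 828438.809437978,
          2.93391757645135]
source = [Spin(v) for v in values]
target = [Spin() for _ in values]

result = copy_cluster_median(source, target)
print(result)
```

The same list-and-median pattern would be repeated for each clustered
parameter, so that one failed spin cannot shift the starting values of
the whole cluster.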

What do you think?  Would this solve the problems you see?

Cheers,

Edward


On 8 October 2013 14:22, Troels Emtekær Linnet <tlinnet@xxxxxxxxx> wrote:

Hi Edward.

I would like to hear if you could give some hints on the best way to
handle a clustering analysis.

It is related to this entry, which I am modifying.
http://wiki.nmr-relax.com/Tutorial_for_Relaxation_dispersion_analysis_cpmg_fixed_time_recorded_on_varian_as_fid_interleaved#Execute_a_clustering_analysis

I would like to do the clustering analysis for the model: TSMFK01

When I do the clustering analysis, I am pointing to a previous result
directory.
I am clustering those residues for which the model is not "No Rex".

relax will read the TSMFK01/kAB.out file with the previous results.
The averaged kAB value is then taken over the residues specified in
the cluster.

-------------
# Parameter description:  The exchange rate from state A to state B.
#
# mol_name    res_num    res_name    spin_num    spin_name    value               error
None          3          A           1           N            1060772.86091342    None
None          4          E           2           N            3.20195659408987    None
None          5          F           3           N            2.15902319384803    None
None          6          D           4           N            2.7190458578948     None
None          7          K           5           N            2.37325031348072    None
None          8          A           6           N            828438.809437978    None
None          9          A           7           N            2.93391757645135    None
...
--------------

The problem here is that the fitting seems to go "crazy" for some
residues, so the averaged starting value will be:
Averaged k_AB value: 83471.18......

The expected value would be somewhere in the range kAB = 2-10.
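
To illustrate how strongly the two failed fits dominate the mean, the
following sketch parses the seven rows listed above and compares the
mean with and without residues 3 and 8.  The read_values() helper is
hypothetical, the column positions are assumed from the header line,
and the elided rows are left out, so the numbers do not reproduce the
83471.18 reported for the full cluster.

```python
from statistics import mean

# The seven rows shown in the message, one record per line.
KAB_OUT = """\
# Parameter description:  The exchange rate from state A to state B.
#
# mol_name    res_num    res_name    spin_num    spin_name    value    error
None    3    A    1    N    1060772.86091342    None
None    4    E    2    N    3.20195659408987    None
None    5    F    3    N    2.15902319384803    None
None    6    D    4    N    2.7190458578948     None
None    7    K    5    N    2.37325031348072    None
None    8    A    6    N    828438.809437978    None
None    9    A    7    N    2.93391757645135    None
"""


def read_values(text):
    """Return {res_num: value} from the whitespace-separated rows."""
    values = {}
    for line in text.splitlines():
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith('#'):
            continue
        cols = line.split()
        values[int(cols[1])] = float(cols[5])
    return values


k_AB = read_values(KAB_OUT)
mean_all = mean(list(k_AB.values()))
mean_without_failed = mean([v for res, v in k_AB.items()
                            if res not in (3, 8)])
print(mean_all, mean_without_failed)
```

With residues 3 and 8 left out, the mean of the remaining rows falls
inside the expected range of 2-10, whereas the mean over all seven rows
is several orders of magnitude larger.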

What would be the best solution for this problem?
1) Modify the source code for the TSMFK01 model, so that the results
are within the expected range.
2) Manually modify TSMFK01/kAB.out and write k_AB=5.
3) Skip pointing to a previous run directory, and instead loop over
the spins and set kAB=5 before doing a minimization.


-----------------
Output from an auto-analysis
-------------------------

-----------------------
- The 'TSMFK01' model -
-----------------------
relax> pipe.copy(pipe_from='base pipe', pipe_to='TSMFK01',
bundle_to='relax_disp')
relax> pipe.switch(pipe_name='TSMFK01')
relax> relax_disp.select_model(model='TSMFK01')
The Tollinger et al. (2001) 2-site very-slow exchange model, range of
microsecond to second time scale.

relax> value.copy(pipe_from='R2eff', pipe_to='TSMFK01', param='r2eff')
relax> pipe.create(pipe_name='pre', pipe_type='relax_disp', bundle=None)

relax> results.read(file='results',
dir='/net/tomat/home/tlinnet/kte/acbp/acbp_cpmg_disp_04MGuHCl_40C_041223_RELAX.fid/relax_reprocess/model_sel_analyt/TSMFK01')
Opening the file
'/net/tomat/home/tlinnet/kte/acbp/acbp_cpmg_disp_04MGuHCl_40C_041223_RELAX.fid/relax_reprocess/model_sel_analyt/TSMFK01/results.bz2'
for reading.

relax> relax_disp.parameter_copy(pipe_from='pre', pipe_to='TSMFK01')
Copying parameters for the spin block [':3@N', ':4@N', ':5@N', ':6@N',
':7@N', ':9@N', ':10@N', ':11@N', ':12@N', ':13@N', ':14@N', ':15@N',
':16@N', ':17@N', ':18@N', ':20@N', ':21@N', ':22@N', ':23@N',
':24@N', ':25@N', ':26@N', ':27@N', ':28@N', ':29@N', ':30@N',
':31@N', ':32@N', ':33@N', ':34@N', ':35@N', ':36@N', ':37@N',
':38@N', ':39@N', ':40@N', ':41@N', ':43@N', ':45@N', ':46@N',
':47@N', ':48@N', ':49@N', ':50@N', ':52@N', ':53@N', ':54@N',
':56@N', ':57@N', ':58@N', ':59@N', ':60@N', ':61@N', ':62@N',
':63@N', ':64@N', ':65@N', ':66@N', ':67@N', ':68@N', ':69@N',
':70@N', ':71@N', ':72@N', ':73@N', ':74@N', ':75@N', ':77@N',
':78@N', ':80@N', ':81@N', ':82@N', ':83@N', ':84@N', ':85@N',
':86@N']
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Averaged k_AB value: 83471.182642624335131

relax> pipe.switch(pipe_name='TSMFK01')
relax> pipe.delete(pipe_name='pre')
relax> minimise(min_algor='simplex', line_search=None,
hessian_mod=None, hessian_type=None, func_tol=1e-25, grad_tol=None,
max_iter=10000000, constraints=True, scaling=True, verbosity=1)




Best
Troels

Troels Emtekær Linnet

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel
