AICc



Posted by Sébastien Morin on October 08, 2009 - 06:49:
Hi,

I recently used the script 'palmer.py' with a single magnetic field
dataset (n=3) and tested AICc model selection (during stage 2).

I ran into a division by zero for models with two parameters
(such as models 'm2' and 'm3'), since with n=3 and k=2 the denominator
n - k - 1 is zero in:
    AICc = chi2 + 2.0*k + 2.0*k*(k + 1.0) / (n - k - 1.0)

Also, when models had 3 parameters, the division was by -1, which
yielded negative AICc scores. relax then ranked these models best,
because their scores were the lowest...
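
To make the arithmetic explicit, here is a small stand-alone sketch that
evaluates the formula above for n=3. It is not relax's code: the function
is only a copy of the expression shown above, and the chi2 values are the
ones printed in the output further down.

```python
def aicc(chi2, k, n):
    """AICc exactly as in the formula above, with no guard on the denominator."""
    return chi2 + 2.0*k + 2.0*k*(k + 1.0) / (n - k - 1.0)

n = 3  # number of data sets (single field)

# Chi2 values and parameter counts taken from the printed table.
scores = {
    'm5': aicc(2.16490, 3, n),   # denominator is -1, score becomes negative
    'm4': aicc(2.27420, 3, n),   # denominator is -1, score becomes negative
    'm1': aicc(2.27420, 1, n),   # denominator is +1, score is well defined
}

# With k=2 the denominator is 0 and Python raises ZeroDivisionError.
# The chi2 value here is an arbitrary placeholder, not a measured one.
try:
    aicc(1.0, 2, n)
    two_param_failed = False
except ZeroDivisionError:
    two_param_failed = True
```

The negative scores for 'm5' and 'm4' come entirely from the last term,
2.0*3*4/(-1) = -24, and not from a better fit.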

The errors appeared as follows:

=================================
Model-free model of spin ':28&:GLU'.
Data pipe            Num_params_(k)       Num_data_sets_(n)    Chi2                 Criterion
m5                   3                    3                    2.16490              -15.83510
m4                   3                    3                    2.27420              -15.72580
m1                   1                    3                    2.27420              8.27420
Traceback (most recent call last):
  File "/home/semor/pse-4/collaborations/relax/relax-1.3.4/relax", line 418, in <module>
    Relax()
  File "/home/semor/pse-4/collaborations/relax/relax-1.3.4/relax", line 127, in __init__
    self.interpreter.run(self.script_file)
  File "/home/semor/pse-4/collaborations/relax/relax-1.3.4/prompt/interpreter.py", line 276, in run
    return run_script(intro=self.__intro_string, local=self.local, script_file=script_file, quit=self.__quit_flag, show_script=self.__show_script, raise_relax_error=self.__raise_relax_error)
  File "/home/semor/pse-4/collaborations/relax/relax-1.3.4/prompt/interpreter.py", line 537, in run_script
    return console.interact(intro, local, script_file, quit, show_script=show_script, raise_relax_error=raise_relax_error)
  File "/home/semor/pse-4/collaborations/relax/relax-1.3.4/prompt/interpreter.py", line 433, in interact_script
    execfile(script_file, local)
  File "./palmer.py", line 166, in <module>
  File "./palmer.py", line 118, in exec_stage_2
  File "/home/semor/pse-4/collaborations/relax/relax-1.3.4/prompt/model_selection.py", line 132, in model_selection
    model_selection.select(method=method, modsel_pipe=modsel_pipe, pipes=pipes)
  File "/home/semor/pse-4/collaborations/relax/relax-1.3.4/generic_fns/model_selection.py", line 273, in select
    crit = formula(chi2, float(k), float(n))
  File "/home/semor/pse-4/collaborations/relax/relax-1.3.4/generic_fns/model_selection.py", line 76, in aicc
    return chi2 + 2.0*k + 2.0*k*(k + 1.0) / (n - k - 1.0)
ZeroDivisionError: float division
=================================

I think it would be useful to have a warning message whenever
overfitting happens (division by 0 or by a negative number).

Also, if a division by zero occurs, the AICc score should be marked as
something like 'NA (0)'. Moreover, when the division is by a negative
number, the AICc score should be marked as something like 'NA (1)', with
the number in parentheses indicating the actual degree of overfitting...
Of course, any 'NA' score should be prevented from serving as a model
selector, i.e. no model should be selected using such a score...
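
As a rough sketch of what I mean (only an illustration, not a patch
against relax: the names aicc_or_na and select_model are made up, and I
am assuming the number in parentheses is k + 1 - n, which gives 'NA (0)'
for a division by 0 and 'NA (1)' for a division by -1):

```python
import warnings

def aicc_or_na(chi2, k, n):
    """Return (score, label). The score is None when n - k - 1 <= 0."""
    denom = n - k - 1
    if denom <= 0:
        # Warn instead of crashing or returning a misleading negative score.
        warnings.warn("AICc undefined for k=%d, n=%d (n - k - 1 = %d)" % (k, n, denom))
        return None, "NA (%d)" % (-denom)
    score = chi2 + 2.0*k + 2.0*k*(k + 1.0) / denom
    return score, "%.5f" % score

def select_model(models, n):
    """models maps a model name to (chi2, k).

    Return the name with the lowest valid AICc, or None if every score is 'NA'.
    """
    best_name, best_score = None, None
    for name in sorted(models):
        chi2, k = models[name]
        score, label = aicc_or_na(chi2, k, n)
        if score is not None and (best_score is None or score < best_score):
            best_name, best_score = name, score
    return best_name

# Values taken from the output above (n=3).
models = {'m5': (2.16490, 3), 'm4': (2.27420, 3), 'm1': (2.27420, 1)}
best = select_model(models, 3)
```

With the values from the output above, only 'm1' has a valid score, so it
is the only model that can be selected; 'm4' and 'm5' would be reported
as 'NA (1)'.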

These improvements could be useful to people living on the edge of
overfitting (single field data, for example), but could also help when
multiple field data were acquired but a few residues only have data at
one field (due to magnetic field dependent peak overlap, for example)...

What do you think?


Séb  :)

-- 
Sébastien Morin
PhD Student
S. Gagné NMR Laboratory
Université Laval & PROTEO
Québec, Canada




