Re: Problem during "final" run of d'Auvergne Protocol



Posted by Edward d'Auvergne on March 06, 2012 - 13:04:
One other point is that I've recently been working on cleaning up,
simplifying, and fixing a few IO stream bugs in the multi-processor
package in the 1.3 line of the relax repository since I tagged and
released the 1.3.13 version.  So there is a slight chance that I may
have accidentally fixed the problem already.  But you'll need to check
out the most up to date repository code with the subversion program to
test this.

Regards,

Edward


On 6 March 2012 12:58, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
Actually, looking at the code, it appears as though the multi-processor
error handling is failing.  Which means that there are probably two
bugs here.  One is causing the program to fail, the second in the
multi-processor error handling is causing the memory error, hiding the
first problem.  Could you replace the run() function in the
multi/uni_processor.py code?  The original code should be:

   def run(self):
       try:
           self.pre_run()
           self.callback.init_master(self)
           self.post_run()
       except Exception, e:
           self.callback.handle_exception(self, e)

Could you replace it with:

   def run(self):
       self.pre_run()
       self.callback.init_master(self)
       self.post_run()

and see what the error message is?  If what I said above is correct,
then this should uncover the first bug (which then triggers the
second).  By the way, how long does it take to test this problem?
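
To illustrate why removing the try/except should expose the first bug, here is a minimal sketch.  The Callback and Processor classes and the final_error() helper are hypothetical stand-ins written for this example, not the relax code, and the two exceptions are simulated.  It is written so that it also runs on current Python versions:

```python
class Callback(object):
    """Hypothetical stand-in for the multi-processor callback object."""

    def init_master(self, processor):
        # Simulated first bug: the real work fails.
        raise ValueError("first bug: the original failure")

    def handle_exception(self, processor, exc):
        # Simulated second bug: the error handler itself fails.
        raise MemoryError("second bug: the handler failed")


class Processor(object):
    """Hypothetical stand-in for the uni-processor fabric."""

    def __init__(self, callback):
        self.callback = callback

    def run_wrapped(self):
        # Same shape as the original run(): errors are passed to the handler.
        try:
            self.callback.init_master(self)
        except Exception as e:
            self.callback.handle_exception(self, e)

    def run_bare(self):
        # Same shape as the replacement run(): nothing is caught.
        self.callback.init_master(self)


def final_error(func):
    """Return the name of the exception type that escapes func()."""
    try:
        func()
    except BaseException as e:
        return type(e).__name__
    return None


proc = Processor(Callback())
wrapped = final_error(proc.run_wrapped)
bare = final_error(proc.run_bare)
print("wrapped: %s, bare: %s" % (wrapped, bare))
```

In the wrapped version the error that finally escapes comes from the handler, whereas the bare version lets the original error through.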

Cheers,

Edward



On 6 March 2012 12:49, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
Hi,

Thank you for all the details.  That really helps in narrowing down
the bug!  From all the info, the bug is without doubt within the
multi-processor package.  Cheers.  If you have a little time, we can
work together and fix this.  The changes/fixes will go into the
repository version, so you'll need a copy of that for testing.  Do you
have the subversion program installed?  If so, you can obtain the most
up to date copy from the repository by typing:

$ svn co svn://svn.gna.org/svn/relax/1.3 relax-1.3

or if this doesn't work:

$ svn co http://svn.gna.org/svn/relax/1.3 relax-1.3

If you already have a checked out copy, you can update to the newest
copy by typing:

$ svn up

I'll look at the second bug you've identified later.  It would be
appreciated if you created a second bug report for that problem too.
I would not recommend reverting to earlier relax versions due to the
number of bug fixes and other problems solved since then.  This should
not affect the model-free results, but the bugs could bite elsewhere.
Hopefully I can fix this problem quickly.

Cheers,

Edward


P. S.  For reference, the bug report is https://gna.org/bugs/?19528.



On 6 March 2012 12:18, Hugh RW Dannatt <h.dannatt@xxxxxxxxxxxxxxx> wrote:
Hi Edward,

Your description sounds very likely to be the cause of the problem.
During the time when no output is being produced, the computer gets
gradually slower and slower before finally giving up.

The error is reproducible such that I have tried it on a couple of
different machines and it has failed several times at the same stage.
The error messages tend to vary a little, however. Here are another 2
of the outputs given when the program has failed (I should clarify all
of these messages came from runs done on the same machine, and the
second was run with option "-d" but it hasn't helped very much):-

Simulation 492
Simulation 493
Simulation 494
Simulation 495
Simulation 496
Simulation 497
Simulation 498
Simulation 499
Simulation 500
Traceback (most recent call last):
 File "/usr/local/relax-1.3.13/multi/uni_processor.py", line 136, in run
   self.callback.init_master(self)
 File "/usr/local/relax-1.3.13/multi/processor.py", line 263, in default_init_master
Traceback (most recent call last):
 File "/usr/local/bin/relax", line 7, in <module>
   relax.start()
 File "/usr/local/relax-1.3.13/relax.py", line 100, in start
   processor.run()
 File "/usr/local/relax-1.3.13/multi/uni_processor.py", line 139, in run
   self.callback.handle_exception(self, e)
 File "/usr/local/relax-1.3.13/multi/processor.py", line 250, in default_handle_exception
   traceback.print_exc(file=sys.stderr)
 File "/usr/lib/python2.6/traceback.py", line 227, in print_exc
   print_exception(etype, value, tb, limit, file)
 File "/usr/lib/python2.6/traceback.py", line 125, in print_exception
   print_tb(tb, limit, file)
 File "/usr/lib/python2.6/traceback.py", line 69, in print_tb
   line = linecache.getline(filename, lineno, f.f_globals)
 File "/usr/lib/python2.6/linecache.py", line 14, in getline
   lines = getlines(filename, module_globals)
 File "/usr/lib/python2.6/linecache.py", line 40, in getlines
   return updatecache(filename, module_globals)
 File "/usr/lib/python2.6/linecache.py", line 136, in updatecache
   lines = fp.readlines()
MemoryError
9203.219u 258.488s 8:05:09.46 32.5%     0+0k 90962440+0io 2215895pf+0w

------------------

Simulation 489
Simulation 490
Simulation 491
Simulation 492
Simulation 493
Simulation 494
Simulation 495
Simulation 496
Simulation 497
Simulation 498
Simulation 499
Simulation 500
debug> Execution lock:  Release by 'script UI' ('script' mode).
debug> Execution lock:  Release by 'script UI' ('script' mode).
Traceback (most recent call last):
 File "/progs/Linux/bin/relax13", line 7, in <module>
   relax.start()
 File "/progs/relax-1.3.13/relax.py", line 100, in start
   processor.run()
 File "/progs/relax-1.3.13/multi/uni_processor.py", line 139, in run
   self.callback.handle_exception(self, e)
 File "/progs/relax-1.3.13/multi/processor.py", line 250, in default_handle_exception
   traceback.print_exc(file=sys.stderr)
 File "/usr/lib/python2.6/traceback.py", line 227, in print_exc
   print_exception(etype, value, tb, limit, file)
MemoryError

8006.268u 542.873s 8:34:11.81 27.7%     0+0k 225824840+0io 6192344pf+0w

------------------

If the number of MC simulations is dropped even as low as 100, the
program finishes the fitting successfully, though I then get an error
message to do with the grace files (I've not been using them so I'm
not bothered about this, though it will no doubt be of interest to
you):-

Data pipe 'final':  The ts value of 2.6285e-08 is greater than 1.9714e-08, eliminating simulation 94 of spin system ':218@N'.
Data pipe 'final':  The ts value of 2.6285e-08 is greater than 1.9714e-08, eliminating simulation 95 of spin system ':218@N'.

relax> monte_carlo.error_analysis(prune=0.0)

relax> results.write(file='results', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final', compress_type=1, force=True)
Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/results.bz2' for writing.

relax> grace.write(x_data_type='spin', y_data_type='s2', spin_id=None, plot_data='value', file='s2.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/s2.agr' for writing.

relax> grace.write(x_data_type='spin', y_data_type='s2f', spin_id=None, plot_data='value', file='s2f.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/s2f.agr' for writing.

relax> grace.write(x_data_type='spin', y_data_type='s2s', spin_id=None, plot_data='value', file='s2s.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/s2s.agr' for writing.

relax> grace.write(x_data_type='spin', y_data_type='te', spin_id=None, plot_data='value', file='te.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/te.agr' for writing.

relax> grace.write(x_data_type='spin', y_data_type='tf', spin_id=None, plot_data='value', file='tf.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/tf.agr' for writing.

relax> grace.write(x_data_type='spin', y_data_type='ts', spin_id=None, plot_data='value', file='ts.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/ts.agr' for writing.

relax> grace.write(x_data_type='spin', y_data_type='rex', spin_id=None, plot_data='value', file='rex.agr', dir='/ld10c/home1/hugh/data/pgm298bq/relax/final/grace', force=True, norm=False)
Opening the file '/ld10c/home1/hugh/data/pgm298bq/relax/final/grace/rex.agr' for writing.
debug> Execution lock:  Release by 'script UI' ('script' mode).
debug> Execution lock:  Release by 'script UI' ('script' mode).
Traceback (most recent call last):
 File "/ld10c/progs/relax-1.3.13/prompt/interpreter.py", line 383, in exec_script
   runpy.run_module(module, globals)
 File "/usr/lib/python2.6/runpy.py", line 140, in run_module
   fname, loader, pkg_name)
 File "/usr/lib/python2.6/runpy.py", line 34, in _run_code
   exec code in run_globals
 File "/ld10c/home1/hugh/data/pgm298bq/relax/dauvergne_protocol_lessMC.py", line 216, in <module>
   dAuvergne_protocol(pipe_name=name, diff_model=DIFF_MODEL, mf_models=MF_MODELS, local_tm_models=LOCAL_TM_MODELS, grid_inc=GRID_INC, min_algor=MIN_ALGOR, mc_sim_num=MC_NUM, conv_loop=CONV_LOOP)
 File "/ld10c/progs/relax-1.3.13/auto_analyses/dauvergne_protocol.py", line 223, in __init__
   self.execute()
 File "/ld10c/progs/relax-1.3.13/auto_analyses/dauvergne_protocol.py", line 710, in execute
   self.write_results()
 File "/ld10c/progs/relax-1.3.13/auto_analyses/dauvergne_protocol.py", line 837, in write_results
   self.interpreter.grace.write(x_data_type='spin', y_data_type='rex', file='rex.agr',       dir=dir, force=True)
 File "/ld10c/progs/relax-1.3.13/prompt/grace.py", line 103, in write
   grace.write(x_data_type=x_data_type, y_data_type=y_data_type, spin_id=spin_id, plot_data=plot_data, file=file, dir=dir, force=force, norm=norm)
 File "/ld10c/progs/relax-1.3.13/generic_fns/grace.py", line 366, in write
   write_xy_header(sets=len(data[0]), file=file, data_type=[x_data_type, y_data_type], seq_type=seq_type, set_names=set_names, norm=norm)
 File "/ld10c/progs/relax-1.3.13/generic_fns/grace.py", line 600, in write_xy_header
   units = return_units(data_type[i])
 File "/ld10c/progs/relax-1.3.13/specific_fns/model_free/main.py", line 2394, in return_units
   raise RelaxNoSpinSpecError
RelaxNoSpinSpecError: RelaxError: The spin system must be specified.


3510.479u 20.741s 59:07.76 99.5%        0+0k 0+3368io 0pf+0w

------------------

Finally, this is the output from relax --info as requested:-

                                           relax 1.3.13

                             Molecular dynamics by NMR data analysis

                            Copyright (C) 2001-2006 Edward d'Auvergne
                        Copyright (C) 2006-2011 the relax development team

This is free software which you are welcome to modify and redistribute under the conditions of the
GNU General Public License (GPL).  This program, including all modules, is licensed under the GPL
and comes with absolutely no warranty.  For details type 'GPL' within the relax prompt.

Assistance in using the relax prompt and scripting interface can be accessed by typing 'help' within
the prompt.

Processor fabric:  Uni-processor.

Hardware information:
   Machine:                 i686
   Processor:

System information:
   System:                  Linux
   Release:                 2.6.32-37-generic
   Version:                 #81-Ubuntu SMP Fri Dec 2 20:35:14 UTC 2011
   GNU/Linux version:       Ubuntu 10.04 lucid
   Distribution:            Ubuntu 10.04 lucid
   Full platform string:    Linux-2.6.32-37-generic-i686-with-Ubuntu-10.04-lucid

Software information:
   Architecture:            32bit ELF
   Python version:          2.6.5
   Python branch:           tags/r265
   Python build:            r265:79063, Apr 16 2010 13:09:56
   Python compiler:         GCC 4.4.3
   Python implementation:   CPython
   Python revision:         79063
   Numpy version:           1.3.0
   Libc version:            glibc 2.4

Python packages (most are optional):

Package              Installed       Version         Path
minfx                True            Unknown         /ld10c/progs/relax-1.3.13/minfx
bmrblib              True            Unknown         /ld10c/progs/relax-1.3.13/bmrblib
numpy                True            1.3.0           /usr/lib/python2.6/dist-packages/numpy
scipy                True            0.7.0           /usr/lib/python2.6/dist-packages/scipy
wxPython             False
mpi4py               False
epydoc               False
optparse             True            1.5.3           /usr/lib/python2.6/optparse.pyc
readline             True                            /usr/lib/python2.6/lib-dynload/readline.so
profile              True                            /usr/lib/python2.6/profile.pyc
bz2                  True                            /usr/lib/python2.6/lib-dynload/bz2.so
gzip                 True                            /usr/lib/python2.6/gzip.pyc
os.devnull           True                            /usr/lib/python2.6/os.pyc

Compiled relax C modules:
   Relaxation curve fitting: True

------------------

Apologies for all the detail but I'm not really sure what to do here.
If it is the multi-processor part of it that is failing, is installing
relax 1.3.11 an option? I previously had 1.3.10 installed and the
commands seem to have changed quite a lot since then. What is your
opinion on the validity of error estimates based on 100 simulations?
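
As a rough guide to the question of 100 versus 500 simulations, the sketch below uses the standard large-sample approximation for normally distributed values, in which the fractional uncertainty of a standard deviation estimated from N samples is about 1/sqrt(2(N-1)).  This is a general statistical rule of thumb, not a statement about how relax computes its errors, and the function name sd_relative_error() is made up for this example:

```python
import math


def sd_relative_error(n_sims):
    """Approximate fractional uncertainty of a standard deviation
    estimated from n_sims normally distributed simulated values."""
    return 1.0 / math.sqrt(2.0 * (n_sims - 1))


for n in (100, 500):
    print("%d simulations: ~%.1f%% uncertainty in the error estimate"
          % (n, 100.0 * sd_relative_error(n)))
```

Under that assumption, going from 500 to 100 simulations roughly doubles the uncertainty of each reported error, from about 3% to about 7% of its value.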

Thanks

Hugh



On 5 March 2012 08:33, Edward d'Auvergne <edward.dauvergne@xxxxxxxxx> wrote:
Hi Hugh,

I'm pretty sure this error has not been encountered before.  It at
least hasn't been reported.  I've never seen anything close to this
before, but I would guess that this is an infinitely recursive
exception (the error is being caught but, in the process, the error
occurs again, being caught a second time, then the 3rd error occurs,
is caught a 3rd time, with this continuing until your computer runs
out of RAM and swap space and relax is killed by the operating
system).  The error seems to occur within the error handling portion of
Gary Thompson's multi-processor framework (you are using the
uni-processor fabric of the framework here), so maybe Gary might know
a solution?
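
One general way to guard against a failure inside the error handler itself is sketched below.  This is only an illustration of the idea and not the code in the multi-processor framework: the function handle_exception_safely() and the broken_printer() test double are hypothetical names, and the example uses current Python 3 syntax rather than the Python 2.6 that relax 1.3.13 runs on:

```python
import io
import sys
import traceback


def handle_exception_safely(exc, stream=None, printer=traceback.print_exc):
    """Report exc; if the reporting fails, fall back to a one-line message.

    Returns True if the full traceback was printed, False if the
    fallback message was used instead.
    """
    if stream is None:
        stream = sys.stderr
    try:
        printer(file=stream)
        return True
    except Exception as report_error:
        # MemoryError is a subclass of Exception, so it is caught here too.
        stream.write("%s: %s (traceback unavailable: %s)\n"
                     % (type(exc).__name__, exc, type(report_error).__name__))
        return False


def broken_printer(file=None):
    # Test double simulating the failure while formatting the traceback.
    raise MemoryError("simulated failure while printing the traceback")


buf = io.StringIO()
try:
    raise ValueError("original failure")
except ValueError as e:
    ok_full = handle_exception_safely(e, stream=io.StringIO())
    ok_fallback = handle_exception_safely(e, stream=buf, printer=broken_printer)

print(buf.getvalue())
```

The point of the fallback branch is that the original error type and message still reach the user even when the full traceback cannot be produced.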

Is this error reproducible?  For testing, can you drop the number of
Monte Carlo simulations down to say 5?  Running relax with the debug
flag might also help:

$ relax --debug

or:

$ relax -d

Are you using the GUI or scripting user interface?  The output of:

$ relax --info

might also be useful.  As for your data set being too large, relax has
been used on much bigger systems before so this should not be an
issue.  One last thing, would you be able to create a bug report for
this error (https://gna.org/bugs/?func=additem&group=relax)?  All of
the info/log files can then be pasted/attached there, and it is a
useful future reference for anyone who encounters the same or a
similar bug.

Cheers,

Edward



On 2 March 2012 12:33, Hugh RW Dannatt <h.dannatt@xxxxxxxxxxxxxxx> wrote:
Dear All,

Having completed the fitting of 1 dataset without any problems, I am
now moving onto another. Everything has worked fine until I change the
DIFF_MODEL to "final" and try to run the program again to get error
estimates on my fitted parameters.

The program successfully re-opens all the results file and selects the
diffusion model. Then all 500 simulations are done without issue, but
as soon as the program has finished this, it stops outputting anything
to the screen for a long time (>12 hrs). During this time, the CPU and
Memory use is very high and the computer runs slowly. Eventually I get
a "Memory Error" and a whole load of messages outputted to the screen,
which I have pasted below. I should emphasize that all the stages of
running this program with different diffusion models have run fine,
and the computer I'm using is a relatively fast machine (dual core
Pentium 4, 2 GB RAM).

Has anyone had a similar problem? This dataset is larger than the
previous one which fit without issue (current one has 6 measurements
per 176 residues), but I can't imagine this being the cause of this
problem.

Thanks

Hugh

----

Simulation 485
Simulation 486
Simulation 487
Simulation 488
Simulation 489
Simulation 490
Simulation 491
Simulation 492
Simulation 493
Simulation 494
Simulation 495
Simulation 496
Simulation 497
Simulation 498
Simulation 499
Simulation 500


Traceback (most recent call last):
 File "/progs/relax-1.3.13/multi/uni_processor.py", line 136, in run
   self.callback.init_master(self)
 File "/progs/relax-1.3.13/multi/processor.py", line 263, in default_init_master
   self.master.run()
 File "/progs/relax-1.3.13/relax.py", line 171, in run
   self.interpreter.run(self.script_file)
 File "/progs/relax-1.3.13/prompt/interpreter.py", line 300, in run
   return run_script(intro=self.__intro_string, local=locals(), script_file=script_file, quit=self.__quit_flag, show_script=self.__show_script, raise_relax_error=self.__raise_relax_error)
 File "/progs/relax-1.3.13/prompt/interpreter.py", line 610, in run_script
   return console.interact(intro, local, script_file, quit, show_script=show_script, raise_relax_error=raise_relax_error)
 File "/progs/relax-1.3.13/prompt/interpreter.py", line 495, in interact_script
   exec_script(script_file, local)
 File "/progs/relax-1.3.13/prompt/interpreter.py", line 383, in exec_script
   runpy.run_module(module, globals)
 File "/usr/lib/python2.6/runpy.py", line 140, in run_module
   fname, loader, pkg_name)
 File "/usr/lib/python2.6/runpy.py", line 34, in _run_code
   exec code in run_globals
 File "/home1/hugh/data/pgm298bq/relax/dauvergne_protocol.py", line 216, in <module>
   dAuvergne_protocol(pipe_name=name, diff_model=DIFF_MODEL, mf_models=MF_MODELS, local_tm_models=LOCAL_TM_MODELS, grid_inc=GRID_INC, min_algor=MIN_ALGOR, mc_sim_num=MC_NUM, conv_loop=CONV_LOOP)
 File "/progs/relax-1.3.13/auto_analyses/dauvergne_protocol.py", line 223, in __init__
Traceback (most recent call last):
 File "/progs/Linux/bin/relax13", line 7, in <module>
   relax.start()
 File "/progs/relax-1.3.13/relax.py", line 100, in start
   processor.run()
 File "/progs/relax-1.3.13/multi/uni_processor.py", line 139, in run
   self.callback.handle_exception(self, e)
 File "/progs/relax-1.3.13/multi/processor.py", line 250, in default_handle_exception
   traceback.print_exc(file=sys.stderr)
 File "/usr/lib/python2.6/traceback.py", line 227, in print_exc
   print_exception(etype, value, tb, limit, file)
 File "/usr/lib/python2.6/traceback.py", line 125, in print_exception
   print_tb(tb, limit, file)
 File "/usr/lib/python2.6/traceback.py", line 69, in print_tb
   line = linecache.getline(filename, lineno, f.f_globals)
 File "/usr/lib/python2.6/linecache.py", line 14, in getline
   lines = getlines(filename, module_globals)
 File "/usr/lib/python2.6/linecache.py", line 40, in getlines
   return updatecache(filename, module_globals)
 File "/usr/lib/python2.6/linecache.py", line 136, in updatecache
   lines = fp.readlines()
MemoryError
9078.655u 666.933s 10:55:29.66 24.7%    0+0k 241482000+0io 6665721pf+0w




--
Hugh Dannatt
PhD Student Researcher

Prof. Jon Waltho Lab
Department of Molecular Biology & Biotechnology
University of Sheffield
Firth Court
Western Bank
Sheffield
S10 2TN

0114 222 2729


