Re: Pickling problems with the relax data storage singleton.



Posted by Edward d'Auvergne on November 27, 2007 - 09:41:
On Nov 27, 2007 12:17 AM, Chris MacRaild <macraild@xxxxxxxxxxx> wrote:

On Nov 26, 2007 9:37 PM, Edward d'Auvergne <edward.dauvergne@xxxxxxxxx> 
wrote:
On Nov 26, 2007 9:39 AM, gary thompson <garyt.and.sarahb@xxxxxxxxx> wrote:




On Nov 26, 2007 2:32 AM, Chris MacRaild <macraild@xxxxxxxxxxx> wrote:

On Nov 23, 2007 7:14 PM, Gary Thompson <garyt@xxxxxxxxxxxxxxx> wrote:


[snip]


Another (better?) option would be to do the saving and restoring of
state at a lower level. So instead of simply pickling the whole Data
object, we have save and restore methods of the Data class that do the
pickling in a more controlled way. This seems to me truer to the
intent of the singleton pattern, avoiding the complications Gary
refers to. The control over what gets saved, and how, might also be
useful.

A quick sketch of the sort of thing I'm thinking:

from pickle import Pickler, Unpickler

class Data(dict):
   ...
   def _save(self, file):
       pickler = Pickler(file)
       dont_save = []  # a list of attributes that don't need saving,
                       # e.g. methods
       for name, attr in self.__dict__.items():
           if name not in dont_save:
               pickler.dump((name, attr))

   def _restore(self, file):
       unpickler = Unpickler(file)
       while True:
           try:
               name, attr = unpickler.load()
           except EOFError:
               break
           setattr(self, name, attr)

Then the user commands save_state and restore_state are just
front-ends to Data._save and Data._restore. Pickle needn't worry
itself with our unusual namespace, because only attributes of Data are
pickled, not Data itself. The save and restore functions are methods
of the singleton object, so there is no risk of breaking the singleton
pattern. Finally, we have the basis of a mechanism to control what
gets saved/restored and how.
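To make the sketch concrete, here is a runnable version with
hypothetical save_state/restore_state front-ends. The singleton
accessor via __new__ and all names here are my illustrative
assumptions, not the real relax code:

```python
from pickle import Pickler, Unpickler

class Data(dict):
    """Stand-in for the relax data storage singleton (an assumption)."""
    _instance = None

    def __new__(cls):
        # Singleton: every call returns the one shared instance.
        if cls._instance is None:
            cls._instance = dict.__new__(cls)
        return cls._instance

    def _save(self, file):
        pickler = Pickler(file)
        dont_save = []  # attributes that don't need saving
        for name, attr in self.__dict__.items():
            if name not in dont_save:
                pickler.dump((name, attr))

    def _restore(self, file):
        unpickler = Unpickler(file)
        while True:
            try:
                name, attr = unpickler.load()
            except EOFError:
                break
            setattr(self, name, attr)

def save_state(file_name):
    """User command: a thin front-end to Data._save."""
    with open(file_name, 'wb') as f:
        Data()._save(f)

def restore_state(file_name):
    """User command: a thin front-end to Data._restore."""
    with open(file_name, 'rb') as f:
        Data()._restore(f)
```

Because the front-ends always go through Data(), they operate on the
one shared instance and never rebind it.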

Cheers,

Chris





This seems good, but I have some other possibly interesting questions:

1. Should data be a singleton? What happens if I need to load two data
hierarchies, for example for comparison between two runs? (If I am
barking up

The data storage singleton can handle this.  It is a DictionaryType
object, with its keys corresponding to individual PipeContainer
objects.  These PipeContainers are the data pipes, the embodiment of
the morphed 'runs' concept.  So the change was to put everything into
a PipeContainer, and the comparisons will be between the data contents
of any sets of PipeContainers.  As relax runs, the contents of each
data pipe are modified and moulded.  The links between these pipes
include functions for switching between them, copying data between
them, and merging the contents together and placing the result into a
new pipe (e.g. model selection, hybridisation, etc.).  These are just
very basic plumbing concepts ;)
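As an illustration only - PipeContainer here is a bare stand-in and
the function names are my assumptions, not the actual relax API - the
plumbing described above might look like:

```python
import copy

class PipeContainer:
    """Bare stand-in for a relax data pipe (illustrative)."""
    def __init__(self, pipe_type=None):
        self.pipe_type = pipe_type

class RelaxDataStore(dict):
    """The singleton idea: a dictionary of pipe name -> PipeContainer,
    plus bookkeeping such as the currently active pipe."""
    def __init__(self):
        super().__init__()
        self.current_pipe = None

store = RelaxDataStore()

def pipe_create(name, pipe_type):
    """Create a new data pipe and switch to it."""
    store[name] = PipeContainer(pipe_type)
    store.current_pipe = name

def pipe_switch(name):
    """Switch between existing pipes."""
    store.current_pipe = name

def pipe_copy(source, target):
    """Copy the full contents of one pipe into a new one,
    e.g. for comparison between two runs."""
    store[target] = copy.deepcopy(store[source])
```

Two runs then simply live side by side as two keys of the one store,
and comparing them means comparing the two PipeContainer contents.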


2. We have to be careful we don't pickle any Python state (what counts
as Python state can change between versions). How do we identify what
is Python state - create an empty object?

The saved states are not very portable.  I'm hoping the unit tests
will pull out these issues, as there will be a few saved states in the
shared data directory you created.


Pickling can do nothing other than save Python state, which is why
pickle-based saved states are not portable. In principle we could try
to ensure that we only pickle 'standard library' Python state (i.e.
only objects defined in the standard library). Then saved states would
be portable between relax versions (but not necessarily Python
versions or architectures, I suspect). This would require much more
complicated and higher-maintenance save/restore code that recursed
down the attributes of the Data object and reduced them to standard
library objects. Better, I think, to accept that a saved state is not
the right tool for portable data storage, and therefore not care too
much about pickling state that might change in future relax (or
Python) versions.

The save state is a temporary construct, but its portability isn't too
bad.  For example in the test suite (1.3 line) there is a save state
that works for me on Python 2.4 on a 32 bit GNU/Linux machine, on
Python 2.5 on a 64 bit GNU/Linux machine, and on Python 2.4 on a 32
bit (I think, but maybe it's 64 bit) Windows Vista machine.  The only
real problem is if the structure of the relax data storage singleton
changes which, for the stable lines, I try to keep constant.  Well,
new additions are ok, but renaming or changing existing structures is
bad.  Note though that in the docs I mention that the save states may
not be usable between different relax versions.  I think I have only
broken saved states once so far in the 1.2.x releases.  I don't know
whether we should, for portability, freeze the structure permanently,
apart from additions, for the 1.2 and 1.4 lines (when 1.4 exists),
but for the unstable 1.3 line allow the large changes that are
necessary for new code.

So, I would avoid a complex pickling routine.  Most of the objects I
have created in the singleton are highly modified Python types which I
had to carefully make sure were picklable.  I think the best approach
would be to dump the whole thing and then, because of the singleton
design pattern - the singleton can't be replaced using "singleton =
pickle.load('state')" - take the contents of the unpickled state,
object by object and key:value pair by key:value pair, and place it
back into the singleton.  Dumping the whole structure may be necessary
because the singleton is a DictionaryType with keys corresponding to
the special PipeContainer objects.  The singleton dictionary also
contains objects such as 'relax_data_store.current_pipe'.
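A minimal sketch of that idea - dump the whole structure, but restore
by moving the contents back into the existing singleton rather than
rebinding the name. The Store class and helper names are my
assumptions for illustration:

```python
import pickle

class Store(dict):
    """Minimal dict-subclass stand-in for the data storage singleton."""
    def __init__(self):
        super().__init__()
        self.current_pipe = None

def save_whole(store, file):
    # Dump the entire structure in one go.
    pickle.dump(store, file)

def restore_in_place(store, file):
    # pickle.load() returns a *new* object; the singleton must keep
    # its identity (other modules hold references to it), so move the
    # contents over key by key and attribute by attribute instead of
    # rebinding the name.
    state = pickle.load(file)
    store.clear()
    store.update(state)
    for name, attr in state.__dict__.items():
        setattr(store, name, attr)
```

Any other module holding a reference to the singleton sees the
restored data, because the object itself was never replaced.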



3. Chris's approach will add an extra requirement for maintenance in
the future, as all new fields have to be registered for saving.

That is true.  Hopefully when new objects are added, a corresponding
unit test will be created to catch this.  But yes, it is a danger.  So
rather than having a whitelist of objects to include, a blacklist of
objects to exclude would be better, as then new objects will
automatically be saved.


No. I proposed registering attributes that need *not* be saved,
precisely to avoid this issue. Also keep in mind that the status quo
is to pickle the entire Data object, so even if we do forget to
exclude attributes, we are in no worse position than we currently are.

Ah, ok.  Well the current code in generic_fns.state.load_state() does
this for the unpickling, blacklisting the '__weakref__' object.
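A sketch of what that blacklist filtering amounts to (restore_attrs is
a hypothetical name; the real code lives in
generic_fns.state.load_state()):

```python
import pickle

BLACKLIST = ['__weakref__']  # objects to skip when restoring

def restore_attrs(target, file):
    """Set each unpickled (name, attr) pair on the target object,
    skipping the blacklist, so any new attribute added in future is
    restored automatically without registration."""
    unpickler = pickle.Unpickler(file)
    while True:
        try:
            name, attr = unpickler.load()
        except EOFError:
            break
        if name not in BLACKLIST:
            setattr(target, name, attr)
```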

Regards,

Edward


