Re: Future direction of the data structure 'self.relax.data'.



Posted by Edward d'Auvergne on May 25, 2006 - 04:13:
On 5/24/06, Chris MacRaild <c.a.macraild@xxxxxxxxxxx> wrote:
On Wed, 2006-05-24 at 13:38 +1000, Edward d'Auvergne wrote:
> This is a continuation of my response to the thread started by Chris
> at https://mail.gna.org/public/relax-devel/2006-05/msg00006.html.  I
> thought I would start a new thread as this is a bit of an aside.
>
> Eventually one of the goals of relax is to change the structure of the
> data structure 'self.relax.data'.  Having each object inside
> 'self.relax.data' as a dictionary where 'self.run' is used as a key to
> select the run specific data is clumsy.  A much cleaner implementation
> (which would be completely invisible to users) would be to have
> 'self.relax.data' as the dictionary in which 'self.run' is used to
> select between the run specific data.  Then there would only be a
> single point within the entire program where run specific data is
> selected.  You could then type something like:
>
>     data = self.relax.data[self.run]
>
> and then never reference the run again in that block of code.  Each
> run would then have its own area within 'self.relax.data' completely
> to itself.  This could simplify things when new types of data analysis
> (SRLS, relaxation dispersion, etc) are added to the program.
>
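
The difference between the two layouts can be sketched in plain Python. This is only an illustration: the names 'old_data', 'new_data', 'res_names' and 'num_res' are hypothetical stand-ins, not the actual relax data structures.

```python
# Hypothetical stand-ins for run-specific data; not the real relax objects.

# Current layout: every object is a dictionary keyed by the run name.
old_data = {
    "res_names": {"m1": ["Gly", "Ala"], "m2": ["Gly"]},
    "num_res": {"m1": 2, "m2": 1},
}

# Proposed layout: the run is selected once, at the top level.
new_data = {
    "m1": {"res_names": ["Gly", "Ala"], "num_res": 2},
    "m2": {"res_names": ["Gly"], "num_res": 1},
}

run = "m1"

# Current: the run key appears in every single access.
old_count = old_data["num_res"][run]
old_names = old_data["res_names"][run]

# Proposed: select the run once, then never reference it again.
data = new_data[run]
new_count = data["num_res"]
new_names = data["res_names"]
```

Both layouts hold the same information; only the point at which the run is selected changes.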

I agree that this would be a significant improvement, but also a huge
amount of work. Probably worthwhile, in that it will make maintenance and
further development easier.

This will definitely require an unstable branch. Maybe 1.3 or 1.5, depending on whether and how the MPI ideas move along. It would involve a huge code clean-out, although the work itself would be quite simple. When I do get the chance to do it (most likely after I finish the PhD), I'm sure many latent bugs will be inadvertently removed. As the internal changes will be massive, once it has been stabilised I will probably create the stable 2.0 branch.

It might also be worth considering what will be the best structure to
have inside this dictionary. The simple option is to have another
dictionary, so:

     self.relax.data[self.run][param_name] = param_value

An alternative would be to adopt an object-orientated approach here, so
self.relax.data[self.run] would point to an instance of a container
class which could be specific to the run type.

I would prefer containers for exactly the same reasons you detail below.

The advantage of this approach is that it would be possible to define
methods on the class to do data manipulation in a run-type specific way.
So:

    self.relax.data[self.run].do_something()

will do something in the appropriate way for the run type. This has the
potential of eliminating a layer or two of wrapper and mapping
functions, and thus achieving even greater gains in terms of
simplification of the code base and perhaps even performance. Of course
the trade-off is that this would entail an even more extensive re-write.
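
A minimal sketch of the container idea, assuming one container class per run type. All class names, the run-type labels, and the return values of do_something() are hypothetical and chosen only for illustration; none of this is existing relax code.

```python
# All class and method names here are hypothetical, for illustration only.

class RunContainer:
    """Base container holding the data of a single run."""

    def __init__(self, run_type):
        self.run_type = run_type

    def do_something(self):
        raise NotImplementedError("defined by run-type specific subclasses")


class ModelFreeContainer(RunContainer):
    def __init__(self):
        super().__init__("mf")

    def do_something(self):
        return "model-free specific handling"


class RelaxFitContainer(RunContainer):
    def __init__(self):
        super().__init__("relax_fit")

    def do_something(self):
        return "relaxation curve-fitting specific handling"


# The top-level dictionary maps run names to run-type specific containers.
relax_data = {"m1": ModelFreeContainer(), "fit": RelaxFitContainer()}

# The caller needs no switch on the run type: dispatch happens via the class.
results = {run: container.do_something() for run, container in relax_data.items()}
```

The calling code never inspects the run type, which is the layer of wrapper and mapping functions that this approach could remove.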

As 'self.relax.data' is already a container, changing this into a dictionary of containers, each identical to the current 'self.relax.data' structure, should be straightforward. I just haven't done that yet because absolutely everything will break! One of the most useful methods associated with the container is '__repr__(self)'. This can be added to the dictionary structure as well (for example see the class SpecificData in the data.py file). The idea would be that after typing:

relax> self.relax.data

you get a verbose listing of all the runs you have created.  By typing:

relax> self.relax.data['m1']

you get a listing of all the data structures within that container if
the run 'm1' exists, otherwise a message saying that the run has not
been created yet.
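
A rough sketch of how such listings might be produced, assuming a dict subclass for the runs and a simple per-run container. The class names 'RunDict' and 'Container', the attribute 'num_res', and the exact wording of the messages are hypothetical; this is not the SpecificData class from data.py.

```python
class Container:
    """Hypothetical per-run container that lists its own contents."""

    def __repr__(self):
        lines = ["Data structures in this run:"]
        for name in sorted(vars(self)):
            lines.append("  %s: %r" % (name, getattr(self, name)))
        return "\n".join(lines)


class RunDict(dict):
    """Hypothetical dictionary of runs with verbose listings."""

    def __repr__(self):
        if not self:
            return "No runs have been created yet."
        return "Runs:\n" + "\n".join("  %r" % run for run in sorted(self))

    def __missing__(self, run):
        # Called by dict.__getitem__ when the key is absent.
        raise KeyError("The run %r has not been created yet." % run)


data = RunDict()
data["m1"] = Container()
data["m1"].num_res = 2
```

Here a missing run raises a KeyError carrying the message; returning or printing the message instead would be an equally valid design choice.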

Edward


