Sorry for the delay in replying, but this needed some uninterrupted
time for me to sort through it.
Posted by Edward d'Auvergne on October 11, 2006 - 10:32:
On Wed, 2006-10-11 at 17:02 +1000, Edward d'Auvergne wrote:
This post is a proposal for the redesign of the relax data model. This will
affect how data is input into the program, how data is selected, how
molecular structures are handled, how spin systems are handled, and how
many other parts of relax function. Importantly the internal structure
of 'self.relax.data' will completely change. These modifications will
essentially break every part of relax (the isolated code in the
directories 'minimise', 'maths_fns', and 'docs' will be safe from the
carnage, as will a few files in the base directory). If you have any
ideas for extending or improving the proposed data model, can see any
short-comings, deficiencies, or flaws, are familiar with the PDB
conventions, etc., your input is very much sought after. The changes
should occur in the 1.3 line of the repository. 1.2 versions will be
unaffected - scripts will remain compatible and the 1.2 line will
continue to be supported with bug fixes, etc.
I have to apologise in advance for the size of this proposal. To
simplify it I have divided the text into numbered sections. Once this
initial parent message has been sent I will respond to it with the text
of the 4 major sections. This will allow 4 major threads to branch off
from this message on the mailing list archive
(https://mail.gna.org/public/relax-devel). If you have an opinion,
idea, etc. about a specific section, could you please post a separate
message in response to the relevant major section post? Also if you
have unrelated ideas for one of these sections, could you post these as
separate messages as well? For example if you have separate points
about sections 3.1 and 3.5.1, two different posts responding to the
parent Section 3 post would be appreciated. Thanks. This will help to
focus each discussion point into specific threads.
Edward
Redesign of the relax data model
Index:
1. Why change?
1.1 The runs
1.2 The molecules
1.3 The residues
1.4 The spins
2. A new run concept
2.1 Parcelling up an abstract space
2.2 The run data model
2.3 The pipe concept
3. Molecules, residues, and spins
3.1 The spin data model
3.2 The data selection concept - identifying spin systems
3.2.1 Function arguments
3.2.2 NH data of a single protein macromolecule
3.2.3 A single organic molecule (non-polymeric)
3.2.4 A single RNA or DNA macromolecule
3.2.5 Complexes
3.3 Regular expression
3.4 The spin loop
3.5 Molecule, sequence, and spin user function classes
3.5.1 The 'molecule' user function class
3.5.2 The 'sequence' user function class
3.5.3 The 'spin' user function class
3.6 The input and output files
4. Conclusion
Before reading this post, please read the previous posts:
* The parent message 'Redesign of the relax data model: A HOWTO for
breaking relax.' located at
https://mail.gna.org/public/relax-devel/2006-10/msg00053.html
(Message-id:
<1160550133.9523.54.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>).
* Section 1 'Redesign of the relax data model: 1. Why change?' located
at https://mail.gna.org/public/relax-devel/2006-10/msg00054.html
(Message-id:
<1160551172.9523.60.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>).
2. A new run concept
2.1 Parcelling up an abstract space
The general idea is to further increase the prominence of the 'run'.
Rather than relax executing in an abstract space where the 'run' is
passed into each user function as necessary, the idea is that relax
executes within a space dedicated to a certain 'run'. So at the
relax prompt, you could type a user function such as:
relax> run.current()
'm8'
By working in the 'm8' run space, each user function can be executed
without the need for the 'run' argument. Other user functions, such as
'run.switch()', can be used to change between runs.
I agree that carrying the run argument throughout the data structure
is an annoying problem, and I like the solution, but here is an
extension to it that may engender more flexibility.
There is an interesting parallel here. Basically, the proposal is that
there should always be a current run (much in the same way that most
shells have a present working directory). However, it is worth noting
that many Unix tools take a directory argument which overrides the
current working directory, and this gives both simplicity and
flexibility as to which 'context' a command runs in.
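As a rough illustration of the working-directory parallel, a command
could fall back to the current run unless an explicit run argument
overrides it for that one call. This is only a sketch: the names
RelaxState, current_run, and value_set are made up for the example and
are not part of relax.

```python
# Hypothetical names throughout: RelaxState, current_run, and value_set
# are illustrative only and are not part of relax.

class RelaxState:
    """Holds the name of the current run, much like a shell's working directory."""

    def __init__(self, current_run=None):
        self.current_run = current_run


def value_set(state, value, run=None):
    """Return the (run, value) pair that the command would act on.

    If `run` is given it overrides the current run for this call only,
    in the same way a directory argument overrides the working directory.
    """
    target = run if run is not None else state.current_run
    if target is None:
        raise ValueError("no current run and no run argument given")
    return (target, value)


state = RelaxState(current_run="m8")
print(value_set(state, 1.02e-10))            # acts on the current run 'm8'
print(value_set(state, 1.02e-10, run="m4"))  # explicit override for one call
```

The override does not change the current run, so later calls without
the argument still act on 'm8'.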
2.2 The run data model
The current run name could be stored in the single data structure
'self.relax.run'. The relax data structure could then be accessed by
typing 'self.relax.data[self.relax.run]'. I.e. 'self.relax.data' is a
DictType object (it has key-value pairs) in which the run name key is
associated with a specific data container. As most data structures in
the current relax data model are associated with a run (e.g.
'self.relax.data.diff[self.run]', 'self.relax.data.res[self.run]',
'self.relax.data.pdb[self.run]', etc), the data model significantly
simplifies.
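A minimal sketch of this layout might look as follows. The class names
Relax and RunContainer and the attribute names are assumptions made for
the example; only the 'data' dictionary keyed by run name and the 'run'
attribute come from the proposal.

```python
# Hypothetical sketch: Relax, RunContainer, and the attribute names below
# are assumptions made for illustration only.

class RunContainer:
    """All data belonging to one run (diffusion tensor, residues, structure)."""

    def __init__(self):
        self.diff = None
        self.res = []
        self.pdb = None


class Relax:
    def __init__(self):
        self.data = {}   # run name -> RunContainer
        self.run = None  # name of the current run


relax = Relax()
relax.data["m8"] = RunContainer()
relax.run = "m8"

# One lookup by run name replaces the per-structure lookups such as
# data.diff[run], data.res[run], and data.pdb[run].
container = relax.data[relax.run]
container.res.append("residue 1")
```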
Now, following on from the comment above, I would suggest that a data
structure containing a stack of runs would be a good idea. Consider a
command that took a run parameter:
    def command(self, run=None):
        self.relax.run.push(run)
        ...  # do something
        self.relax.run.pop()
Now there are some intrinsic problems with this setup (basically it is
far too easy to get the pushes and pops out of step, and then debugging
really does become a nightmare). However, Python actually has at least
three solutions to this (not all of which are available in version 2.4;
the with solution requires 2.5):
1. Decorators (Python 2.4):
    @relax_command
    def command():
        ...  # do something
@relax_command then wraps command in a self.relax.run.push(run)/pop() pair.
2. Define relax_command as a functor, and then have a default
relax_command functor that wraps the call with a push and a pop:
    class relax_command:
        def __init__(self, function):
            self.function = function
        def __call__(self, *args):
            # Find the run arg, save it in a local variable (run), and
            # remove it from args.
            self.relax.run.push(run)
            self.function(*args)
            self.relax.run.pop()
3. The with statement (Python 2.5), see
http://www.dalkescientific.com/writings/diary/archive/2006/08/23/with_statement.html
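To make options 1 and 3 concrete, here is one possible sketch of a run
stack whose push and pop cannot get out of step, because the pop sits
in a finally clause. It is written for current Python rather than
2.4/2.5, and the names RunStack, run_stack, using, and model_name are
invented for the example; relax_command is the decorator name used
above, but its behaviour here is only a guess at what was intended.

```python
import functools
from contextlib import contextmanager

# Hypothetical names: RunStack, run_stack, using, and model_name are
# illustrative only.

class RunStack:
    def __init__(self):
        self._stack = []

    def current(self):
        """Return the run on top of the stack, or None if it is empty."""
        return self._stack[-1] if self._stack else None

    @contextmanager
    def using(self, run):
        """Push `run` for the duration of a with block, then always pop it."""
        self._stack.append(run)
        try:
            yield run
        finally:
            self._stack.pop()


run_stack = RunStack()


def relax_command(function):
    """Decorator: accept an optional `run` keyword and push/pop it around the call."""
    @functools.wraps(function)
    def wrapper(*args, run=None, **kwargs):
        if run is None:
            return function(*args, **kwargs)  # use whatever run is current
        with run_stack.using(run):
            return function(*args, **kwargs)
    return wrapper


@relax_command
def model_name():
    return run_stack.current()


with run_stack.using("m8"):
    inside = model_name()              # current run is 'm8'
    overridden = model_name(run="m4")  # 'm4' for this call only
    restored = model_name()            # back to 'm8'
```

Because the pop happens in the finally clause, the stack is restored
even if the wrapped command raises an exception.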
Some asides:
A. I believe the runs that are passed around in relax are strings
which are then used to look up data in a map. Why not just have
runs (or pipes) as objects? Then, for example, the call
    self.relax.data[self.relax.run]
above becomes
    self.relax.run.data
which is a much more object-oriented and encapsulated structure.
B. There is a twist here. If relax is a global variable referenced by
everything, and you want to run relax in a threaded manner
(multiprocessor machines are becoming more and more popular), then
self.relax poses a problem: we may need a different relax variable
for each processor, so the relax variable needs to be accessed from
thread-local storage, cf.
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302088).
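For aside B, the standard library's threading.local gives each thread
its own copy of an attribute, so a per-thread current run could be kept
roughly as below. The helper names set_current_run and get_current_run
are hypothetical, and this sketch only covers the current run name, not
the whole relax object.

```python
import threading

# Hypothetical sketch: _local, set_current_run, and get_current_run are
# illustrative names, not relax functions.
_local = threading.local()


def set_current_run(name):
    _local.run = name


def get_current_run():
    # Each thread sees only the value it set itself.
    return getattr(_local, "run", None)


results = {}


def worker(name):
    set_current_run(name)
    results[name] = get_current_run()


threads = [threading.Thread(target=worker, args=(n,)) for n in ("m4", "m8")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread's value is independent of what the workers set.
set_current_run("main")
```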
More information about the data model change is given in the message
located at https://mail.gna.org/public/relax-devel/2006-05/msg00008.html
(Message-id:
<7f080ed10605232038j5036278dg39136d75a05a9904@xxxxxxxxxxxxxx>) and the
response located at
https://mail.gna.org/public/relax-devel/2006-05/msg00010.html
(Message-id:
<7f080ed10605241912i7c35f574i94f139588c5fa16b@xxxxxxxxxxxxxx>).
2.3 The pipe concept
A single run can be thought of as a pipe where data is input, processed,
or output as user functions are called. There are different types of
pipe for different analyses, e.g. a reduced spectral density mapping
pipe, a model-free pipe, an exponential curve-fitting pipe, etc. When
running relax you choose which run (or pipe) you are currently in and
the 'run.switch()' user function allows you to jump between multiple
runs (or pipes). The modification of user functions in which runs are
combined or branched (which can be thought of as the pipes merging or
splitting) would be straightforward. For example the
'model_selection()' user function currently accepts the following
arguments:
model_selection(self, method=None, modsel_run=None, runs=None)
In this case the 'modsel_run' can be dropped and the results of model
selection placed into the current run (or pipe). The 'run' user
function class could contain the following user functions for pipe
manipulation:
run.copy()        # Create a new run (or pipe) with the current contents
                  # of another run (or pipe).
run.create()      # Create a new run (or pipe). Switch to this pipe by
                  # default.
run.current()     # Print the current run (or pipe).
run.delete()      # Delete the given run (or pipe).
run.delete_all()  # Delete all runs. Essentially deleting
                  # 'self.relax.data'.
You might want to consider a null object here so that if all runs are
deleted you don't crash, but just raise error messages.
run.hybridise()   # Fuse two runs (or pipes) into the current run (or
                  # pipe). Overlapping data in the two runs must be
                  # identical!
run.list()        # Print all runs (or pipes).
run.switch()      # Switch to another run (or pipe).
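The behaviour of most of these user functions could be sketched along
the following lines. Only the function names come from the list above;
the class name Run, the argument names, the RelaxNoRunError exception,
and the choice of a plain dictionary per run are all assumptions, and
hybridise() is left out. The functions here return values instead of
printing them to keep the example short.

```python
from copy import deepcopy

# Hypothetical sketch: Run, RelaxNoRunError, and all argument names are
# assumptions; only the method names follow the proposed user functions.

class RelaxNoRunError(Exception):
    """Raised when a run is needed but does not exist."""


class Run:
    def __init__(self):
        self.data = {}            # run name -> that run's data
        self.current_name = None  # name of the current run

    def create(self, name):
        if name in self.data:
            raise ValueError("the run %r already exists" % name)
        self.data[name] = {}
        self.current_name = name  # switch to the new run by default

    def switch(self, name):
        if name not in self.data:
            raise RelaxNoRunError("the run %r does not exist" % name)
        self.current_name = name

    def current(self):
        if self.current_name is None:
            raise RelaxNoRunError("there is no current run")
        return self.current_name

    def copy(self, source, new):
        # A deep copy so that the two runs do not share any data.
        self.data[new] = deepcopy(self.data[source])

    def delete(self, name):
        del self.data[name]
        if self.current_name == name:
            self.current_name = None

    def delete_all(self):
        self.data.clear()
        self.current_name = None

    def list(self):
        return sorted(self.data)


run = Run()
run.create("m4")
run.create("m8")
print(run.current())
print(run.list())
```

After delete_all() there is no current run, so current() raises a clear
error rather than failing on a missing key.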
Now here is a further comment: if run were an object that contained its
own data, many of these processes could be dealt with using Python's own
semantics, e.g.
run.copy():
    from copy import copy
    new_run = copy(run)
run.create():
    new_run = Run()
run.delete():
    new_run = Run()
    new_run = None  # run disappears due to garbage collection/ref counts
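One caveat with relying on Python's copy semantics is that copy() is
shallow: the new run object would share its data containers with the
original, whereas deepcopy() duplicates them. A small sketch of the
difference, using a made-up Run class that is not the relax
implementation:

```python
from copy import copy, deepcopy

# Hypothetical Run class, for illustration only.
class Run:
    def __init__(self, name):
        self.name = name
        self.data = {}


run = Run("m8")
run.data["r1"] = [1.5, 2.0]

shallow = copy(run)   # new Run object, but it shares the same data dict
deep = deepcopy(run)  # new Run object with its own copy of the data

run.data["r1"].append(3.0)

print(shallow.data["r1"])  # the shallow copy sees the change
print(deep.data["r1"])     # the deep copy is unaffected
```

For a run.copy() that is meant to give an independent run, deepcopy()
is probably the safer default.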
One evolutionary path of the run concept which could be followed with
this set of proposed changes is to completely replace it with the pipe
concept. All instances of 'run' in relax would be renamed to 'pipe'.
For example 'run.create()' will become 'pipe.create()',
'self.relax.data[self.relax.run]' will become
'self.relax.data[self.relax.pipe]', etc. I believe that the name 'pipe'
is a better representation of the run concept than 'run'. What do you
think of the idea?
Another name would be processor or command.
The hypothetical ideas of this paragraph are not part of the current
proposals, however they further illustrate the pipe concept. The pipe
concept is highly amenable for the creation of a Qt GUI. Program
execution could be directed by a graphical 'pipe' construction (possibly
in 3D using OpenGL). Elements of the pipe, equivalent to the user
functions of the prompt and script interfaces, could be dragged from
toolbars and dropped into a canvas. These could be linked together by
moving the element with the mouse and having it click into other
elements. For example 'run.delete()' (alternatively 'pipe.delete()')
could be represented as a cap added to the end of a pipe - its execution
removes all the data of that pipe from memory. This pictorial
representation of execution would be very powerful and intuitive.
Scripts could be imported into the GUI and represented as a network of
interconnected pipes and vice versa. Execution of relax could even be
animated as semi-transparent pipes filling up bit by bit as each user
function executes. Imagination is the only limit!
regards
gary