re: Redesign of the relax data model: 2. A new run concept -- January 07, 2007

Sorry for the delay in replying, but I needed some uninterrupted time
to sort through this.

  Posted by Edward d'Auvergne on October 11, 2006 - 10:32:

  On Wed, 2006-10-11 at 17:02 +1000, Edward d'Auvergne wrote:

      This post is a proposal for the redesign of the relax data model.  This will
      affect how data is input into the program, how data is selected, how
      molecular structures are handled, how spin systems are handled, and how
      many other parts of relax function.  Importantly the internal structure
      of 'self.relax.data' will completely change.  These modifications will
      essentially break every part of relax (the isolated code in the
      directories 'minimise', 'maths_fns', and 'docs' will be safe from the
      carnage, as will a few files in the base directory).  If you have any
      ideas for extending or improving the proposed data model, can see any
      short-comings, deficiencies, or flaws, are familiar with the PDB
      conventions, etc., your input is very much sought after.  The changes
      should occur in the 1.3 line of the repository.  1.2 versions will be
      unaffected - scripts will remain compatible and the 1.2 line will
      continue to be supported with bug fixes, etc.

      I have to apologise in advance for the size of this proposal; to
      simplify it I have divided the text into numbered sections.  Once this
      initial parent message has been sent I will respond to it with the text
      of the 4 major sections.  This will allow 4 major threads to branch off
      from this message on the mailing list archive
      (https://mail.gna.org/public/relax-devel).  If you have an opinion,
      idea, etc. about a specific section, could you please post a separate
      message in response to the relevant major section post?  Also if you
      have unrelated ideas for one of these sections, could you post these as
      separate messages as well?  For example if you have separate points
      about sections 3.1 and 3.5.1, two different posts responding to the
      parent Section 3 post would be appreciated.  Thanks.  This will help to
      focus each discussion point into specific threads.

      Edward

      Redesign of the relax data model

      Index:
      1.  Why change?
          1.1  The runs
          1.2  The molecules
          1.3  The residues
          1.4  The spins
      2.  A new run concept
          2.1  Parcelling up an abstract space
          2.2  The run data model
          2.3  The pipe concept
      3.  Molecules, residues, and spins
          3.1  The spin data model
          3.2  The data selection concept - identifying spin systems
              3.2.1  Function arguments
              3.2.2  NH data of a single protein macromolecule
              3.2.3  A single organic molecule (non-polymeric)
              3.2.4  A single RNA or DNA macromolecule
              3.2.5  Complexes
          3.3  Regular expression
          3.4  The spin loop
          3.5  Molecule, sequence, and spin user function classes
              3.5.1  The 'molecule' user function class
              3.5.2  The 'sequence' user function class
              3.5.3  The 'spin' user function class
          3.6  The input and output files
      4.  Conclusion

  Before reading this post, please read the previous posts:

  * The parent message 'Redesign of the relax data model:  A HOWTO for
  breaking relax.' located at
  https://mail.gna.org/public/relax-devel/2006-10/msg00053.html
  (Message-id:
  <1160550133.9523.54.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>).

  * Section 1 'Redesign of the relax data model:  1.  Why change?' located
  at https://mail.gna.org/public/relax-devel/2006-10/msg00054.html
  (Message-id:
  <1160551172.9523.60.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>).

  2.  A new run concept

  2.1  Parcelling up an abstract space

  The general idea is to further increase the prominence of the 'run'.
  Rather than relax executing in an abstract space where the 'run' is
  passed into each user function as necessary, the idea is that relax
  executes within a space dedicated to a certain 'run'.  So at the
  relax prompt, you could type a user function such as:

  relax> run.current()
  'm8'

  By working in the 'm8' run space, each user function can be executed
  without the need for the 'run' argument.  Other user functions, such as
  'run.switch()', can be used to change between runs.


I agree that carrying the run argument throughout the data structure
is an annoying problem, and I like the solution, but here is an
extension to it that may engender more flexibility.

There is an interesting parallel here.  The proposal is essentially that
there should always be a current run, in much the same way that most
shells have a present working directory.  However, it is worth noting
that many Unix tools take a directory argument which overrides the
current working directory, and this gives both simplicity and
flexibility as to which 'context' a command runs in.
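
As a minimal sketch of that override idea (written in present-day
Python; the names resolve_run and show are hypothetical, not relax
functions), an optional run argument can fall back to the current run
much as a tool falls back to the working directory:

```python
# Hypothetical sketch: a module-level "current run" that a command can override.
_current_run = "m8"   # run name borrowed from the quoted example


def resolve_run(run=None):
    """Return the explicit run if one was given, otherwise the current run."""
    return _current_run if run is None else run


def show(run=None):
    """A stand-in user function that reports the run it would act on."""
    return "acting on run %r" % resolve_run(run)
```

Calling show() uses the current run, while show("m4") acts on 'm4'
without changing what the current run is.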


  2.2  The run data model

  The current run name could be stored in the single data structure
  'self.relax.run'.  The relax data structure could then be accessed by
  typing 'self.relax.data[self.relax.run]'.  I.e. 'self.relax.data' is a
  DictType object (it has key-value pairs) in which the run name key is
  associated with a specific data container.  As most data structures in
  the current relax data model are associated with a run (e.g.
  'self.relax.data.diff[self.run]', 'self.relax.data.res[self.run]',
  'self.relax.data.pdb[self.run]', etc), the data model significantly
  simplifies.
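
A minimal sketch of the quoted model, assuming hypothetical Relax and
Container classes that stand in for the real relax objects, shows how a
single lookup on the current run name replaces the per-structure run
keys:

```python
class Container:
    """Empty holder for all the data of one run (hypothetical name)."""


class Relax:
    """Stand-in for the object reached as self.relax in the quoted text."""

    def __init__(self):
        self.data = {}     # maps run name -> Container
        self.run = None    # name of the current run


relax = Relax()
relax.data["m8"] = Container()
relax.run = "m8"

# One lookup on the run name, then plain attributes for diff, res, pdb, etc.
current = relax.data[relax.run]
current.diff = "diffusion tensor placeholder"
```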


Following on from the comment above, I would suggest that a data
structure containing a stack of runs would be a good idea.  Consider a
command that took a run parameter:

def command(self, run=None):
    self.relax.run.push(run)
    # ... do something
    self.relax.run.pop()

Now there are some intrinsic problems with this setup: basically it is
far too easy to forget to pop, and then debugging really does become a
nightmare.  However, Python actually has at least three solutions to
this (not all of which are available in version 2.4; the 'with'
solution requires 2.5):

1. Decorators (Python 2.4)
  @relax_command
  def command():
      # ... do something

  @relax_command then wraps command in a self.relax.run.push(run) /
  self.relax.run.pop() pair.


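2. Define relax_command as a functor, with a default relax_command
functor that wraps the function in a push and a pop:

  class relax_command:
      """Functor that wraps a function in a run push/pop pair."""
      def __init__(self, function):
          self.function = function
      def __call__(self, *args, **kwargs):
          # Take the run argument out of the keywords before the call.
          run = kwargs.pop('run', None)
          # self.relax is assumed to be attached as elsewhere in relax.
          self.relax.run.push(run)
          try:
              return self.function(*args, **kwargs)
          finally:
              # Always pop, even if the function raises.
              self.relax.run.pop()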

3. The with statement (Python 2.5); see
http://www.dalkescientific.com/writings/diary/archive/2006/08/23/with_statement.html
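
For the third option, a minimal sketch using contextlib.contextmanager
(in the standard library from Python 2.5) could look like the
following.  The using_run helper and the module-level list are
assumptions for illustration, not part of relax:

```python
from contextlib import contextmanager

# Hypothetical run stack; the bottom entry is the default current run.
_runs = ["m8"]


def current_run():
    """Return the run on top of the stack."""
    return _runs[-1]


@contextmanager
def using_run(name):
    """Make 'name' the current run for the duration of a with block."""
    _runs.append(name)
    try:
        yield name
    finally:
        # The pop happens even if the body of the with block raises.
        _runs.pop()


with using_run("m4"):
    inside = current_run()
outside = current_run()
```

The point of the design is that the pop can no longer be forgotten,
since leaving the block always restores the previous run.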

Some asides

A.  I believe the runs that are passed around in relax are strings
which are then used to look up data in a map.  Why not just have
runs (or pipes) as objects?  Then, for example, the call

self.relax.data[self.relax.run]

above becomes

self.relax.run.data

which is a much more object-oriented and encapsulated structure.

B.  There is a twist here.  If relax is a global variable referenced by
everything, and you want to run relax in a threaded manner
(multiprocessor machines are becoming more and more popular), then
self.relax poses a problem: we may need a different relax variable
for each processor, so the relax variable needs to be accessed from
thread-local storage, cf.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302088
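
A minimal sketch of the thread-local idea, using threading.local from
the standard library.  The helpers set_run and get_run are
hypothetical, and the run names are placeholders:

```python
import threading

# Each thread sees its own attributes on this object.
_local = threading.local()


def set_run(name):
    """Set the current run for the calling thread only."""
    _local.run = name


def get_run():
    """Return the calling thread's current run, or None if unset."""
    return getattr(_local, "run", None)


results = {}


def worker(name):
    # Each worker sets and reads back its own run without seeing the others.
    set_run(name)
    results[name] = get_run()


set_run("main")
threads = [threading.Thread(target=worker, args=(n,)) for n in ("m4", "m8")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

After the workers finish, the main thread still sees 'main' as its own
current run.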


  More information about the data model change is given in the message
  located at https://mail.gna.org/public/relax-devel/2006-05/msg00008.html
  (Message-id:
  <7f080ed10605232038j5036278dg39136d75a05a9904@xxxxxxxxxxxxxx>) and the
  response located at
  https://mail.gna.org/public/relax-devel/2006-05/msg00010.html
  (Message-id:
  <7f080ed10605241912i7c35f574i94f139588c5fa16b@xxxxxxxxxxxxxx>).


  2.3  The pipe concept

  A single run can be thought of as a pipe where data is input, processed,
  or output as user functions are called.  There are different types of
  pipe for different analyses, e.g. a reduced spectral density mapping
  pipe, a model-free pipe, an exponential curve-fitting pipe, etc.  When
  running relax you choose which run (or pipe) you are currently in and
  the 'run.switch()' user function allows you to jump between multiple
  runs (or pipes).  The modification of user functions in which runs are
  combined or branched (which can be thought of as the pipes merging or
  splitting) would be straightforward.  For example the
  'model_selection()' user function currently accepts the following
  arguments:

  model_selection(self, method=None, modsel_run=None, runs=None)

  In this case the 'modsel_run' can be dropped and the results of model
  selection placed into the current run (or pipe).  The 'run' user
  function class could contain the following user functions for pipe
  manipulation:

  run.copy()    # Create a new run (or pipe) with the current contents of
  another run (or pipe).
  run.create()    # Create a new run (or pipe).  Switch to this pipe by
  default.
  run.current()    # Print the current run (or pipe).
  run.delete()    # Delete the given run (or pipe).
  run.delete_all()    # Delete all runs.  Essentially deleting
  'self.relax.data'.


You might want to consider a null object here, so that if all runs are
deleted relax doesn't crash but just raises error messages.

  run.hybridise()    # Fuse two runs (or pipes) into the current run (or
  pipe).  Overlapping data in the two runs must be identical!
  run.list()    # Print all runs (or pipes).
  run.switch()    # Switch to another run (or pipe).
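
A minimal sketch of the null object suggestion for the state after
run.delete_all(), assuming hypothetical NullRun, Run and NoRunError
classes (none of these exist in relax):

```python
class NoRunError(Exception):
    """Raised when a command needs a run but none exists (hypothetical)."""


class NullRun:
    """Placeholder used when there is no current run."""

    name = None

    def __bool__(self):
        # Lets callers write 'if not current:' to test for a missing run.
        return False

    def __getattr__(self, attribute):
        # Only reached for attributes that are not defined on NullRun.
        raise NoRunError(
            "no run exists; create one before accessing %r" % attribute)


class Run:
    def __init__(self, name):
        self.name = name


current = Run("m8")
current = NullRun()   # what a delete_all might leave behind

try:
    current.diff
    message = None
except NoRunError as error:
    message = str(error)
```

Any data access on the placeholder then produces a clear error rather
than a KeyError or AttributeError from deep inside the program.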


Now here is a further comment: if run were an object that contained its
own data, many of these processes could be dealt with using Python's own
semantics.

e.g.

run.copy():

        from copy import copy

        new_run=copy(run)

run.create():
        new_run = Run()

run.delete():
        new_run = Run()
        new_run = None  # run disappears due to garbage collection/ref counts
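
One caveat with copy(), sketched below under the assumption of a
hypothetical Run class that keeps its data in a dict: a shallow copy
shares the nested data with the original, so run.copy() would probably
want deepcopy instead.

```python
from copy import copy, deepcopy


class Run:
    """Hypothetical run object that owns its data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


original = Run("m8")
original.data["s2"] = [0.8, 0.9]   # placeholder values

shallow = copy(original)      # new Run object, same underlying dict
deep = deepcopy(original)     # new Run object with its own copy of the data

# A later change to the original shows up in the shallow copy only.
original.data["s2"].append(0.7)
```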


  One evolutionary path of the run concept which could be followed with
  this set of proposed changes is to completely replace it with the pipe
  concept.  All instances of 'run' in relax would be renamed to 'pipe'.
  For example 'run.create()' will become 'pipe.create()',
  'self.relax.data[self.relax.run]' will become
  'self.relax.data[self.relax.pipe]', etc.  I believe that the name 'pipe'
  is a better representation of the run concept than 'run'.  What do you
  think of the idea?

Another name would be 'processor' or 'command'.


  The hypothetical ideas of this paragraph are not part of the current
  proposals, however they further illustrate the pipe concept.  The pipe
  concept is highly amenable for the creation of a Qt GUI.  Program
  execution could be directed by a graphical 'pipe' construction (possibly
  in 3D using OpenGL).  Elements of the pipe, equivalent to the user
  functions of the prompt and script interfaces, could be dragged from
  toolbars and dropped into a canvas.  These could be linked together by
  moving the element with the mouse and having it click into other
  elements.  For example 'run.delete()' (alternatively 'pipe.delete()')
  could be represented as a cap added to the end of a pipe - its execution
  removes all the data of that pipe from memory.  This pictorial
  representation of execution would be very powerful and intuitive.
  Scripts could be imported into the GUI and represented as a network of
  interconnected pipes and vice versa.  Execution of relax could even be
  animated as semi-transparent pipes filling up bit by bit as each user
  function executes.  Imagination is the only limit!

regards
gary
