On Wed, 2006-10-11 at 17:02 +1000, Edward d'Auvergne wrote:
This post is a proposal for the redesign of the relax data model. This will affect how data is input into the program, how data is selected, how molecular structures are handled, how spin systems are handled, and how many other parts of relax function. Importantly, the internal structure of 'self.relax.data' will completely change. These modifications will essentially break every part of relax (the isolated code in the directories 'minimise', 'maths_fns', and 'docs' will be safe from the carnage, as will a few files in the base directory). If you have any ideas for extending or improving the proposed data model, can see any shortcomings, deficiencies, or flaws, are familiar with the PDB conventions, etc., your input is very much sought after.

The changes should occur in the 1.3 line of the repository. The 1.2 versions will be unaffected: scripts will remain compatible and the 1.2 line will continue to be supported with bug fixes, etc.

I have to apologise in advance for the size of this proposal. To simplify it, I have divided the text into numbered sections. Once this initial parent message has been sent, I will respond to it with the text of the 4 major sections. This will allow 4 major threads to branch off from this message on the mailing list archive (https://mail.gna.org/public/relax-devel). If you have an opinion, idea, etc. about a specific section, could you please post a separate message in response to the relevant major section post? Also, if you have unrelated ideas for one of these sections, could you post these as separate messages as well? For example, if you have separate points about sections 3.1 and 3.5.1, two different posts responding to the parent Section 3 post would be appreciated. This will help to focus each discussion point into specific threads. Thanks.

Edward


Redesign of the relax data model

Index:

1. Why change?
    1.1 The runs
    1.2 The molecules
    1.3 The residues
    1.4 The spins
2. A new run concept
    2.1 Parcelling up an abstract space
    2.2 The run data model
    2.3 The pipe concept
3. Molecules, residues, and spins
    3.1 The spin data model
    3.2 The data selection concept - identifying spin systems
        3.2.1 Function arguments
        3.2.2 NH data of a single protein macromolecule
        3.2.3 A single organic molecule (non-polymeric)
        3.2.4 A single RNA or DNA macromolecule
        3.2.5 Complexes
    3.3 Regular expression
    3.4 The spin loop
    3.5 Molecule, sequence, and spin user function classes
        3.5.1 The 'molecule' user function class
        3.5.2 The 'sequence' user function class
        3.5.3 The 'spin' user function class
    3.6 The input and output files
4. Conclusion
Before reading this post, please read the previous posts:

* The parent message 'Redesign of the relax data model: A HOWTO for breaking relax.' located at https://mail.gna.org/public/relax-devel/2006-10/msg00053.html (Message-id: <1160550133.9523.54.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>).
* Section 1 'Redesign of the relax data model: 1. Why change?' located at https://mail.gna.org/public/relax-devel/2006-10/msg00054.html (Message-id: <1160551172.9523.60.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>).


2. A new run concept

2.1 Parcelling up an abstract space

The general idea is to further increase the prominence of the 'run'. Rather than relax executing in an abstract space where the 'run' is passed into each user function as necessary, the idea is that relax executes within a space dedicated to a certain 'run'. So at the relax prompt you could type a user function such as:

    relax> run.current()
    'm8'

By working in the 'm8' run space, each user function can be executed without the need for the 'run' argument. Other user functions, such as 'run.switch()', can be used to change between runs.

2.2 The run data model

The current run name could be stored in the single data structure 'self.relax.run'. The relax data structure could then be accessed by typing 'self.relax.data[self.relax.run]'. That is, 'self.relax.data' is a DictType object (it has key-value pairs) in which the run name key is associated with a specific data container. As most data structures in the current relax data model are associated with a run (e.g. 'self.relax.data.diff[self.run]', 'self.relax.data.res[self.run]', 'self.relax.data.pdb[self.run]', etc.), the data model is significantly simplified.
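
As a rough illustration of the layout described above, here is a minimal Python sketch. It is not taken from the relax code: the class names 'RunContainer' and 'Relax', the attribute defaults, and the placeholder residue entry are all assumptions made purely for the example.

```python
# Hypothetical sketch only. 'RunContainer' and 'Relax' are illustrative
# names and do not come from the relax source code.

class RunContainer:
    """Holds all data belonging to a single run."""

    def __init__(self):
        self.diff = None   # diffusion tensor data for this run
        self.res = []      # residue specific data for this run
        self.pdb = None    # structural data for this run


class Relax:
    """Stand-in for the object exposing 'data' and 'run'."""

    def __init__(self):
        self.data = {}     # dictionary: run name -> RunContainer
        self.run = None    # name of the current run


relax = Relax()
relax.data['m8'] = RunContainer()
relax.run = 'm8'

# Current layout, for comparison: relax.data.res['m8'], relax.data.diff['m8'], ...
# Proposed layout: a single lookup by run name, then plain attributes.
current = relax.data[relax.run]
current.res.append('placeholder residue')   # placeholder entry, not real data
```

The point of the sketch is that the run name is looked up once, after which nothing inside the container needs to be keyed by run.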

More information about the data model change is given in the message located at https://mail.gna.org/public/relax-devel/2006-05/msg00008.html (Message-id: <7f080ed10605232038j5036278dg39136d75a05a9904@xxxxxxxxxxxxxx>) and the response located at https://mail.gna.org/public/relax-devel/2006-05/msg00010.html (Message-id: <7f080ed10605241912i7c35f574i94f139588c5fa16b@xxxxxxxxxxxxxx>).

2.3 The pipe concept

A single run can be thought of as a pipe where data is input, processed, or output as user functions are called. There are different types of pipe for different analyses, e.g. a reduced spectral density mapping pipe, a model-free pipe, an exponential curve-fitting pipe, etc. When running relax you choose which run (or pipe) you are currently in, and the 'run.switch()' user function allows you to jump between multiple runs (or pipes).

The modification of user functions in which runs are combined or branched (which can be thought of as the pipes merging or splitting) would be straightforward. For example, the 'model_selection()' user function currently accepts the following arguments:

    model_selection(self, method=None, modsel_run=None, runs=None)

In this case the 'modsel_run' argument can be dropped and the results of model selection placed into the current run (or pipe).

The 'run' user function class could contain the following user functions for pipe manipulation:

    run.copy()        # Create a new run (or pipe) with the current contents of another run (or pipe).
    run.create()      # Create a new run (or pipe). Switch to this pipe by default.
    run.current()     # Print the current run (or pipe).
    run.delete()      # Delete the given run (or pipe).
    run.delete_all()  # Delete all runs. Essentially deleting 'self.relax.data'.
    run.hybridise()   # Fuse two runs (or pipes) into the current run (or pipe). Overlapping data in the two runs must be identical!
    run.list()        # Print all runs (or pipes).
    run.switch()      # Switch to another run (or pipe).
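
To make the behaviour of these user functions a little more concrete, here is a minimal Python sketch of how some of them might act on a dictionary of runs. Everything in it is an assumption for illustration: the class name 'RunFunctions', the argument names, the use of plain dictionaries as run containers, and the choice to return values rather than print them are not part of the proposal.

```python
from copy import deepcopy


class RunFunctions:
    """Hypothetical stand-in for the proposed 'run' user function class."""

    def __init__(self):
        self.data = {}            # stands in for 'self.relax.data'
        self.current_run = None   # stands in for 'self.relax.run'

    def create(self, name):
        """Create a new, empty run and switch to it."""
        if name in self.data:
            raise ValueError("The run %r already exists." % name)
        self.data[name] = {}
        self.current_run = name

    def switch(self, name):
        """Make an existing run the current one."""
        if name not in self.data:
            raise KeyError(name)
        self.current_run = name

    def current(self):
        """Return the name of the current run."""
        return self.current_run

    def copy(self, run_from, run_to):
        """Create 'run_to' holding an independent copy of 'run_from'."""
        if run_to in self.data:
            raise ValueError("The run %r already exists." % run_to)
        self.data[run_to] = deepcopy(self.data[run_from])

    def delete(self, name):
        """Delete a run, clearing the current run if it was the one deleted."""
        del self.data[name]
        if self.current_run == name:
            self.current_run = None

    def list(self):
        """Return the names of all runs in sorted order."""
        return sorted(self.data)

    def hybridise(self, run1, run2):
        """Fuse two runs into the current run; overlapping data must match."""
        if self.current_run is None:
            raise RuntimeError("No current run to hybridise into.")
        first, second = self.data[run1], self.data[run2]
        for key in first.keys() & second.keys():
            if first[key] != second[key]:
                raise ValueError("Overlapping data %r differs." % key)
        target = self.data[self.current_run]
        target.update(deepcopy(first))
        target.update(deepcopy(second))


run = RunFunctions()
run.create('m8')                                  # 'm8' becomes the current run
run.data['m8']['example_key'] = 'example_value'   # placeholder data
run.copy('m8', 'm8_backup')
run.create('hybrid')                              # now working in 'hybrid'
run.hybridise('m8', 'm8_backup')
run.delete('m8_backup')
```

Note that no user function in the sketch other than the pipe manipulation functions themselves would need a 'run' argument, as the current run is always available from the shared state.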

One evolutionary path of the run concept which could be followed with this set of proposed changes is to completely replace it with the pipe concept. All instances of 'run' in relax would be renamed to 'pipe'. For example 'run.create()' would become 'pipe.create()', 'self.relax.data[self.relax.run]' would become 'self.relax.data[self.relax.pipe]', etc. I believe that the name 'pipe' is a better representation of the run concept than 'run'. What do you think of the idea?

The hypothetical ideas of this paragraph are not part of the current proposals; however, they further illustrate the pipe concept. The pipe concept is highly amenable to the creation of a Qt GUI. Program execution could be directed by a graphical 'pipe' construction (possibly in 3D using OpenGL). Elements of the pipe, equivalent to the user functions of the prompt and script interfaces, could be dragged from toolbars and dropped into a canvas. These could be linked together by moving an element with the mouse and having it click into other elements. For example 'run.delete()' (alternatively 'pipe.delete()') could be represented as a cap added to the end of a pipe, as its execution removes all the data of that pipe from memory. This pictorial representation of execution would be very powerful and intuitive. Scripts could be imported into the GUI and represented as a network of interconnected pipes, and vice versa. Execution of relax could even be animated as semi-transparent pipes filling up bit by bit as each user function executes. Imagination is the only limit!