Re: Redesign of the relax data model: 1. Why change? - 1.2



Posted by Alexandar Hansen on October 11, 2006 - 15:25:
I have implemented code in C to read PDB files, primarily using a string matching approach.  I have run into only a few problems, and it can read almost any PDB file.  It has an option that allows you to select which molecule number (0-9, A-Z) to read in from the PDB file in cases where there is more than one.  I haven't modified the code for a long time, since it works well for what we use it for.  It has some issues with certain formats that I know about (i.e. DNA where each strand has a different number), but it could be a useful starting point for anyone interested in developing it further and making it comply with all the various weird PDBs.  It is written as a subroutine that stores the PDB data in a structure variable that I created.  Let me know if providing this code would be useful.

Of note, it does NOT require spacings to be perfect and does not care about tabs, unlike a number of programs I have used.
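
To make the idea concrete, here is a minimal sketch of a whitespace-tolerant reader.  It is written in Python (the language of relax) and is NOT the C code described above.  All names in it (Atom, parse_atom_line, read_atoms) are hypothetical, and it assumes that coordinates always contain a decimal point, which is how it tells a molecule ID apart from a residue number.  Lines where two fields run together without a space are not handled.

```python
from typing import Iterable, List, NamedTuple, Optional


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    mol_id: Optional[str]   # None when the line carries no molecule ID
    res_num: int
    x: float
    y: float
    z: float


def _is_int(text: str) -> bool:
    try:
        int(text)
        return True
    except ValueError:
        return False


def parse_atom_line(line: str) -> Optional[Atom]:
    """Parse one ATOM/HETATM line by tokens rather than by fixed columns."""
    tokens = line.split()          # splits on any run of spaces or tabs
    if len(tokens) < 8 or tokens[0] not in ("ATOM", "HETATM"):
        return None
    # Assumption: coordinates always contain a decimal point, so an integer
    # in the sixth field means the fifth field is a molecule ID (0-9, A-Z).
    has_mol_id = _is_int(tokens[5])
    mol_id = tokens[4] if has_mol_id else None
    rest = tokens[5:] if has_mol_id else tokens[4:]
    if len(rest) < 4:
        return None
    try:
        return Atom(int(tokens[1]), tokens[2], tokens[3], mol_id,
                    int(rest[0]), float(rest[1]), float(rest[2]),
                    float(rest[3]))
    except ValueError:
        return None                # malformed line: skip rather than abort


def read_atoms(lines: Iterable[str], mol_id: Optional[str] = None) -> List[Atom]:
    """Return all atoms, or only those of the requested molecule ID."""
    atoms = [a for a in map(parse_atom_line, lines) if a is not None]
    if mol_id is None:
        return atoms
    return [a for a in atoms if a.mol_id == mol_id]


# Made-up input lines: aligned columns, tabs, and a line without molecule ID.
lines = [
    "ATOM      1  N   GLY A   1      27.340  24.430   2.614",
    "ATOM\t2\tP\tG\tB\t1\t1.0\t2.0\t3.0",
    "REMARK  this line is ignored",
    "ATOM 3 CA ALA 7 0.5 0.25 -1.0",
]
rna_atoms = read_atoms(lines, mol_id="B")
```

Here read_atoms(lines, mol_id="B") keeps only the tab-separated line, while read_atoms(lines) keeps all three atom lines.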

Alex

On 10/11/06, Edward d'Auvergne <edward.dauvergne@xxxxxxxxx> wrote:
On Wed, 2006-10-11 at 17:02 +1000, Edward d'Auvergne wrote:
> This post is a proposal for the redesign of the relax data model.  This will
> affect how data is input into the program, how data is selected, how
> molecular structures are handled, how spin systems are handled, and how
> many other parts of relax function.  Importantly the internal structure
> of 'self.relax.data' will completely change.  These modifications will
> essentially break every part of relax (the isolated code in the
> directories 'minimise', 'maths_fns', and 'docs' will be safe from the
> carnage, as will a few files in the base directory).  If you have any
> ideas for extending or improving the proposed data model, can see any
> short-comings, deficiencies, or flaws, are familiar with the PDB
> conventions, etc., your input is very much sought after.  The changes
> should occur in the 1.3 line of the repository.  1.2 versions will be
> unaffected - scripts will remain compatible and the 1.2 line will
> continue to be supported with bug fixes, etc.
>
> I have to apologise in advance for the size of this proposal.  To
> simplify it I have divided the text into numbered sections.  Once this
> initial parent message has been sent I will respond to it with the text
> of the 4 major sections.  This will allow 4 major threads to branch off
> from this message on the mailing list archive
> (https://mail.gna.org/public/relax-devel).  If you have an opinion,
> idea, etc. about a specific section, could you please post a separate
> message in response to the relevant major section post?  Also if you
> have unrelated ideas for one of these sections, could you post these as
> separate messages as well?  For example if you have separate points
> about sections 3.1 and 3.5.1, two different posts responding to the
> parent Section 3 post would be appreciated.  Thanks.  This will help to
> focus each discussion point into specific threads.
>
> Edward
>
>
>
> Redesign of the relax data model
>
> Index:
> 1.  Why change?
>     1.1  The runs
>     1.2  The molecules
>     1.3  The residues
>     1.4  The spins
> 2.  A new run concept
>     2.1  Parcelling up an abstract space
>     2.2  The run data model
>     2.3  The pipe concept
> 3.  Molecules, residues, and spins
>     3.1  The spin data model
>     3.2  The data selection concept - identifying spin systems
>         3.2.1  Function arguments
>         3.2.2  NH data of a single protein macromolecule
>         3.2.3  A single organic molecule (non-polymeric)
>         3.2.4  A single RNA or DNA macromolecule
>         3.2.5  Complexes
>     3.3  Regular expression
>     3.4  The spin loop
>     3.5  Molecule, sequence, and spin user function classes
>         3.5.1  The 'molecule' user function class
>         3.5.2  The 'sequence' user function class
>         3.5.3  The 'spin' user function class
>     3.6  The input and output files
> 4.  Conclusion



Before reading this post, please read the parent message 'Redesign of
the relax data model:  A HOWTO for breaking relax.' located at
https://mail.gna.org/public/relax-devel/2006-10/msg00053.html
(Message-id:
<1160550133.9523.54.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>).


1.  Why change?

It is becoming apparent that the capability and flexibility of the
current relax data model is very limited.  For example, posts to both the
relax-users (https://mail.gna.org/public/relax-users) and relax-devel
(https://mail.gna.org/public/relax-devel) mailing lists by Alex Hansen
demonstrate that the data model is not flexible enough to handle
relaxation data from RNA or DNA.  While handling the RNA data in the 1.2
relax versions is possible, the required script would be far too
complex.  I have identified four major categories where I believe that
the relax data model is not flexible or elegant enough:  the runs, the
molecules, the residues, and the spin systems (or atoms).  If you can
think of any other logical categories, please say.  As the planned
changes are so disruptive, fine tuning and perfecting the design now is
very important.


1.1  The runs

The way runs are currently handled in relax is far from optimal.  The
history of the run concept is linked to the history of the program.
relax was originally designed as a small program for the creation of
input files for Art Palmer's Modelfree program, for reading the output
files, and as an implementation of the model selection data analysis
step.  While relax still does all of these things (with Dasha now
supported as well), it has evolved such that it implements all of the
steps of the data analysis chain and can be used as a complete
replacement.  The origin of the run was the Modelfree input and output
file handling function where each model-free model optimised was called
a run.  When the prompt and script based interfaces were implemented and
user functions first created, no user functions were associated with a
'run'.  As relax has evolved, the percentage of user functions
associated with the run concept has gradually risen.  Now almost every
single user function requires a run argument.

The problem with the current run concept is that, because you must
supply the run name to each user function, the run string needs to be
passed to all relevant functions in the program.  For each user function
called this results in the string bouncing around many functions, each
time being placed into 'self.run'.  This dense branching effect means
that the text 'self.run' is extremely widespread within the relax code
base.  For example, by grepping the source code, the number of lines
containing the text 'self.run' in relax version 1.2.7 is 1645!  This
branching approach mandates knowing which functions require the run
argument, passing the string to that function, and then making sure that
the run argument is placed into 'self.run' so that the previous run is
not accidentally accessed.  The current branching run concept is an
important and constant source of bugs.  The complexity of the source
code is unnecessarily high.
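
The failure mode described above can be sketched with a toy class.  This is not relax source code: the class, method, and run names are all hypothetical, and the numbers are placeholders.  It only shows how a single function that forgets to copy its run argument into 'self.run' ends up reading data from the previous run.

```python
class Analysis:
    """Toy stand-in for a relax code class (all names are hypothetical)."""

    def __init__(self, data):
        self.data = data     # maps run name -> data for that run
        self.run = None      # the 'current' run, shared by all methods

    def calc(self, run):
        # Correct pattern: store the run before calling anything else.
        self.run = run
        return self._chi2()

    def report(self, run):
        # Bug: 'self.run = run' was forgotten here, so the helper below
        # still sees whichever run was used by the previous call.
        return "%s: %s" % (run, self._chi2())

    def _chi2(self):
        # Helper functions rely on self.run rather than on an argument.
        return self.data[self.run]["chi2"]


# Placeholder values, not real results.
analysis = Analysis({"m1": {"chi2": 1.5}, "m2": {"chi2": 9.0}})
analysis.calc("m1")
stale = analysis.report("m2")    # labelled 'm2' but computed from 'm1'
```

The report is labelled 'm2' yet carries the value stored for 'm1', and nothing in the code raises an error to flag it.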


1.2  The molecules

In the 1.2 relax versions only a single molecule is supported.  This
molecule must also be the first molecule in the supplied PDB file.
relax cannot currently handle molecular complexes.  For example, in a PDB
file of an RNA/protein complex there is no way of specifying whether you
are studying the RNA or the protein.


1.3  The residues

Currently relax assumes that you are studying a polymer system in which
the data can be characterised by residue number (and name).  However, if
you are studying a molecule that isn't a polymer, there are no residues.
A 'residue number' and 'residue name' could be invented for the system
but, as the residue concept is entrenched in relax, there may be
problems with this approach.


1.4  The spins

relax also assumes that you only have a single relaxation data set per
residue.  For studying the backbone NH relaxation of a protein this
isn't a problem.  However in RNA, DNA, and non-polymeric molecules,
there are likely to be multiple spins studied per 'residue'.  The data
model should be modified to allow the analysis of these types of
molecules.  An additional benefit when studying a protein is that you
would then be able to study the backbone NH data simultaneously with CA
data (together with any other data sets you may have collected).
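
As a rough illustration of the limitation, the sketch below contrasts storage keyed by residue number with storage keyed by residue number and spin name.  Neither dictionary is the actual relax data structure and neither is the proposed design.  The function names are hypothetical and the values are placeholders rather than measured data.

```python
# Current assumption: one relaxation data set per residue, keyed by number.
by_residue = {}


def store_by_residue(res_num, value):
    by_residue[res_num] = value      # silently overwrites an earlier spin


# Hypothetical alternative: key each data set by (residue, spin name).
by_spin = {}


def store_by_spin(res_num, spin_name, value):
    by_spin[(res_num, spin_name)] = value


# Two spins of the same RNA residue.  Placeholder values only.
for spin_name, value in [("C8", 1.0), ("C1'", 2.0)]:
    store_by_residue(5, value)
    store_by_spin(5, spin_name, value)
```

After the loop 'by_residue' holds a single entry for residue 5, the C8 value having been overwritten, whereas 'by_spin' keeps both.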






