Re: Redesign of the relax data model: 3. Molecules, residues, and spins -- January 19, 2007

> The spin selection itself is used quite differently by different parts
> of the code base and I'm not sure if implementing the parser as a
> generator is a good idea.  For example the selection string could be
> passed to the spin loop function which is a generator yielding the
> spin system data container.  Using Gary's spin system selection and
> generator ideas
> (https://mail.gna.org/public/relax-devel/2007-01/msg00014.html,
> Message-id: <f001463a0701071417w6bd7927cp8fdd052e698575ec@xxxxxxxxxxxxxx>),
> the spin loop presented at
> https://mail.gna.org/public/relax-devel/2006-10/msg00057.html
> (Message-id: 
<1160557041.9523.74.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>)
> would be simple.  One argument goes into the function, the selection
> string, and the final line would be a yield statement.  In this spin
> loop example, maybe it would be useful to have separate
> generators/iterators for the molecules, residues, and atoms.  Then the
> spin loop could become:
>
>     def spin_loop(selection=None):
>         """Function for selectively looping over all spins."""
>
>         # Reassign the data container.
>         data = self.relax.data[self.relax.run]
>
>         # Loop over the molecules.
>         for mol in data.mol:
>             # Skip the molecule if there is no match to the selection.
>             skip = 1
>             for mol_name in mol_iterator(selection):
>                 if mol_name == mol.name:
>                     skip = 0
>             if skip:
>                 continue
>
>             # Loop over the residues.
>             for res in mol.res:
>                 # Skip the residue if there is no match to the selection.
>                 skip = 1
>                 for res_num, res_name in res_iterator(selection):
>                     if res_num == res.num and res_name == res.name:
>                         skip = 0
>                 if skip:
>                     continue
>
>                 # Loop over the spins.
>                 for spin in res.spin:
>                     # Skip the spin if there is no match to the selection.
>                     skip = 1
>                     for atom_num, atom_name in atom_iterator(selection):
>                         if atom_num == spin.num and atom_name == spin.name:
>                             skip = 0
>                     if skip:
>                         continue
>
>                     # Yield the spin system data container.
>                     yield spin
>
>
> This setup could possibly be more numerically efficient than say:
>
>     def spin_loop(selection=None):
>         """Function for selectively looping over all spins."""
>
>         # Reassign the data container.
>         data = self.relax.data[self.relax.run]
>
>         # Loop over the molecules.
>         for mol in data.mol:
>             # Loop over the residues.
>             for res in mol.res:
>                 # Loop over the spins.
>                 for spin in res.spin:
>                     # Skip the spin if there is no match to the selection.
>                     skip = 1
>                     for (mol_name, res_num, res_name, atom_num,
>                          atom_name) in atom_iterator(selection):
>                         if (mol_name == mol.name and res_num == res.num
>                                 and res_name == res.name
>                                 and atom_num == spin.num
>                                 and atom_name == spin.name):
>                             skip = 0
>                     if skip:
>                         continue
>
>                     # Yield the spin system data container.
>                     yield spin
>
>
> However rather than using a generator for the selection, maybe the
> function 'is_selected' could be created:
>
>     def spin_loop(selection=None):
>         """Function for selectively looping over all spins."""
>
>         # Reassign the data container.
>         data = self.relax.data[self.relax.run]
>
>         # Loop over the molecules.
>         for mol in data.mol:
>             # Loop over the residues.
>             for res in mol.res:
>                 # Loop over the spins.
>                 for spin in res.spin:
>                     # Skip the spin if there is no match to the selection.
>                     if not is_selected(selection, mol.name, res.num,
>                                        res.name, spin.num, spin.name):
>                         continue
>
>                     # Yield the spin system data container.
>                     yield spin
>
>

This last example seems to be the simplest and most efficient code.
However, I think yet another possibility might be better here. Rather
than looping over all molecules, residues and spins in the data in order
to find a selection that might be only a tiny subset of that, why not
loop over the selection, then ask whether each selected item exists in
the data? This will be the most efficient approach as long as the data
set is larger than the selection, which is likely to be the most common
situation.

Assuming we are using a UCSF-like selection syntax, we might code this
like:

def spin_loop(selection):
    mol_token, res_token, spin_token = tokenise(selection)
    # A missing token means "select everything at that level".
    if mol_token is None:
        mol_token = data.mol
    if res_token is None:
        res_token = data.res
    if spin_token is None:
        spin_token = data.spin
    for mol in parse_token(mol_token):
        if mol not in data.mol:
            continue
        for res in parse_token(res_token):
            if res not in data.res:
                continue
            for spin in parse_token(spin_token):
                if spin not in data.spin:
                    continue
                yield spin


I like this idea, it will be more computationally efficient.  I
suggest we call the parse_token() functions prior to the loops so that
only 3 function calls are made.  We should convert the methods of the
class 'Selection' from the file 'generic_fns/selection.py' into
functions and then add the following functions (please feel free to
suggest more):
   parse_token()
   tokenise()
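
As a rough sketch of parsing before the loops, something like the
following might work.  The nested dictionary layout for the data
(molecule name -> residue id -> list of spin names) and the
comma-splitting parse_token() stand-in are assumptions made purely for
illustration, they are not the relax data structures or the real parser.

```python
def parse_token(token):
    """Stand-in parser (hypothetical): split a comma separated token."""
    return [item.strip() for item in token.split(',')]


def spin_loop(data, mol_token, res_token, spin_token):
    """Yield the (mol, res, spin) ids which are selected and present in data.

    The 'data' argument uses an assumed layout:
    {mol_name: {res_id: [spin_name, ...]}}.
    """
    # Parse each token exactly once, before any looping (3 calls in total).
    mols = parse_token(mol_token)
    residues = parse_token(res_token)
    spins = parse_token(spin_token)

    # Loop over the selection and skip anything missing from the data.
    for mol in mols:
        if mol not in data:
            continue
        for res in residues:
            if res not in data[mol]:
                continue
            for spin in spins:
                if spin in data[mol][res]:
                    yield mol, res, spin


# Toy data: molecule 'C' and residue '3' are selected but do not exist.
data = {'A': {'1': ['N', 'H'], '2': ['N']}, 'B': {'1': ['N']}}
selected = list(spin_loop(data, 'A,C', '1,2,3', 'N'))
```

Here the cost of parsing is paid three times per call to spin_loop(),
independent of how many molecules, residues, or spins are in the data.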

Chris or Gary, would you like to add this idea as well to the planning
document (with relevant links to the mailing list)?  I have a feeling
that due to the number of posts on the redesign we may accidentally
forget to include one or two of the ideas.

The functions tokenise and parse_token do the work of parsing the
selection. tokenise will split on the mol, res, spin identifiers ('#',
':', '@' in UCSF-speak), returning None for tokens without identifiers.
parse_token will interpret a string like "2,4,6-10", returning a list
[2,4,6,7,8,9,10] (or the equivalent iterator if that is desirable).
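
A minimal sketch of how these two functions could behave is given
below.  It is not the code mentioned in this thread.  It assumes that
each identifier appears at most once, in the order '#', ':', '@', that
numbers are non-negative, and that non-numeric items (such as atom
names) are simply kept as strings.

```python
import re

# Optional '#mol', ':res' and '@spin' parts, in that fixed order (assumed).
_SELECTION = re.compile(
    r'(?:#([^#:@]+))?(?::([^#:@]+))?(?:@([^#:@]+))?')


def tokenise(selection):
    """Split a selection string into (mol_token, res_token, spin_token).

    Parts without an identifier are returned as None.
    """
    match = _SELECTION.fullmatch(selection or '')
    if match is None:
        raise ValueError("Invalid selection string: %r" % selection)
    return match.groups()


def parse_token(token):
    """Expand a token such as '2,4,6-10' into a list of items.

    Numbers and ranges become ints, anything else is kept as a string.
    """
    items = []
    for part in token.split(','):
        part = part.strip()
        range_match = re.fullmatch(r'(\d+)-(\d+)', part)
        if range_match:
            # An inclusive range such as '6-10'.
            start = int(range_match.group(1))
            end = int(range_match.group(2))
            items.extend(range(start, end + 1))
        elif re.fullmatch(r'\d+', part):
            items.append(int(part))
        elif part:
            items.append(part)
    return items


mol_token, res_token, spin_token = tokenise('#A:2,4,6-10@N')
residues = parse_token(res_token)
```

Returning a list keeps the example simple; swapping the list for a
generator would give the equivalent iterator.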

I coded these functions a while ago for another purpose, so I could dig
them out if necessary.


This is good stuff.  Did you code a parser for the Molmol/UCSF
identifiers which covers the full syntax?  And importantly, are there
any copyright issues (for example, do you still own the full copyright
to that code, having neither taken parts from nor assigned it to
another entity)?

Edward
