Re: r3245 - /1.3/generic_fns/selection.py



Posted by Edward d'Auvergne on April 12, 2007 - 09:57:
On 4/4/07, Chris MacRaild <c.a.macraild@xxxxxxxxxxx> wrote:
I've spent a bit of time in the last week or so trying to implement
boolean operators in the mol-res-spin selection language. I've come to
the conclusion that this will not be possible in the current
implementation of the spin loop and related functions.

I have been looking at the changes (commit by commit message) and they
look good.  I like the idea of a selection object.  A useful speed up
that is made possible with this structure is that the selection string
could be stored and the __init__() function return without doing
anything if the string passed to it is the same.  This would require
either a singleton or that the code of __init__() be shifted into
another method of the object.
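
A minimal sketch of that caching idea, with the parsing moved out of
__init__() into a separate method.  The Selection class here is a
stand-in and not the one in generic_fns/selection.py; the parse()
method name, the token splitting, and the parse_count counter are
assumptions made purely for illustration.

```python
class Selection:
    """Stand-in selection object; not the real relax class."""

    def __init__(self, select_string=None):
        self._string = None
        self.tokens = []
        self.parse_count = 0  # Only here to make the caching visible.
        self.parse(select_string)

    def parse(self, select_string):
        """Re-parse only when the selection string has changed."""
        if self.parse_count and select_string == self._string:
            return
        self._string = select_string
        # Placeholder for the real parsing: split on the union operator.
        parts = (select_string or "").split("|")
        self.tokens = [p.strip() for p in parts if p.strip()]
        self.parse_count += 1


sel = Selection("#Ap4Aase:4 | #RNA")
sel.parse("#Ap4Aase:4 | #RNA")   # Same string: returns immediately.
sel.parse("#RNA")                # Different string: parsed again.
```

A long-lived object like this would have to be shared between calls
(for example as a singleton) for the saving to matter.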


Consider the selection "#Ap4Aase:4 | #RNA". We mean this to select
residue 4 of the molecule Ap4Aase, and all residues of the molecule RNA.
In the current implementation, however, it selects all residues of both
molecules. The residue_loop looks like:

for mol in data:
    if mol not in selection_object:
        continue
    # Both Ap4Aase and RNA get to here; Ap4Aase from the first clause
    # of the selection, RNA from the second.
    for res in mol.residues:
        if res not in selection_object:
            continue
        yield res
        # All residues get here, thanks to the second clause of the
        # selection. Because it doesn't explicitly select residues,
        # all residues are implicitly selected, and there is no way of
        # knowing which molecule res belongs to.

We will need a new approach to the spin_loop, residue_loop, and
molecule_loop, that's for sure.


I see two solutions to the problems I'm running into:

1) Subtly change the data structure so that each spin 'knows' what
residue it belongs to, and each residue knows what molecule it belongs
to (i.e. instances of the SpinContainer class have an attribute residue
that is a pointer to the residue instance that contains that spin). Then
restructure the spin loop as:

for spin in data.spins:
    if spin in selection_object:
        yield spin


This has a drawback in terms of efficiency, in that all spins in the
data structure must be explicitly considered, whereas the current nested
spin loop only considers spins that belong to selected residues, and
only residues of selected molecules. I'm not sure how much of a hit this
will amount to in real situations.
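
Option 1 could be sketched roughly as below.  The Molecule, Residue
and Spin classes and the selects() predicate are hypothetical
simplifications, not relax's containers; selects() is a hand-written
equivalent of "#Ap4Aase:4 | #RNA" rather than a parsed selection.

```python
class Molecule:
    def __init__(self, name):
        self.name = name
        self.residues = []


class Residue:
    def __init__(self, num, molecule):
        self.num = num
        self.molecule = molecule      # Back-pointer to the parent molecule.
        self.spins = []
        molecule.residues.append(self)


class Spin:
    def __init__(self, name, residue):
        self.name = name
        self.residue = residue        # Back-pointer to the parent residue.
        residue.spins.append(self)


def spin_loop(all_spins, selects):
    """Flat loop: every spin is tested, but each can see its parents."""
    for spin in all_spins:
        if selects(spin):
            yield spin


# Build two molecules with residues 3 and 4, one spin each.
all_spins = []
for mol_name in ("Ap4Aase", "RNA"):
    mol = Molecule(mol_name)
    for num in (3, 4):
        all_spins.append(Spin("N", Residue(num, mol)))


def selects(spin):
    """Hand-written equivalent of "#Ap4Aase:4 | #RNA"."""
    res = spin.residue
    mol = res.molecule
    return (mol.name == "Ap4Aase" and res.num == 4) or mol.name == "RNA"


picked = [(s.residue.molecule.name, s.residue.num)
          for s in spin_loop(all_spins, selects)]
```

Residue 3 of Ap4Aase is skipped while both RNA residues are kept, at
the cost of testing every spin in the data structure.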

I don't know if I like the idea of an object accessing back through
the dictionary object it belongs to followed by accessing the
namespace of the object holding the dictionary.  There will be better
ways of solving the problem.


2) More radically change the implementation of the spin loop, such that
it is subsumed into the Selection class, i.e. instances of the Selection
class will have a method called spin_loop (and residue_loop, and
molecule_loop), which returns the equivalent iterator object. Then we
effectively (though not literally) do the boolean operations on the list
of selected spins, not on the abstract selection object.
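
One way option 2 might look, as a sketch only.  Unlike the description
above, it does the boolean operation literally, on sets of resolved
(molecule, residue, spin) tuples.  The nested-dict data layout, the
from_ids() constructor and the use of the | operator are all
assumptions for illustration and are not part of relax.

```python
class Selection:
    """Hypothetical selection that owns its loops (not relax's class)."""

    def __init__(self, spins):
        # Concrete (mol, res, spin) tuples matched by this selection.
        self.spins = frozenset(spins)

    @classmethod
    def from_ids(cls, data, mol=None, res=None):
        """Resolve a single clause; None means 'match everything'."""
        return cls(
            (m, r, s)
            for m, residues in data.items() if mol in (None, m)
            for r, spins in residues.items() if res in (None, r)
            for s in spins
        )

    def __or__(self, other):
        """Union of two selections, taken on the resolved spin sets."""
        return Selection(self.spins | other.spins)

    def spin_loop(self):
        for item in sorted(self.spins):
            yield item

    def residue_loop(self):
        for item in sorted({(m, r) for m, r, _ in self.spins}):
            yield item


# data: {molecule name: {residue number: [spin names]}}
data = {"Ap4Aase": {3: ["N"], 4: ["N"]}, "RNA": {1: ["P"], 2: ["P"]}}
sel = (Selection.from_ids(data, mol="Ap4Aase", res=4)
       | Selection.from_ids(data, mol="RNA"))
residues = list(sel.residue_loop())
```

Because each clause is resolved against the data before the union is
taken, the residue restriction of the first clause cannot leak into
the second.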



Clearly option 2 is a more radical departure from the agreed design, but
it is likely to have better performance characteristics. Any thoughts on
the best way forward?

I prefer option 2.  However, I would prefer an approach in which
someone can just import the spin_loop function from the
'generic_fns.selection' module and utilise it without needing to set
up the Selection object.  It would simplify the API.  Another
idea would be to implement a Selection.test() function (or a
function with a more appropriate name).  This function could accept
the mol, res, and spin containers and hence have access to all the
info about that spin.  Then the spin_loop would become:

def spin_loop(selection=None):
   """Generator function for looping over all the spin systems of the
   given selection.

   @param selection:   The spin system selection identifier.
   @type selection:    str
   @return:            The spin system specific data container.
   @rtype:             instance of the SpinContainer class.
   """

   # Parse the selection string.
   select_obj = Selection(selection)

   # Loop over the molecules.
   for mol in relax_data_store[relax_data_store.current_pipe].mol:
       # Skip the molecule if there is no match to the selection.
       if not select_obj.test(mol):
           continue

       # Loop over the residues.
       for res in mol.res:
           # Skip the residue if there is no match to the selection.
           if not select_obj.test(mol, res):
               continue

           # Loop over the spins.
           for spin in res.spin:
               # Skip the spin if there is no match to the selection.
               if not select_obj.test(mol, res, spin):
                   continue

               # Yield the spin system data container.
               yield spin

The test function could then iterate over each part of the selection
string delineated by boolean operators and brackets and return True or
False as necessary.  What do you think of this idea, Chris?
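
As a rough sketch of what such a test() function might do: the version
below handles only the '|' operator between '#mol:res@spin' clauses
(no brackets, no '&'), and takes plain names and numbers rather than
the mol, res and spin containers.  The regular expression and the
clause dictionaries are assumptions for illustration, not relax code.

```python
import re

# One clause: optional '#mol', then optional ':res', then optional '@spin'.
_CLAUSE = re.compile(
    r"^(?:#(?P<mol>[^:@]+))?(?::(?P<res>[^@]+))?(?:@(?P<spin>.+))?$"
)


class Selection:
    """Hypothetical sketch; only '|' between clauses is supported."""

    def __init__(self, select_string=None):
        self.clauses = []
        for part in (select_string or "").split("|"):
            match = _CLAUSE.match(part.strip())
            if match is None:
                raise ValueError("Cannot parse selection clause %r" % part)
            self.clauses.append(match.groupdict())

    def test(self, mol, res=None, spin=None):
        """True if any single clause accepts everything passed in."""
        given = {"mol": mol, "res": res, "spin": spin}
        for clause in self.clauses:
            # A level matches if the clause leaves it open, if the caller
            # has not reached that level yet, or if the identifiers agree.
            if all(
                clause[level] is None
                or value is None
                or clause[level] == str(value)
                for level, value in given.items()
            ):
                return True
        return False


sel = Selection("#Ap4Aase:4 | #RNA")
results = (
    sel.test("Ap4Aase"),      # Molecule level: first clause may match.
    sel.test("Ap4Aase", 3),   # Wrong residue, and not the RNA molecule.
    sel.test("Ap4Aase", 4),
    sel.test("RNA", 7),       # Second clause leaves the residue open.
)
```

The point of passing the molecule along with the residue is that each
clause is checked as a whole, so ":4" in the first clause never
combines with "#RNA" in the second.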

Regards,

Edward


