Re: r3245 - /1.3/generic_fns/selection.py



Posted by Edward d'Auvergne on April 12, 2007 - 17:52:
On 4/13/07, Chris MacRaild <c.macraild@xxxxxxxxxxxxxx> wrote:


On 4/12/07, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
> On 4/4/07, Chris MacRaild <c.a.macraild@xxxxxxxxxxx> wrote:
> > I've spent a bit of time in the last week or so trying to implement
> > boolean operators in the mol-res-spin selection language. I've come to
> > the conclusion that this will not be possible in the current
> > implementation of the spin loop and related functions.
>
> I have been looking at the changes (commit by commit message) and they
> look good.  I like the idea of a selection object.  A useful speed up
> that is made possible with this structure is that the selection string
> could be stored and the __init__() function return without doing
> anything if the string passed to it is the same.  This would require
> either a singleton or that the code of __init__() be shifted into
> another method of the object.
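Below is a minimal sketch of this speed-up. It assumes the parsing has been moved out of __init__() into a separate method; the singleton alternative is not shown. The parse() method, the clauses attribute and the parse_count counter are hypothetical names. The "parsing" is only a split on | and stands in for the real mol-res-spin parser.

```python
# Hypothetical sketch: parsing lives in parse(), not __init__(), so it can
# return early when it is handed the same selection string again.
_UNPARSED = object()  # sentinel meaning "nothing has been parsed yet"


class Selection:
    def __init__(self, select_string=None):
        self._string = _UNPARSED
        self.clauses = []
        self.parse_count = 0  # illustration only: counts real parses
        self.parse(select_string)

    def parse(self, select_string):
        # Nothing to do if this exact string was the last one parsed.
        if select_string == self._string:
            return
        self._string = select_string
        self.parse_count += 1
        # Stand-in for the real parser: split on the OR operator only.
        if select_string:
            self.clauses = [c.strip() for c in select_string.split("|")]
        else:
            self.clauses = []


sel = Selection("#RNA")
sel.parse("#RNA")               # same string: returns immediately
sel.parse("#Ap4Aase:4 | #RNA")  # new string: parsed again
print(sel.parse_count, sel.clauses)  # → 2 ['#Ap4Aase:4', '#RNA']
```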
>
>
> > Consider the selection "#Ap4Aase:4 | #RNA". We mean this to select
> > residue 4 of the molecule Ap4Aase, and all residues of the molecule RNA.
> > In the current implementation, however, it selects all residues of both
> > molecules. The residue_loop looks like:
> >
> > for mol in data:
> >     if not mol in selection_object:
> >         continue
> >     # both Ap4Aase and RNA get to here; Ap4Aase from the first clause
> >     # of the selection, RNA from the second
> >     for res in mol.residues:
> >         if not res in selection_object:
> >             continue
> >         yield res
> >         # All residues get here, thanks to the second clause of the
> >         # selection. Because it doesn't explicitly select residues,
> >         # all residues are implicitly selected, and there is no way of
> >         # knowing which molecule res belongs to.
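Here is a toy reproduction of this failure. It assumes each clause can be reduced to a (molecule name, residue number) pair, with None meaning "all residues". The function names are hypothetical. The naive loop tests each level against all clauses independently, which lets the RNA clause leak into Ap4Aase. Requiring a single clause to match the molecule and the residue together gives the intended result.

```python
# Hypothetical sketch: "#Ap4Aase:4 | #RNA" as (molecule, residue) pairs,
# where None means "no residue given, so every residue matches".
clauses = [("Ap4Aase", 4), ("RNA", None)]

# Toy data: molecule name -> residue numbers.
data = {"Ap4Aase": [1, 2, 3, 4, 5], "RNA": [1, 2]}


def naive_residue_loop(data, clauses):
    """Test each level independently, as the current nested loop does."""
    for mol, residues in data.items():
        if not any(c_mol == mol for c_mol, _ in clauses):
            continue
        for res in residues:
            # The RNA clause has no residue part, so it matches every
            # residue, including those belonging to Ap4Aase.
            if not any(c_res is None or c_res == res for _, c_res in clauses):
                continue
            yield mol, res


def clause_aware_residue_loop(data, clauses):
    """Require one clause to match the molecule AND the residue together."""
    for mol, residues in data.items():
        for res in residues:
            if any(c_mol == mol and (c_res is None or c_res == res)
                   for c_mol, c_res in clauses):
                yield mol, res


print(list(naive_residue_loop(data, clauses)))         # all 7 residues
print(list(clause_aware_residue_loop(data, clauses)))  # only the 3 intended
```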
>
> We will need a new approach to the spin_loop, residue_loop, and
> molecule_loop, that's for sure.
>
>
> > I see two solutions to the problems I'm running into:
> >
> > 1) Subtly change the data structure so that each spin 'knows' what
> > residue it belongs to, and each residue knows what molecule it belongs
> > to. (ie. instances of the SpinContainer class have an attribute residue,
> > that is a pointer to the residue instance that contains that spin). Then
> > restructure the spin-loop as:
> >
> > for spin in data.spins:
> >     if spin in selection_object:
> >         yield spin
> >
> >
> > This has a drawback in terms of efficiency, in that all spins in the
> > data structure must be explicitly considered, whereas the current nested
> > spin-loop only considers spins that belong to selected residues, and
> > only residues of selected molecules. I'm not sure how much of a hit this
> > will amount to in real situations.
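The following sketches option 1, assuming each container stores a reference to its parent. The residue and molecule attributes, the build() helper and the predicate passed to flat_spin_loop() are illustrative only and are not relax's data structures. The selection is written by hand as a lambda rather than parsed. Every spin is visited, which is the efficiency cost described above.

```python
# Hypothetical sketch of option 1: containers keep a back-reference to their
# parent, so a flat loop over all spins can still test the full context.


class MoleculeContainer:
    def __init__(self, name):
        self.name = name
        self.res = []


class ResidueContainer:
    def __init__(self, num, molecule):
        self.num = num
        self.molecule = molecule  # back-reference to the parent molecule
        self.spin = []


class SpinContainer:
    def __init__(self, name, residue):
        self.name = name
        self.residue = residue  # back-reference to the parent residue


def build(mol_name, res_nums, spin_names):
    """Build a molecule whose residues each hold the given spins."""
    mol = MoleculeContainer(mol_name)
    for num in res_nums:
        res = ResidueContainer(num, mol)
        res.spin = [SpinContainer(name, res) for name in spin_names]
        mol.res.append(res)
    return mol


def flat_spin_loop(all_spins, matches):
    """Visit every spin; matches(mol, res, spin) plays the selection's role."""
    for spin in all_spins:
        res = spin.residue
        if matches(res.molecule, res, spin):
            yield spin


molecules = [build("Ap4Aase", [1, 2, 3, 4], ["N", "H"]), build("RNA", [1, 2], ["C8"])]
all_spins = [spin for mol in molecules for res in mol.res for spin in res.spin]

# "#Ap4Aase:4 | #RNA" written out by hand as a predicate.
selected = list(flat_spin_loop(
    all_spins,
    lambda mol, res, spin: (mol.name == "Ap4Aase" and res.num == 4) or mol.name == "RNA",
))
print([(s.residue.molecule.name, s.residue.num, s.name) for s in selected])
```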
>
> I don't know if I like the idea of an object accessing back through
> the dictionary object it belongs to followed by accessing the
> namespace of the object holding the dictionary.  There will be better
> ways of solving the problem.
>
>
> > 2) More radically change the implementation of the spin loop, such that
> > it is subsumed into the Selection class. ie. instances of the selection
> > class will have a method called spin_loop (and residue_loop, and
> > molecule_loop), which returns the equivalent iterator object. Then we
> > effectively (though not literally) do the boolean operations on the list
> > of selected spins, not on the abstract selection object.
> >
> >
> >
> > Clearly option 2 is a more radical departure from the agreed design, but
> > it is likely to have better performance characteristics. Any thoughts on
> > the best way forward?
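The next sketch shows option 2, assuming the Selection object owns residue_loop(); spin_loop() and molecule_loop() would follow the same pattern. The OR is applied to the sets of residues each clause selects, rather than level by level. An AND would presumably be the intersection of those sets. The clause representation and the dict-based data are hypothetical simplifications.

```python
# Hypothetical sketch of option 2: the selection object owns the loop and
# ORs clauses by taking the union of the residues each clause selects.


class Selection:
    def __init__(self, clauses):
        # Each clause is a (mol_name, res_num) pair; None means "any".
        self.clauses = clauses

    def _clause_residues(self, data, mol_name, res_num):
        """Residues selected by one clause, always within the right molecule."""
        for mol in data:
            if mol_name is not None and mol["name"] != mol_name:
                continue
            for res in mol["res"]:
                if res_num is None or res["num"] == res_num:
                    yield res

    def residue_loop(self, data):
        """Yield (molecule name, residue number) for each selected residue, once."""
        selected = set()
        for mol_name, res_num in self.clauses:
            for res in self._clause_residues(data, mol_name, res_num):
                selected.add(id(res))  # union of the per-clause selections
        # Walk the data again so the output follows data order,
        # not the order of clauses in the selection string.
        for mol in data:
            for res in mol["res"]:
                if id(res) in selected:
                    yield mol["name"], res["num"]


data = [
    {"name": "Ap4Aase", "res": [{"num": n} for n in range(1, 6)]},
    {"name": "RNA", "res": [{"num": 1}, {"num": 2}]},
]
select_obj = Selection([("Ap4Aase", 4), ("RNA", None)])
print(list(select_obj.residue_loop(data)))  # → [('Ap4Aase', 4), ('RNA', 1), ('RNA', 2)]
```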
>
> I prefer option 2.  However, I would prefer another approach so that
> someone can just import the spin_loop function from the
> 'generic_fns.selection' module and utilise it without needing to set
> up the Selection object.  It would simplify the API.  Another idea
> would be to implement the Selection.test() function (or a function
> with a more appropriate name).  This function could accept
> the mol, res, and spin containers and hence have access to all the
> info about that spin.  Then the spin_loop would become:
>
> def spin_loop(selection=None):
>     """Generator function for looping over all the spin systems of the
>     given selection.
>
>     @param selection:   The spin system selection identifier.
>     @type selection:    str
>     @return:            The spin system specific data container.
>     @rtype:             instance of the SpinContainer class.
>     """
>
>     # Parse the selection string.
>     select_obj = Selection(selection)
>
>     # Loop over the molecules.
>     for mol in relax_data_store[relax_data_store.current_pipe].mol:
>         # Skip the molecule if there is no match to the selection.
>         if not select_obj.test(mol):
>             continue
>
>         # Loop over the residues.
>         for res in mol.res:
>             # Skip the residue if there is no match to the selection.
>             if not select_obj.test(mol, res):
>                 continue
>
>             # Loop over the spins.
>             for spin in res.spin:
>                 # Skip the spin if there is no match to the selection.
>                 if not select_obj.test(mol, res, spin):
>                     continue
>
>                 # Yield the spin system data container.
>                 yield spin
>
> The test function could then iterate over each part of the selection
> string delineated by boolean operators and brackets and return True or
> False as necessary.  What do you think of this idea Chris?
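Here is one way the test() idea could look, offered as a sketch rather than the planned implementation. It assumes a "#mol:res@spin" clause syntax; the "@" spin marker is an assumption, since only "#" and ":" appear in this thread. It handles only | and &, with & binding tighter, and it ignores brackets and negation. Levels not yet passed to test() are treated as "could still match". That lets the outer loops prune early while each clause is always checked against the right molecule. This is only a safe over-approximation as long as there is no negation.

```python
import re

# Hypothetical sketch of Selection.test(mol, res, spin). Only "|" (OR) and
# "&" (AND, binding tighter than OR) are handled; no brackets or negation.
_CLAUSE = re.compile(r"^(?:#(?P<mol>[^:@]+))?(?::(?P<res>[^@]+))?(?:@(?P<spin>.+))?$")


class Selection:
    def __init__(self, select_string):
        # "a & b | c" is stored as [[a, b], [c]]: an OR of AND-ed clauses.
        self.terms = []
        for term in select_string.split("|"):
            clauses = []
            for text in term.split("&"):
                match = _CLAUSE.match(text.strip())
                if match is None:
                    raise ValueError("cannot parse selection clause %r" % text)
                clauses.append(match.groupdict())
            self.terms.append(clauses)

    @staticmethod
    def _clause_ok(clause, mol, res, spin):
        # A level that is not supplied (None) cannot rule the clause out yet.
        if mol is not None and clause["mol"] not in (None, mol.name):
            return False
        if res is not None and clause["res"] not in (None, str(res.num)):
            return False
        if spin is not None and clause["spin"] not in (None, spin.name):
            return False
        return True

    def test(self, mol, res=None, spin=None):
        """True if some OR-term has all its clauses consistent with the context."""
        return any(all(self._clause_ok(c, mol, res, spin) for c in term)
                   for term in self.terms)


class Spin:
    def __init__(self, name):
        self.name = name


class Residue:
    def __init__(self, num, spin_names):
        self.num = num
        self.spin = [Spin(name) for name in spin_names]


class Molecule:
    def __init__(self, name, residues):
        self.name = name
        self.res = residues


def spin_loop(molecules, selection):
    """Nested loop as above, but yields names instead of the spin container."""
    select_obj = Selection(selection)
    for mol in molecules:
        if not select_obj.test(mol):
            continue
        for res in mol.res:
            if not select_obj.test(mol, res):
                continue
            for spin in res.spin:
                if select_obj.test(mol, res, spin):
                    yield mol.name, res.num, spin.name


molecules = [
    Molecule("Ap4Aase", [Residue(n, ["N", "H"]) for n in range(1, 5)]),
    Molecule("RNA", [Residue(1, ["C8"]), Residue(2, ["C8"])]),
]
print(list(spin_loop(molecules, "#Ap4Aase:4 | #RNA")))
print(list(spin_loop(molecules, "#Ap4Aase & @H")))
```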

This should be reasonably easy to implement. A couple of immediate
questions. First, should we ever allow the simpler syntax select_obj.test(spin)
(i.e. without the mol and res arguments)? My feeling is that this will just
lead to confusion and should not be allowed, but perhaps there is an
important application for such a call that I've not anticipated?

It shouldn't matter too much which approach is used.  I'm sure a far-fetched
use for accepting solely the 'spin' argument could be conceived, but until
someone actually needs it, it's not worth the effort of implementing.  It
should be trivial anyway if the need arises.


Also, we could
implement this same functionality inside the __contains__ method of the
Selection class, using tuples as appropriate. So we would have:

   if (mol, res, spin) in select_obj:
       ...

rather than a separate test method. Perhaps this would give a bit more
clarity?

This would need to be clearly documented in the Selection class
docstring.  I don't think it is necessarily clearer, as using a function
is fairly obvious.  However, I do prefer your idea of the "if (mol, res,
spin) in select_obj" usage.
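Below is a sketch of the "(mol, res, spin) in select_obj" form. Python's in operator calls __contains__(), so the tuple can simply be unpacked into test(). The toy test() here matches only a molecule name and a residue number, and the containers are SimpleNamespace stand-ins. The class docstring shows the kind of documentation asked for above.

```python
from types import SimpleNamespace


class Selection:
    """Toy selection of one molecule name and, optionally, one residue number.

    Membership can be tested with a single container or a tuple of them:

        mol in select_obj
        (mol, res) in select_obj
        (mol, res, spin) in select_obj
    """

    def __init__(self, mol_name=None, res_num=None):
        self.mol_name = mol_name
        self.res_num = res_num

    def test(self, mol, res=None, spin=None):
        # 'spin' is accepted for the full signature but not checked in this toy.
        if self.mol_name is not None and mol.name != self.mol_name:
            return False
        if res is not None and self.res_num is not None and res.num != self.res_num:
            return False
        return True

    def __contains__(self, item):
        # 'x in obj' calls obj.__contains__(x); unpack tuples into test().
        if not isinstance(item, tuple):
            item = (item,)
        if not 1 <= len(item) <= 3:
            raise TypeError("expected mol, (mol, res) or (mol, res, spin)")
        return self.test(*item)


select_obj = Selection("Ap4Aase", 4)
ap4, rna = SimpleNamespace(name="Ap4Aase"), SimpleNamespace(name="RNA")
res2, res4 = SimpleNamespace(num=2), SimpleNamespace(num=4)
spin_h = SimpleNamespace(name="H")
print(ap4 in select_obj, (ap4, res2) in select_obj,
      (ap4, res4, spin_h) in select_obj, (rna, res4) in select_obj)
# → True False True False
```

One caveat with this design: if the containers were ever tuple subclasses (a namedtuple, for example), the isinstance check would misread a lone container as a tuple of arguments.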


I'm also in the process of changing labs (and continents), so probably
won't have a chance to implement anything in the next couple of weeks. If
the job still needs doing when I'm back on my feet, I'll have a hack at it
then.

Maybe we'll bump into each other at some random airport around the
world.  Good luck with the move.  I don't know if I'll have the time
either in the next few weeks.

Bye,

Edward





Powered by MHonArc, Updated Fri Apr 13 16:00:25 2007