Re: r3245 - /1.3/generic_fns/selection.py




Posted by Chris MacRaild on April 12, 2007 - 17:31:

On 4/12/07, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
On 4/4/07, Chris MacRaild <c.a.macraild@xxxxxxxxxxx> wrote:
> I've spent a bit of time in the last week or so trying to implement
> boolean operators in the mol-res-spin selection language. I've come to
> the conclusion that this will not be possible in the current
> implementation of the spin loop and related functions.

I have been looking at the changes (commit by commit message) and they
look good.  I like the idea of a selection object.  A useful speed up
that is made possible with this structure is that the selection string
could be stored and the __init__() function return without doing
anything if the string passed to it is the same.  This would require
either a singleton or that the code of __init__() be shifted into
another method of the object.
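
A minimal sketch of the caching idea, taking the second route (the parsing code moved out of __init__() into another method). The class body, the parse() method name, the token splitting and the parse_count counter are all assumptions made for illustration, not the code in selection.py:

```python
class Selection:
    """Hypothetical stand-in for the selection object discussed above."""

    def __init__(self):
        self._string = None
        self.tokens = []
        self.parse_count = 0  # Only here to make the caching visible.

    def parse(self, string):
        """Parse the selection string, unless it is the one already stored."""
        if string == self._string:
            # Same string as last time, so the stored result is reused.
            return
        self._string = string
        # Placeholder parsing (assumption): split on the union operator only.
        self.tokens = [token.strip() for token in string.split("|")]
        self.parse_count += 1


sel = Selection()
sel.parse("#Ap4Aase:4 | #RNA")
sel.parse("#Ap4Aase:4 | #RNA")  # Same string, so no second parse.
```

With a single long-lived object, repeated calls with an unchanged string cost one string comparison.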


> Consider the selection "#Ap4Aase:4 | #RNA". We mean this to select
> residue 4 of the molecule Ap4Aase, and all residues of the molecule RNA.
> In the current implementation, however, it selects all residues of both
> molecules. The residue_loop looks like:
>
> for mol in data:
>     if not mol in selection_object:
>         continue
>     # both Ap4Aase and RNA get to here; Ap4Aase from the first clause
>     # of the selection, RNA from the second
>     for res in mol.residues:
>         if not res in selection_object:
>             continue
>         yield res
>         # All residues get here, thanks to the second clause of the
>         # selection. Because it doesn't explicitly select residues,
>         # all residues are implicitly selected, and there is no way of
>         # knowing which molecule res belongs to.
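
The failure described above can be reproduced with a small self-contained toy. The Molecule, Residue and NaiveSelection classes below are hypothetical stand-ins (not the relax data structures), and the parser only understands '#mol', '#mol:res' and '|'. The point is that each container is tested in isolation, so the residue test cannot know which molecule the residue came from:

```python
class Residue:
    def __init__(self, num):
        self.num = num


class Molecule:
    def __init__(self, name, nums):
        self.name = name
        self.residues = [Residue(n) for n in nums]


class NaiveSelection:
    """Toy selection that tests each container on its own (assumption)."""

    def __init__(self, string):
        self.mols, self.res, self.all_res = set(), set(), False
        for clause in string.split("|"):
            mol, _, res = clause.strip().lstrip("#").partition(":")
            self.mols.add(mol)
            if res:
                self.res.add(int(res))
            else:
                # A clause without a residue part implicitly selects all residues.
                self.all_res = True

    def __contains__(self, obj):
        if isinstance(obj, Molecule):
            return obj.name in self.mols
        return self.all_res or obj.num in self.res


def residue_loop(data, selection_object):
    for mol in data:
        if mol not in selection_object:
            continue
        for res in mol.residues:
            if res not in selection_object:
                continue
            yield mol.name, res.num


data = [Molecule("Ap4Aase", [3, 4, 5]), Molecule("RNA", [1, 2])]
selected = list(residue_loop(data, NaiveSelection("#Ap4Aase:4 | #RNA")))
# Residues 3 and 5 of Ap4Aase are selected too, which is not what was meant.
```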

We will need a new approach to the spin_loop, residue_loop, and
molecule_loop, that's for sure.


> I see two solutions to the problems I'm running into:
>
> 1) Subtly change the data structure so that each spin 'knows' what
> residue it belongs to, and each residue knows what molecule it belongs
> to. (ie. instances of the SpinContainer class have an attribute residue,
> that is a pointer to the residue instance that contains that spin). Then
> restructure the spin-loop as:
>
> for spin in data.spins:
>     if spin in selection_object:
>         yield spin
>
>
> This has a drawback in terms of efficiency, in that all spins in the
> data structure must be explicitly considered, whereas the current nested
> spin-loop only considers spins that belong to selected residues, and
> only residues that belong to selected molecules. I'm not sure how much
> of a hit this will amount to in real situations.
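
A rough sketch of option 1, assuming each spin carries a 'residue' attribute and each residue a 'molecule' attribute. The class layouts, the clause parser (only '#mol', '#mol:res' and '|') and the flat list of spins are hypothetical and only illustrate the back-pointer idea:

```python
class Molecule:
    def __init__(self, name):
        self.name = name


class Residue:
    def __init__(self, num, molecule):
        self.num = num
        self.molecule = molecule  # Back-pointer to the parent molecule.


class Spin:
    def __init__(self, name, residue):
        self.name = name
        self.residue = residue  # Back-pointer to the parent residue.


class Selection:
    def __init__(self, string):
        self.clauses = []
        for clause in string.split("|"):
            mol, _, res = clause.strip().lstrip("#").partition(":")
            self.clauses.append((mol, int(res) if res else None))

    def __contains__(self, spin):
        res = spin.residue
        # A clause matches only if molecule AND residue agree for this spin.
        return any(mol == res.molecule.name and num in (None, res.num)
                   for mol, num in self.clauses)


def spin_loop(spins, selection_object):
    # Flat loop: every spin is visited, there is no pruning by molecule.
    for spin in spins:
        if spin in selection_object:
            yield spin


ap4, rna = Molecule("Ap4Aase"), Molecule("RNA")
spins = [Spin("N", Residue(3, ap4)), Spin("N", Residue(4, ap4)),
         Spin("N", Residue(1, rna)), Spin("N", Residue(2, rna))]
picked = [(s.residue.molecule.name, s.residue.num)
          for s in spin_loop(spins, Selection("#Ap4Aase:4 | #RNA"))]
```

Each clause is now evaluated against the full molecule/residue context of the spin, at the cost of touching every spin once.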

I don't know if I like the idea of an object accessing back through
the dictionary object it belongs to followed by accessing the
namespace of the object holding the dictionary.  There will be better
ways of solving the problem.


> 2) More radically change the implementation of the spin loop, such that
> it is subsumed into the Selection class, i.e. instances of the Selection
> class will have a method called spin_loop (and residue_loop, and
> molecule_loop), which returns the equivalent iterator object. Then we
> effectively (though not literally) do the boolean operations on the list
> of selected spins, not on the abstract selection object.
>
>
>
> Clearly option 2 is a more radical departure from the agreed design, but
> it is likely to have better performance characteristics. Any thoughts on
> the best way forward?
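
One possible reading of option 2, as a sketch only. The selection is assumed to be already parsed into (mol, res) clauses with None meaning "all", the data is a plain nested dict rather than the relax containers, and the _clause_loop helper is a hypothetical name. Each clause runs its own pruned nested loop, and the union is taken over the spins that come out:

```python
class Selection:
    def __init__(self, clauses):
        # Hypothetical pre-parsed form: (mol, res) pairs, None selects all.
        self.clauses = clauses

    def _clause_loop(self, data, mol, res):
        """Nested loop for a single clause, skipping unselected branches."""
        for mol_name, residues in data.items():
            if mol not in (None, mol_name):
                continue
            for res_num, spins in residues.items():
                if res not in (None, res_num):
                    continue
                for spin in spins:
                    yield mol_name, res_num, spin

    def spin_loop(self, data):
        """Union of the per-clause results, each spin yielded once."""
        seen = set()
        for mol, res in self.clauses:
            for item in self._clause_loop(data, mol, res):
                if item not in seen:
                    seen.add(item)
                    yield item


data = {"Ap4Aase": {3: ["N"], 4: ["N", "H"]}, "RNA": {1: ["N"]}}
selected = list(Selection([("Ap4Aase", 4), ("RNA", None)]).spin_loop(data))
```

Note that in this sketch the spins come out in clause order rather than in data-structure order, which may or may not be acceptable.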

I prefer option 2.  However, I would rather have an approach where
someone can simply import the spin_loop function from the
'generic_fns.selection' module and use it without needing to set up
the Selection object, as this would simplify the API.  Another idea
would be to implement a Selection.test() function (or a function with
a more appropriate name).  This function could accept the mol, res,
and spin containers and hence have access to all the info about that
spin.  Then the spin_loop would become:

def spin_loop(selection=None):
    """Generator function for looping over all the spin systems of the
    given selection.

    @param selection:   The spin system selection identifier.
    @type selection:    str
    @return:            The spin system specific data container.
    @rtype:             instance of the SpinContainer class.
    """

    # Parse the selection string.
    select_obj = Selection(selection)

    # Loop over the molecules.
    for mol in relax_data_store[relax_data_store.current_pipe].mol:
        # Skip the molecule if there is no match to the selection.
        if not select_obj.test(mol):
            continue

        # Loop over the residues.
        for res in mol.res:
            # Skip the residue if there is no match to the selection.
            if not select_obj.test(mol, res):
                continue

            # Loop over the spins.
            for spin in res.spin:
                # Skip the spin if there is no match to the selection.
                if not select_obj.test(mol, res, spin):
                    continue

                # Yield the spin system data container.
                yield spin

The test function could then iterate over each part of the selection
string delimited by boolean operators and brackets and return True or
False as necessary.  What do you think of this idea, Chris?
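
A minimal sketch of how such a test() method might behave, under several assumptions: only the '|' operator is handled (no brackets, no '&'), the '@' spin token is assumed syntax, identifiers are passed instead of the mol/res/spin containers, and the data is a plain nested dict in place of relax_data_store:

```python
import re

# One clause: optional '#mol', optional ':res', optional '@spin' (assumed syntax).
CLAUSE = re.compile(r"^(?:#(?P<mol>[^:@]+))?(?::(?P<res>[^@]+))?(?:@(?P<spin>.+))?$")


class Selection:
    def __init__(self, string):
        self.clauses = [CLAUSE.match(clause.strip()).groupdict()
                        for clause in string.split("|")]

    def test(self, mol, res=None, spin=None):
        """True if at least one clause is compatible with the given levels."""
        given = {"mol": mol, "res": res, "spin": spin}
        for clause in self.clauses:
            # A level that is absent from the clause, or not yet known to the
            # caller, cannot rule the clause out.
            if all(clause[k] is None or given[k] is None
                   or clause[k] == str(given[k]) for k in given):
                return True
        return False


def spin_loop(data, selection):
    select_obj = Selection(selection)
    for mol, residues in data.items():
        if not select_obj.test(mol):
            continue
        for res, spins in residues.items():
            if not select_obj.test(mol, res):
                continue
            for spin in spins:
                if select_obj.test(mol, res, spin):
                    yield mol, res, spin


data = {"Ap4Aase": {3: ["N", "H"], 4: ["N", "H"]}, "RNA": {1: ["N"]}}
selected = list(spin_loop(data, "#Ap4Aase:4 | #RNA"))
```

Because every call carries the molecule (and residue) context, residue 3 of Ap4Aase is rejected even though the '#RNA' clause places no restriction on residues.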

This should be reasonably easy to implement. A couple of immediate questions: should we ever allow the simpler syntax select_obj.test(spin) (i.e. without the mol and res arguments)? My feeling is that this will just lead to confusion and should not be allowed, but perhaps there is an important application for such a call that I've not anticipated. Also, we could implement the same functionality inside the __contains__ method of the Selection class, using tuples as appropriate, so we would have:

  if (mol, res, spin) in select_obj:
      ...

rather than a separate test method. This might give a bit more clarity, perhaps?
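
As a sketch of the tuple-based form, assuming the selection has already been parsed into (mol, res, spin) patterns with None as a wildcard (a hypothetical internal representation). Bare, non-tuple arguments are rejected here, in line with the worry about the ambiguous single-argument call:

```python
class Selection:
    def __init__(self, clauses):
        # Hypothetical pre-parsed clauses: (mol, res, spin), None matches all.
        self.clauses = clauses

    def __contains__(self, item):
        if not isinstance(item, tuple) or not 1 <= len(item) <= 3:
            raise TypeError("expected a (mol,), (mol, res) or (mol, res, spin) tuple")
        # Pad short tuples so that the missing levels act as "not yet known".
        padded = item + (None,) * (3 - len(item))
        return any(all(want is None or got is None or want == got
                       for want, got in zip(clause, padded))
                   for clause in self.clauses)


select_obj = Selection([("Ap4Aase", 4, None), ("RNA", None, None)])

if ("Ap4Aase", 4, "N") in select_obj:
    result = "selected"
else:
    result = "skipped"
```

The molecule and residue levels of the loop would then use ('mol',) and ('mol', 'res') tuples with the same operator.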


I'm also in the process of changing labs (and continents), so I probably won't have a chance to implement anything in the next couple of weeks. If the job still needs doing when I'm back on my feet, I'll have a hack at it then.


Chris


Regards,

Edward


