Re: Redesign of the relax data model: 3. Molecules, residues, and spins -- January 15, 2007

On Mon, 2007-01-15 at 20:45 +1100, Edward d'Auvergne wrote:

  It will be up to the spin-specific function passed in by the calling
  function to handle the 'spin.select' value.  Because of the complexity
  of the loop, the use of this single 'spin_loop()' function will simplify
  the relax code base, will minimise potential bugs, and will simplify
  future changes to the relax data model (if necessary).


Use of an iterator object will provide flexibility, as iterators can be
wrapped, filtered, and generally mucked about with using Python's loops
and itertools. What's more, they are a doddle to code, as all you do is
write an ordinary function and call yield with a value each time you
have identified a selected spin
(http://www.python.org/dev/peps/pep-0255/). This also allows
arbitrary selection to be added as wrapper iterators or filtered
iterators.
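
As a rough illustration of the wrapping and filtering point, here is a
minimal sketch. The dictionary layout of a spin and the helper name
only_named are assumptions made up for the example and are not part of
relax; only itertools.islice is a real library call.

```python
from itertools import islice

def spin_loop(spins):
    """Generator yielding each spin whose 'select' flag is set.

    The list-of-dicts layout is a stand-in for the real data model.
    """
    for spin in spins:
        if spin['select']:
            yield spin

def only_named(spin_iter, name):
    """Wrapper iterator: pass through only the spins with the given name."""
    for spin in spin_iter:
        if spin['name'] == name:
            yield spin

# Hypothetical test data.
spins = [
    {'num': 1, 'name': 'N', 'select': True},
    {'num': 2, 'name': 'H', 'select': True},
    {'num': 3, 'name': 'N', 'select': False},
    {'num': 4, 'name': 'N', 'select': True},
]

# Filter the selected spins further by wrapping the generator.
nitrogens = [spin['num'] for spin in only_named(spin_loop(spins), 'N')]

# Standard itertools tools work on the generator unchanged.
first_two = [spin['num'] for spin in islice(spin_loop(spins), 2)]
```

The wrapper never needs to know how spin_loop finds its spins, which is
the flexibility being argued for.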


The UCSF selection syntax is sufficiently powerful for all relax needs,
as well as being simple and well known amongst potential users. It seems
like an excellent alternative to the current spin selection methods.
Coding the parser as an iterator is also a good idea.


Agreed.  Before we go down the path of the Molmol/UCSF syntax, is
there another more intuitive syntax used by other NMR software?

To extend things a bit further, we could incorporate all of this with a
functor similar to that proposed for handling multiple run selections
(https://mail.gna.org/public/relax-devel/2007-01/msg00013.html and
https://mail.gna.org/public/relax-devel/2007-01/msg00020.html ). Of
course the spin functor would operate at a different level of code to
the run functor - whereas all user functions would be instances of the
run functor, only certain internal functions (those that act on a single
spin) would be instances of the spin functor.


The user functions are instances of the run functor?  Do you mean the
functions called by the user functions are instances?


No. In my interpretation of Gary's initial suggestion (Gary:
https://mail.gna.org/public/relax-devel/2007-01/msg00013.html and me:
https://mail.gna.org/public/relax-devel/2007-01/msg00020.html ), each
relax user command is an instance of a functor which catches the runs
argument if passed, implements the run loop (or run stack in Gary's
implementation), and calls the relevant function in prompt/.

Then all internal relax functions act only on the current run.
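
To make that interpretation concrete, here is a minimal sketch of such
a run functor. The names State, state, Run_command, and describe, and
the idea of holding the current run in a single shared object, are
assumptions for the sake of the example; this is not Gary's
implementation nor existing relax code.

```python
class State:
    """Hypothetical holder for the current run."""
    current_run = None

state = State()

class Run_command:
    """Functor which catches the 'runs' argument and implements the run loop."""

    def __init__(self, function):
        self.function = function

    def __call__(self, *args, **kwds):
        # Catch the runs argument if passed, else act on the current run only.
        runs = kwds.pop('runs', None)
        if runs is None:
            runs = [state.current_run]

        previous = state.current_run
        results = []
        try:
            for run in runs:
                # The wrapped function only ever sees the current run.
                state.current_run = run
                results.append(self.function(*args, **kwds))
        finally:
            # Restore whatever run was current before the loop.
            state.current_run = previous
        return results

def _describe(label):
    """Stand-in for an internal function acting on the current run."""
    return "%s:%s" % (label, state.current_run)

# The user-visible command is an instance of the functor.
describe = Run_command(_describe)
```

Calling describe('fit', runs=['m1', 'm2']) then runs _describe once per
run, without _describe knowing anything about the loop.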

Because spin selection is used differently by different relax functions,
and because there is no concept of a 'current spin', my suggestion of a
spin functor lacks the generality and elegance of Gary's idea, but it
might still be worth considering for its code simplification potential.
There are plenty of functions in relax that operate on a single spin,
and there will be plenty of occasions where we need to code something
like:

for spin in spin_loop(selection):
    do_something(spin)

The spin functor idea is simply that this code can be replaced with:

do_something(selection=selection)

where the 'function' do_something is in fact an instance of the spin
functor:

class Spin_command:
    def __init__(self, function):
        self.function = function
    def __call__(self, *args, **kwds):
        if 'selection' in kwds:
            # Remove the selection so that it is not passed on to function.
            selection = kwds.pop('selection')
            for spin in spin_loop(selection):
                # args is a tuple, so the spin is passed as the first
                # positional argument rather than inserted into args
                # (or use kwds['spin'] = spin, depending on the agreed
                # syntax of function).
                self.function(spin, *args, **kwds)

The gain in simplicity is arguably marginal, but there will be a lot of
examples where it might apply.


The spin selection itself is used quite differently by different parts
of the code base and I'm not sure if implementing the parser as a
generator is a good idea.  For example the selection string could be
passed to the spin loop function, which is a generator yielding the
spin system data container.  Using Gary's spin system selection and
generator ideas
(https://mail.gna.org/public/relax-devel/2007-01/msg00014.html,
Message-id: <f001463a0701071417w6bd7927cp8fdd052e698575ec@xxxxxxxxxxxxxx>),
the spin loop presented at
https://mail.gna.org/public/relax-devel/2006-10/msg00057.html
(Message-id: <1160557041.9523.74.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>)
would be simple.  One argument goes into the function, the selection
string, and the final line would be a yield statement.  In this spin
loop example, maybe it would be useful to have separate
generators/iterators for the molecules, residues, and atoms.  Then the
spin loop could become:

    def spin_loop(selection=None):
        """Function for selectively looping over all spins."""

        # Reassign the data container.
        data = self.relax.data[self.relax.run]

        # Loop over the molecules.
        for mol in data.mol:
            # Skip the molecule if there is no match to the selection.
            skip = 1
            for mol_name in mol_iterator(selection):
                if mol_name == mol.name:
                    skip = 0
            if skip:
                continue

            # Loop over the residues.
            for res in mol.res:
                # Skip the residue if there is no match to the selection.
                skip = 1
                for res_num, res_name in res_iterator(selection):
                    if res_num == res.num and res_name == res.name:
                        skip = 0
                if skip:
                    continue

                # Loop over the spins.
                for spin in res.spin:
                    # Skip the spin if there is no match to the selection.
                    skip = 1
                    for atom_num, atom_name in atom_iterator(selection):
                        if atom_num == spin.num and atom_name == spin.name:
                            skip = 0
                    if skip:
                        continue

                    # Yield the spin system data container.
                    yield spin


This setup could possibly be more numerically efficient than say:

    def spin_loop(selection=None):
        """Function for selectively looping over all spins."""

        # Reassign the data container.
        data = self.relax.data[self.relax.run]

        # Loop over the molecules.
        for mol in data.mol:
            # Loop over the residues.
            for res in mol.res:
                # Loop over the spins.
                for spin in res.spin:
                    # Skip the spin if there is no match to the selection.
                    skip = 1
                    for (mol_name, res_num, res_name,
                         atom_num, atom_name) in atom_iterator(selection):
                        if (mol_name == mol.name and res_num == res.num
                                and res_name == res.name
                                and atom_num == spin.num
                                and atom_name == spin.name):
                            skip = 0
                    if skip:
                        continue

                    # Yield the spin system data container.
                    yield spin


However rather than using a generator for the selection, maybe the
function 'is_selected' could be created:

    def spin_loop(selection=None):
        """Function for selectively looping over all spins."""

        # Reassign the data container.
        data = self.relax.data[self.relax.run]

        # Loop over the molecules.
        for mol in data.mol:
            # Loop over the residues.
            for res in mol.res:
                # Loop over the spins.
                for spin in res.spin:
                    # Skip the spin if there is no match to the selection.
                    if not is_selected(selection, mol.name, res.num,
                                       res.name, spin.num, spin.name):
                        continue

                    # Yield the spin system data container.
                    yield spin


This last example seems to be the simplest and most efficient code.
However, I think yet another possibility might be better here. Rather
than looping over all molecules, residues and spins in the data in order
to find a selection that might be only a tiny subset of that, why not
loop over the selection, then ask whether each selection makes sense in
terms of the data? This will be the most efficient approach as long as
data > selection, which is likely to be the most common situation.

Assuming we are using a UCSF-like selection syntax, we might code this
like:

def spin_loop(selection):
    mol_token, res_token, spin_token = tokenise(selection)
    # A missing token means "everything at that level".
    if mol_token is None:
        mol_token = data.mol
    if res_token is None:
        res_token = data.res
    if spin_token is None:
        spin_token = data.spin
    for mol in parse_token(mol_token):
        if mol not in data.mol:
            continue
        for res in parse_token(res_token):
            if res not in data.res:
                continue
            for spin in parse_token(spin_token):
                if spin not in data.spin:
                    continue
                yield spin

The functions tokenise and parse_token do the work of parsing the
selection. tokenise will split on the mol, res, spin identifiers ('#',
':', '@' in UCSF-speak), returning None for tokens without identifiers.
parse_token will interpret a string like "2,4,6-10", returning a list
[2,4,6,7,8,9,10] (or the equivalent iterator if that is desirable).
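
For the sake of discussion, a minimal sketch of what these two
functions might look like is given below. This is not the code
mentioned in this message. The fixed '#', ':', '@' ordering, the
handling of non-numeric names, and the ValueError on bad input are all
assumptions, and only a small subset of a UCSF-like syntax is covered.

```python
import re

def tokenise(selection):
    """Split a selection string into (mol_token, res_token, spin_token).

    '#' introduces the molecule, ':' the residue, and '@' the spin.
    None is returned for any level without an identifier.
    """
    pattern = r'^(?:#([^:@#]+))?(?::([^:@#]+))?(?:@([^:@#]+))?$'
    match = re.match(pattern, selection or '')
    if match is None:
        raise ValueError("Invalid selection string: %r" % (selection,))
    return match.groups()

def parse_token(token):
    """Expand a token such as '2,4,6-10' into a list.

    Numbers and number ranges become ints; anything else (e.g. an atom
    name) is kept as a string.
    """
    items = []
    for part in token.split(','):
        part = part.strip()
        if not part:
            continue
        bounds = re.match(r'^(-?\d+)-(-?\d+)$', part)
        if bounds:
            # An inclusive range such as '6-10'.
            start, end = int(bounds.group(1)), int(bounds.group(2))
            items.extend(range(start, end + 1))
        elif re.match(r'^-?\d+$', part):
            items.append(int(part))
        else:
            items.append(part)
    return items

mol_token, res_token, spin_token = tokenise('#A:2,4,6-10@N,H')
residues = parse_token(res_token)
atoms = parse_token(spin_token)
```

Returning a list keeps the sketch simple; replacing the append and
extend calls with yield statements would give the iterator form.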

I coded these functions a while ago for another purpose, so I could dig
them out if necessary.

Edward
