> The spin selection itself is used quite differently by different parts
> of the code base and I'm not sure if implementing the parser as a
> generator is a good idea. For example the selection string could be
> passed to the spin loop function which is a generator yielding the
> spin system data container. Using Gary's spin system selection and
> generator ideas
> (https://mail.gna.org/public/relax-devel/2007-01/msg00014.html,
> Message-id: <f001463a0701071417w6bd7927cp8fdd052e698575ec@xxxxxxxxxxxxxx>),
> the spin loop presented at
> https://mail.gna.org/public/relax-devel/2006-10/msg00057.html
> (Message-id: <1160557041.9523.74.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>)
> would be simple. One argument goes into the function, the selection
> string, and the final line would be a yield statement. In this spin
> loop example, maybe it would be useful to have separate
> generators/iterators for the molecules, residues, and atoms. Then the
> spin loop could become:
>
> def spin_loop(selection=None):
>     """Function for selectively looping over all spins."""
>
>     # Reassign the data container.
>     data = self.relax.data[self.relax.run]
>
>     # Loop over the molecules.
>     for mol in data.mol:
>         # Skip the molecule if there is no match to the selection.
>         skip = 1
>         for mol_name in mol_iterator(selection):
>             if mol_name == mol.name:
>                 skip = 0
>         if skip:
>             continue
>
>         # Loop over the residues.
>         for res in mol.res:
>             # Skip the residue if there is no match to the selection.
>             skip = 1
>             for res_num, res_name in res_iterator(selection):
>                 if res_num == res.num and res_name == res.name:
>                     skip = 0
>             if skip:
>                 continue
>
>             # Loop over the spins.
>             for spin in res.spin:
>                 # Skip the spin if there is no match to the selection.
>                 skip = 1
>                 for atom_num, atom_name in atom_iterator(selection):
>                     if atom_num == spin.num and atom_name == spin.name:
>                         skip = 0
>                 if skip:
>                     continue
>
>                 # Yield the spin system data container.
>                 yield spin
>
>
> This setup could possibly be more numerically efficient than say:
>
> def spin_loop(selection=None):
>     """Function for selectively looping over all spins."""
>
>     # Reassign the data container.
>     data = self.relax.data[self.relax.run]
>
>     # Loop over the molecules.
>     for mol in data.mol:
>         # Loop over the residues.
>         for res in mol.res:
>             # Loop over the spins.
>             for spin in res.spin:
>                 # Skip the spin if there is no match to the selection.
>                 skip = 1
>                 for (mol_name, res_num, res_name, atom_num,
>                      atom_name) in atom_iterator(selection):
>                     if (mol_name == mol.name and res_num == res.num
>                             and res_name == res.name
>                             and atom_num == spin.num
>                             and atom_name == spin.name):
>                         skip = 0
>                 if skip:
>                     continue
>
>                 # Yield the spin system data container.
>                 yield spin
>
>
> However rather than using a generator for the selection, maybe the
> function 'is_selected' could be created:
>
> def spin_loop(selection=None):
>     """Function for selectively looping over all spins."""
>
>     # Reassign the data container.
>     data = self.relax.data[self.relax.run]
>
>     # Loop over the molecules.
>     for mol in data.mol:
>         # Loop over the residues.
>         for res in mol.res:
>             # Loop over the spins.
>             for spin in res.spin:
>                 # Skip the spin if there is no match to the selection.
>                 if not is_selected(selection, mol.name, res.num,
>                                    res.name, spin.num, spin.name):
>                     continue
>
>                 # Yield the spin system data container.
>                 yield spin
>
>
This last example seems to be the simplest and most efficient code. However, I think yet another possibility might be better here. Rather than looping over all molecules, residues and spins in the data in order to find a selection that might be only a tiny subset of that, why not loop over the selection, then check whether each selected item exists in the data? This will be the most efficient approach as long as the data is larger than the selection, which is likely to be the most common situation.
Assuming we are using a UCSF-like selection syntax, we might code this like:
    def spin_loop(selection):
        mol_token, res_token, spin_token = tokenise(selection)

        # A missing token means "everything at that level".
        if mol_token is None:
            mol_token = data.mol
        if res_token is None:
            res_token = data.res
        if spin_token is None:
            spin_token = data.spin

        for mol in parse_token(mol_token):
            if mol not in data.mol:
                continue
            for res in parse_token(res_token):
                if res not in data.res:
                    continue
                for spin in parse_token(spin_token):
                    if spin not in data.spin:
                        continue
                    yield spin
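To make the selection-driven loop concrete, here is a minimal runnable sketch. It is only an illustration under stated assumptions: the data is a hypothetical nested dict of the form {mol_name: {res_num: {spin_name: container}}} rather than the real relax data structure, and the selection is passed in already parsed (the argument names mol_names, res_nums and spin_names are invented for this example). A level set to None means "everything at that level".

```python
def spin_loop(data, mol_names=None, res_nums=None, spin_names=None):
    """Yield the containers present in both the selection and the data.

    Assumption: 'data' is a nested dict {mol: {res: {spin: container}}}.
    """
    # Loop over the selected molecules (or all of them if unselected).
    for mol in (data if mol_names is None else mol_names):
        # Skip selected molecules which do not exist in the data.
        if mol not in data:
            continue
        residues = data[mol]

        for res in (residues if res_nums is None else res_nums):
            if res not in residues:
                continue
            spins = residues[res]

            for spin in (spins if spin_names is None else spin_names):
                if spin not in spins:
                    continue

                # Yield the spin system data container.
                yield spins[spin]


# Toy data: strings stand in for the spin system data containers.
data = {
    'A': {1: {'N': 'A:1@N', 'H': 'A:1@H'}, 2: {'N': 'A:2@N'}},
    'B': {1: {'N': 'B:1@N'}},
}

# Residue 3 is in the selection but not in the data, so it is skipped.
selected = list(spin_loop(data, mol_names=['A'], res_nums=[2, 3],
                          spin_names=['N']))
```

The work done is proportional to the size of the selection at each level, with one dict lookup per selected item, rather than to the size of the data.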
I like this idea; it will be more computationally efficient. I suggest we call parse_token() prior to the loops so that only 3 function calls are made. We should convert the methods of the class 'Selection' from the file 'generic_fns/selection.py' into functions and then add the following functions (please feel free to suggest more):
    parse_token()
    tokenise()
Chris or Gary, would you like to add this idea as well to the planning document (with relevant links to the mailing list)? I have a feeling that due to the number of posts on the redesign we may accidentally forget to include one or two of the ideas.
The functions tokenise and parse_token do the work of parsing the selection. tokenise will split on the mol, res, spin identifiers ('#', ':', '@' in UCSF-speak), returning None for tokens without identifiers. parse_token will interpret a string like "2,4,6-10", returning a list [2,4,6,7,8,9,10] (or the equivalent iterator if that is desirable).
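As a rough illustration of the behaviour described above, the two functions might look something like the following. This is a sketch written for this discussion, not the previously coded versions. It assumes that each of '#', ':' and '@' appears at most once, that their order is not enforced, that non-numeric entries are kept as string names, and that negative numbers and the rest of the UCSF syntax are not handled.

```python
import re


def tokenise(selection):
    """Split a selection string into (mol, res, spin) tokens.

    A level without its identifier character is returned as None.
    """
    tokens = {'#': None, ':': None, '@': None}
    current = None
    for char in selection:
        if char in tokens:
            if tokens[char] is not None:
                raise ValueError("Repeated identifier %r" % char)
            # Start collecting the text for this level.
            current = char
            tokens[current] = ''
        elif current is None:
            raise ValueError("Selection must start with '#', ':' or '@'")
        else:
            tokens[current] += char
    return tokens['#'], tokens[':'], tokens['@']


def parse_token(token):
    """Expand a token such as '2,4,6-10' into a list of identifiers."""
    if token is None:
        return []
    items = []
    for part in token.split(','):
        part = part.strip()
        if not part:
            continue
        match = re.fullmatch(r'(\d+)-(\d+)', part)
        if match:
            # An inclusive integer range such as '6-10'.
            start, end = int(match.group(1)), int(match.group(2))
            items.extend(range(start, end + 1))
        elif part.isdigit():
            items.append(int(part))
        else:
            # Anything else is treated as a name (assumption).
            items.append(part)
    return items


mol_token, res_token, spin_token = tokenise('#A:2,4,6-10@N')
res_list = parse_token(res_token)
```

Returning a list keeps the result reusable across the nested loops; a generator would have to be recreated for each pass.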
I coded these functions a while ago for another purpose, so I could dig them out if necessary.
This is good stuff. Did you code a parser for the Molmol/UCSF identifiers which covers the full syntax? And importantly, are there copyright issues (for example, do you still own the full copyright to that code, having neither taken parts from nor assigned it to another entity)?
Edward