Re: [bug #21598] Considering molecule numbers when writing pyMol (or Molmol) macros -- December 11, 2014

Hi Edward,

sorry for responding so late – I totally forgot about your message and found 
it now again only by accident.

To answer your question, I think it is a good idea to leave it up to the user 
to number the chains accordingly. In the end, it seems like the user is the 
only one competent enough to judge which name the chains should have – relax 
would have to guess, as you explained, especially if one loads several "A 
chains" from different sources. Some kind of user interaction is necessary. 

Wouldn't be the simples solution to hand over a string array of IDs to the 
pymol macro, which in turn would assign these to the chains? The length of 
the array would have to match the number of chains. Which ID is attributed to 
which chain would be probably up to the order of the loaded molecules. 

It would certainly help already if there was *any* chain ID mentioned in the 
macros, which could be replaced by e.g. sed afterwards, if necessary. Now I 
just add the "and chain A" bit more or less manually for every line.

Cheers,
Martin

On 11.11.2014, at 17:35, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Hi Martin,

I've been looking at old bug reports and was wondering if you still
had interest in this PyMOL/MOLMOL macro issue?  Would your problem be
solved if we added the chain ID as an argument to the
pymol.macro_create user function?  If we added software specific
selections to the pymol and molmol user functions, then maybe these
can be re-run after a model-free analysis finishes and macros
including this additional selection information could be manually
created.  Such a solution essentially pushes the structure mapping
problem back onto the user.  What do you think?

Cheers,

Edward



On 10 February 2014 08:48, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Hi Martin,

I've been thinking about this one for quite a while and there are some
complications.  The problem is with the mapping of molecules in the
relax data store and the original PDB chain IDs.  For example, if you
start with a PDB file with 3 chains but only load the structure with
chain ID B, then there will only be one structure in the internal
structural object.  Using relax's current logic, this would map to
chain ID A so the result would be a very confusing for user.

For relax to create valid PDB files, the original chain IDs have to be
forgotten (though they are stored in the cdp.structure object).  You
cannot load in a structure with chain ID B and write out its atoms
with chain ID B, because you could load into the relax data store a
second PDB file with a molecule with the chain ID of B as well.
Therefore the current relax logic is that each loaded molecule is
reassigned a chain ID from A to Z (followed by 0-9 and then a-z, as
specified at http://www.wwpdb.org/procedure.html#toc_4) - this is
mandated by the PDB standard.  So if you load in molecules from random
sources, they will be given the IDs A, B, C, etc. when using the
structure.write_pdb user function.  It is important to note that with
the relax structure user functions, you can convert one PDB file with
X models into one model with X structures in relax, or you can convert
X PDB structures into X models of one structure in relax.  This
important ability really complicates the molecule to molecule mapping.

The macros themselves are molecule independent - they will operate on
which ever structure is loaded.  So maybe what is needed is to access
the original chain IDs which are already stored.  This itself is
difficult.  Firstly, you can see the original IDs by looking at the
code.  Have a look at the add_atom() method in the
lib/structure/internal/molecule.py file.  This is called from the
fill_object_from_pdb() method.  In the add_atom() method, you will see
that the original chain IDs are stored as the molecule container
'chain_id' list.  So it might be best for the macros to access these
original IDs rather than using the new IDs, however this is the
problem.

The challenge would be to pull out these original IDs.  For the
model-free analysis, the macro creation is via the module
specific_analyses/model_free/macro_base.py, using the classic_style()
method.  There you will see that for each parameter converted into a
macro, there is a spin_loop() call.  This returns the molecule name.
But note that information in relax is stored in a different place to
the structural data (cdp.mol verses cdp.structure).  This separation
is deliberate as cdp.mol must be able to handle the case of no
structural information and it is also a complex structure holding lots
of NMR information, whereas cdp.structure object is highly optimised
for speed and handling huge quantities of models and/or molecules and
only holds basic structural information.

The problem is that there is no link between cdp.mol and cdp.structure
to from one to the other.  This difficult issue would need to be
solved.  Then that would allow you to follow from the molecule name in
cdp.mol to the molecule in cdp.structure.structural_data[0] (this is
the first model), to the original chain IDs of each atom.  To find the
correct atom, this will also be a challenge as residue names and
numbers and spin names and numbers can be changed.  It may in fact be
easier to change the code generating the macros to use the
cdp.structure.atom_loop() function rather than the spin_loop().  Then
the pipe_control.mol_res_spin.generate_spin_id_unique() function could
be used to create a spin ID to try to match.  This will work quite
well for obtaining the residue and spin info.  But again here you have
the molecule matching problem, as the molecule names in cdp.mol and
cdp.structure are different, and the name in cdp.mol can be changed by
the user at any time.

So I will have to think about this issue even more.  Feel free to look
at the code I mentioned and see if you can see an easy solution!

Regards,

Edward



On 5 February 2014 11:16, Martin Ballaschk
<NO-REPLY.INVALID-ADDRESS@xxxxxxx> wrote:

URL:
 <http://gna.org/bugs/?21598>

                Summary: Considering molecule numbers when writing pyMol 
(or
Molmol) macros
                Project: relax
           Submitted by: mab
           Submitted on: Wed 05 Feb 2014 10:16:04 AM GMT
               Category: relax's source code
Specific analysis category: Model-free analysis
               Priority: 5 - Normal
               Severity: 1 - Wish
                 Status: None
            Assigned to: None
        Originator Name:
       Originator Email:
            Open/Closed: Open
                Release: 3.1.5
        Discussion Lock: Any
       Operating System: All systems

   _______________________________________________________

Details:

Hi Edward.

Relax automatically creates mappings of the various model-free parameters 
onto
the loaded structure by generatiing .pml (or Molmol .mac) scripts/macros. 
To
use them, the PDB structure has to be opened in PyMol, and then the 
mapping
script has to be run.

The problem: when loading the mapping script, all chains of the current
molecule are treated the same, i.e. the values are not only mapped to 
chain A
of my multimer, but also onto chain B, C, etc.

The reason is that these mappings are based on residue numbers only. To 
make
one's life easier, all present molecules should be treated individually.

E.g., Instead of:
select pept_bond, (name ca,n AND resi 2) or (name ca,c AND resi 1)

it should read:
select pept_bond, (name ca,n AND resi 2 AND chain A) or (name ca,c AND 
resi 1
AND chain A)

in the pymol script.

To fix this, one just would have take into account the molecule number 
that is
read by relax:

structure.read_pdb(file='XYZ.pdb', read_mol=1)

"Molecule 1" would translate into "chain A", "molecule 2" to "chain B" , 
etc
in the PyMol script, by looping over the molecules present, assuming all
present molecules have been loaded from the same pdb. If the different
molecules are loaded from different files, the molecule-chain mapping 
would
make little sense. One way to circumvent this problem could be something 
like
a "multimer" flag to tell relax to specify molecule/chain numbers.

I don't know what the scripts would look like if there are several 
molecules
loaded into relax at the same time. If there is no seperate treatment for
them, a fix like this would probably help.

Cheers,
Martin





   _______________________________________________________

Reply to this item at:

 <http://gna.org/bugs/?21598>

_______________________________________________
 Message sent via/by Gna!
 http://gna.org/


_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel

Re: [bug #21598] Considering molecule numbers when writing pyMol (or Molmol) macros

Header

Content

Related Messages