Re: [bug #21598] Considering molecule numbers when writing pyMol (or Molmol) macros



Posted by Edward d'Auvergne on February 10, 2014 - 08:48:
Hi Martin,

I've been thinking about this one for quite a while and there are some
complications.  The problem is with the mapping of molecules in the
relax data store and the original PDB chain IDs.  For example, if you
start with a PDB file with 3 chains but only load the structure with
chain ID B, then there will only be one structure in the internal
structural object.  Using relax's current logic, this would map to
chain ID A so the result would be very confusing for the user.

For relax to create valid PDB files, the original chain IDs have to be
forgotten (though they are stored in the cdp.structure object).  You
cannot load in a structure with chain ID B and write out its atoms
with chain ID B, because you could load into the relax data store a
second PDB file with a molecule with the chain ID of B as well.
Therefore the current relax logic is that each loaded molecule is
reassigned a chain ID from A to Z (followed by 0-9 and then a-z, as
specified at http://www.wwpdb.org/procedure.html#toc_4) - this is
mandated by the PDB standard.  So if you load in molecules from random
sources, they will be given the IDs A, B, C, etc. when using the
structure.write_pdb user function.  It is important to note that with
the relax structure user functions, you can convert one PDB file with
X models into one model with X structures in relax, or you can convert
X PDB structures into X models of one structure in relax.  This
important ability really complicates the molecule to molecule mapping.
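
As a minimal sketch of the reassignment described above (this is not relax's
actual code; the names CHAIN_IDS and chain_id_for are hypothetical), the new
chain ID depends only on the order in which molecules were loaded, never on
the chain ID in the source file:

```python
import string

# Order described above: A-Z, then 0-9, then a-z.
CHAIN_IDS = string.ascii_uppercase + string.digits + string.ascii_lowercase


def chain_id_for(mol_index):
    """Return the chain ID for the zero-based index of a loaded molecule.

    Hypothetical helper for illustration only.
    """
    if not 0 <= mol_index < len(CHAIN_IDS):
        raise ValueError("No chain ID available for molecule index %d" % mol_index)
    return CHAIN_IDS[mol_index]


# A molecule loaded on its own from original chain B is still the first
# loaded molecule, so it is written out under the first chain ID.
loaded_original_chains = ['B']
new_ids = [chain_id_for(i) for i in range(len(loaded_original_chains))]
```

In this sketch the lone chain B molecule ends up as chain A, which is the
confusing case mentioned earlier.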

The macros themselves are molecule independent - they will operate on
whichever structure is loaded.  So maybe what is needed is to access
the original chain IDs which are already stored.  This itself is
difficult.  Firstly, you can see the original IDs by looking at the
code.  Have a look at the add_atom() method in the
lib/structure/internal/molecule.py file.  This is called from the
fill_object_from_pdb() method.  In the add_atom() method, you will see
that the original chain IDs are stored as the molecule container
'chain_id' list.  So it might be best for the macros to access these
original IDs rather than using the new IDs.  However, this is where the
problem lies.
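
To illustrate what reading the stored IDs could look like, here is a rough
sketch.  It assumes only what is described above, namely that a molecule
container carries a per-atom 'chain_id' list.  The MolContainerSketch class
and the original_chain_id() function are hypothetical stand-ins and not part
of relax:

```python
class MolContainerSketch:
    """Hypothetical stand-in for a molecule container.

    Only the per-atom 'chain_id' list mentioned above is modelled.
    """

    def __init__(self, mol_name, chain_id):
        self.mol_name = mol_name
        self.chain_id = list(chain_id)


def original_chain_id(mol):
    """Return the single original chain ID of a molecule.

    None is returned if no ID was stored or if the atoms carry more than
    one distinct ID, as the mapping is then ambiguous.
    """
    # Ignore empty or missing entries, then collect the distinct IDs.
    ids = {cid for cid in mol.chain_id if cid}
    if len(ids) == 1:
        return ids.pop()
    return None


mol = MolContainerSketch('XYZ_mol1', ['B', 'B', 'B'])
chain = original_chain_id(mol)
```

Returning None for the ambiguous case is a design choice of this sketch, so
that a macro writer could fall back to the current chain-free selections.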

The challenge would be to pull out these original IDs.  For the
model-free analysis, the macro creation is via the module
specific_analyses/model_free/macro_base.py, using the classic_style()
method.  There you will see that for each parameter converted into a
macro, there is a spin_loop() call.  This returns the molecule name.
But note that the spin information in relax is stored in a different place to
the structural data (cdp.mol versus cdp.structure).  This separation
is deliberate as cdp.mol must be able to handle the case of no
structural information and it is also a complex structure holding lots
of NMR information, whereas the cdp.structure object is highly optimised
for speed and handling huge quantities of models and/or molecules and
only holds basic structural information.

The problem is that there is no link between cdp.mol and cdp.structure
to go from one to the other.  This difficult issue would need to be
solved.  Then that would allow you to follow from the molecule name in
cdp.mol to the molecule in cdp.structure.structural_data[0] (this is
the first model), to the original chain IDs of each atom.  Finding the
correct atom will also be a challenge, as residue names and numbers and
spin names and numbers can be changed.  It may in fact be
easier to change the code generating the macros to use the
cdp.structure.atom_loop() function rather than the spin_loop().  Then
the pipe_control.mol_res_spin.generate_spin_id_unique() function could
be used to create a spin ID to try to match.  This will work quite
well for obtaining the residue and spin info.  But again here you have
the molecule matching problem, as the molecule names in cdp.mol and
cdp.structure are different, and the name in cdp.mol can be changed by
the user at any time.

So I will have to think about this issue even more.  Feel free to look
at the code I mentioned and see if you can see an easy solution!

Regards,

Edward



On 5 February 2014 11:16, Martin Ballaschk
<NO-REPLY.INVALID-ADDRESS@xxxxxxx> wrote:
URL:
  <http://gna.org/bugs/?21598>

                 Summary: Considering molecule numbers when writing pyMol (or Molmol) macros
                 Project: relax
            Submitted by: mab
            Submitted on: Wed 05 Feb 2014 10:16:04 AM GMT
                Category: relax's source code
Specific analysis category: Model-free analysis
                Priority: 5 - Normal
                Severity: 1 - Wish
                  Status: None
             Assigned to: None
         Originator Name:
        Originator Email:
             Open/Closed: Open
                 Release: 3.1.5
         Discussion Lock: Any
        Operating System: All systems

    _______________________________________________________

Details:

Hi Edward.

Relax automatically creates mappings of the various model-free parameters onto
the loaded structure by generating .pml (or Molmol .mac) scripts/macros. To
use them, the PDB structure has to be opened in PyMol, and then the mapping
script has to be run.

The problem: when loading the mapping script, all chains of the current
molecule are treated the same, i.e. the values are not only mapped to chain A
of my multimer, but also onto chain B, C, etc.

The reason is that these mappings are based on residue numbers only. To make
one's life easier, all present molecules should be treated individually.

E.g., Instead of:
select pept_bond, (name ca,n AND resi 2) or (name ca,c AND resi 1)

it should read:
select pept_bond, (name ca,n AND resi 2 AND chain A) or (name ca,c AND resi 1 AND chain A)

in the pymol script.
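
A minimal sketch of how such a line could be generated is given here.  The
function name pept_bond_selection() and its arguments are hypothetical and
are not taken from relax's macro code.  It only reproduces the two select
statements shown above:

```python
def pept_bond_selection(resi, chain=None):
    """Build the PyMOL select line for the bond between residues resi-1 and resi.

    If a chain ID is given, both halves of the selection are restricted to
    that chain.  Otherwise the chain-independent form is produced.
    """
    suffix = " AND chain %s" % chain if chain else ""
    return "select pept_bond, (name ca,n AND resi %d%s) or (name ca,c AND resi %d%s)" % (
        resi, suffix, resi - 1, suffix)


without_chain = pept_bond_selection(2)
with_chain = pept_bond_selection(2, chain='A')
```

Keeping the chain argument optional means that the existing behaviour is
preserved whenever no reliable chain ID is available.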

To fix this, one would just have to take into account the molecule number that
is read by relax:

structure.read_pdb(file='XYZ.pdb', read_mol=1)

"Molecule 1" would translate into "chain A", "molecule 2" to "chain B", etc.
in the PyMol script, by looping over the molecules present, assuming all
present molecules have been loaded from the same pdb. If the different
molecules are loaded from different files, the molecule-chain mapping would
make little sense. One way to circumvent this problem could be something like
a "multimer" flag to tell relax to specify molecule/chain numbers.

I don't know what the scripts would look like if there are several molecules
loaded into relax at the same time. If there is no separate treatment for
them, a fix like this would probably help.

Cheers,
Martin





    _______________________________________________________

Reply to this item at:

  <http://gna.org/bugs/?21598>

_______________________________________________
  Message sent via/by Gna!
  http://gna.org/


_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel


