Re: relax formatted files for relax reading and writing.




Posted by Sébastien Morin on December 05, 2008 - 20:19:
Hi Edward,

I agree with you. I had not thought about all the other files in relax
since I was so focused on the 'generic' file... but your idea of
implementing this at a higher level is quite logical and would give much
more flexibility to relax. Maybe we could create a branch for this, as
you proposed...

Concerning the generic format, what do you think we should do? Should
we just introduce variables so the user tells relax which column each
thing is in? The user could also specify that he uses the generic format,
as well as how many header lines there are, etc. In fact, the user
could, for now (until we get the automatic recognition stuff working),
specify everything concerning the formatting of his file. What do you think?

Lots of work ahead !

Cheers,


Séb  :)




Edward d'Auvergne wrote:
Hi,

I think this is a great idea.  But I don't think it's the right time or
place to implement this.  There are many output and input files in relax
formatted with the 5 mol name, res num and name, and spin num and name columns.  For
example the sequence data reading and writing, the value reading and
writing, the generic intensity reading (and possibly writing in the
future), the relaxation data reading and writing, the spin deselection
file reading, and the RDC and PCS reading and writing.  So, as you can
see, the idea you propose covers all of these file types and touches a
lot of code.  I think to implement this, we need a new branch.

The way I see this being implemented is quite complex.  I think we
should have one special object, maybe in generic_fns.mol_res_spin, which
parses the file and converts it into other special objects (all
contained within the main object).  There could be a method in this
object called data_loop() which returns the spin_id, and an array with
the data from the remaining columns.  Maybe also a method which returns
the column indices to help deconvolute the data array.  Then this file
parsing object can have the abilities of accepting the mol_name_col,
etc. values to allow header-less or other non-standard formats to be
supported.  If none of the mol, res, and spin name or num values are
given (maybe the user function default), then the text 'mol_name_col',
etc. can be searched for.  And if neither works, then time for a
RelaxError.
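
As a rough illustration of the parsing object described above, here is a minimal sketch. Everything in it is an assumption made for illustration and is not existing relax code: the class name SpinFileParser, the 0-based column indices, whitespace-separated fields, the header names 'mol_name', 'res_num', etc., and the tuple of identifiers that stands in for the spin_id. RelaxError is redefined locally only so that the sketch runs on its own.

```python
class RelaxError(Exception):
    """Stand-in for relax's own RelaxError, so the sketch runs on its own."""


class SpinFileParser:
    """Hypothetical parser for whitespace-separated, spin-specific files."""

    NAMES = ('mol_name', 'res_num', 'res_name', 'spin_num', 'spin_name')

    def __init__(self, lines, mol_name_col=None, res_num_col=None,
                 res_name_col=None, spin_num_col=None, spin_name_col=None):
        # Assumption: column arguments are 0-based indices, None when absent.
        given = dict(zip(self.NAMES, (mol_name_col, res_num_col, res_name_col,
                                      spin_num_col, spin_name_col)))
        rows = [line.split() for line in lines if line.strip()]

        if any(col is not None for col in given.values()):
            # Explicit columns: every row is data (header-less file).
            self.cols = given
            self.rows = rows
        else:
            # No columns given: search the first row for the known names.
            header = rows[0] if rows else []
            self.cols = {name: (header.index(name) if name in header else None)
                         for name in self.NAMES}
            if all(col is None for col in self.cols.values()):
                raise RelaxError("No mol, res or spin columns could be found.")
            self.rows = rows[1:]

    def column_indices(self):
        """Return a copy of the name -> column index mapping."""
        return dict(self.cols)

    def data_loop(self):
        """Yield (identifiers, data) for each row of the file.

        The identifiers tuple follows the order of NAMES; the data list holds
        the values of all remaining columns, as strings.
        """
        used = {col for col in self.cols.values() if col is not None}
        for row in self.rows:
            ident = tuple(None if self.cols[name] is None else row[self.cols[name]]
                          for name in self.NAMES)
            data = [value for i, value in enumerate(row) if i not in used]
            yield ident, data


# Header-based detection.
parser = SpinFileParser(['res_num spin_name I0 I1', '1 N 10.0 20.0'])
records = list(parser.data_loop())

# Header-less file with the spin columns at the end of each line.
headerless = SpinFileParser(['10.0 20.0 1 N'], res_num_col=2, spin_name_col=3)
headerless_records = list(headerless.data_loop())
```

Keeping the column lookup inside one object means the readers for sequence, value, intensity and relaxation data would only need to consume data_loop(), whichever way the columns were located.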

This object can also have a write() method to generate these files.  We
just need to create another method to feed in the mol name, res name and
num, spin name and num, and finally the other data to be written.
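
The writing side could look something like the sketch below. It is written as a standalone function rather than a method so that it runs on its own; the function name write_spin_file, the single-space separator, the header row, and writing missing values as the text 'None' are all assumptions for illustration, not the actual relax file format.

```python
import io


def write_spin_file(file, rows, data_names=()):
    """Write a header row followed by one line per spin.

    Each entry of rows is a tuple
    (mol_name, res_num, res_name, spin_num, spin_name, data),
    where data is a sequence of the remaining values to be written.
    """
    names = ('mol_name', 'res_num', 'res_name', 'spin_num', 'spin_name')
    file.write(' '.join(names + tuple(data_names)) + '\n')

    for mol_name, res_num, res_name, spin_num, spin_name, data in rows:
        ident = (mol_name, res_num, res_name, spin_num, spin_name)
        # Missing identifiers are written as the text 'None' (an assumption).
        fields = [str(value) for value in ident] + [str(value) for value in data]
        file.write(' '.join(fields) + '\n')


buffer = io.StringIO()
write_spin_file(buffer, [(None, 1, 'GLY', None, 'N', [10.0, 20.0])],
                data_names=('I0', 'I1'))
output = buffer.getvalue()
```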

Obviously if this is implemented (well, actually 'when' is more
appropriate here), then all the code mentioned in the first paragraph
will have to change.  This will require major surgery, although once the
object is in place, the various parts of relax can be converted bit by
bit and separately.  So I think your idea should go into relax, but at a
much higher level.  What do you think of these ideas?

Regards,

Edward


On Fri, 2008-12-05 at 11:56 -0500, Sébastien Morin wrote:
  
Hi Ed,

What about making the code automatically recognize which column is which?

We could, for example, have the code determine the number of fields and
then search the header for strings such as 'res_num' or 'res_name', etc.
Once all the searched fields are recognized, we could assume that the
remaining fields are intensities to extract. The absent fields could be
given a default value such as 'None'. For this, we would need to have the
header sent to the intensity_generic() function (from the
autodetect_format() function).
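
A minimal sketch of this header search, assuming a whitespace-separated header line. The helper name detect_columns and the 0-based indices are hypothetical and not part of relax; the five field names are the ones checked by the autodetection code in the commit quoted further down.

```python
# Header strings identifying the spin-specific columns.
SPIN_COLUMNS = ('mol_name', 'res_num', 'res_name', 'spin_num', 'spin_name')


def detect_columns(header_line):
    """Return (spin_cols, data_cols) for a whitespace-separated header line.

    spin_cols maps each known name to its 0-based column index, or to None
    when the field is absent.  data_cols lists the indices of all remaining
    columns, which are assumed to hold intensities.
    """
    spin_cols = {name: None for name in SPIN_COLUMNS}
    data_cols = []
    for index, field in enumerate(header_line.split()):
        if field in spin_cols:
            spin_cols[field] = index
        else:
            data_cols.append(index)
    return spin_cols, data_cols


spin_cols, data_cols = detect_columns('res_num spin_name I0 I1 I2')
```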

I think this would be great because users would not have to input
column numbers, and their files would be parsed automatically, whichever
columns the data is in.

What do you think of this approach? Do you see any problem with it?

Let me know what you think.

Regards,


Séb  :)



Edward d'Auvergne wrote:
    
Sorry, this task of the generic formatted file is far more complicated
than I thought.  Its structure should be modelled after the
generic_fns.value.read() function, as this takes a similarly formatted
file.  Flexibility here is key - any int arguments for the
mol_name_col, res_num_col, res_name_col, spin_num_col, spin_name_col
should be acceptable.  I.e. you can put this information at the end of
the file if you are crazy enough.  But most of the code in
generic_fns.value.read() can be used.  It just needs to be shifted
into functions of generic_fns.spectrum such as
number_of_header_lines() and intensity_generic().

In the future I might write some functions in generic_fns.mol_res_spin
to parse any spin specific but generically formatted file.  But for
now, the generic_fns.value.read() function needs to be mimicked.  This
is an insanely complex task, considering the additional flexibility I
talked about in
https://mail.gna.org/public/relax-devel/2008-12/msg00016.html
especially the automatic reading with the spin specific columns being
allowed to be anywhere.  So if you think this is too much, I can take
over at any point.

Regards,

Edward



On Thu, Dec 4, 2008 at 7:30 PM,  <sebastien.morin.1@xxxxxxxxx> wrote:
  
      
Author: semor
Date: Thu Dec  4 19:30:11 2008
New Revision: 8138

URL: http://svn.gna.org/viewcvs/relax?rev=8138&view=rev
Log:
Modified the autodetection code for the generic format.

This now recognizes the most generic format as in
'test_suite/shared_data/peak_lists/generic_intensity2.txt'.


Modified:
   1.3/generic_fns/spectrum.py

Modified: 1.3/generic_fns/spectrum.py
URL: http://svn.gna.org/viewcvs/relax/1.3/generic_fns/spectrum.py?rev=8138&r1=8137&r2=8138&view=diff
==============================================================================
--- 1.3/generic_fns/spectrum.py (original)
+++ 1.3/generic_fns/spectrum.py Thu Dec  4 19:30:11 2008
@@ -254,7 +254,7 @@
            break

    # Generic format.
-    if line[0] in ['mol_name', 'res_num', 'res_name', 'spin_num', 'spin_name']:
+    if line[0] in ['mol_name', 'res_num', 'res_name', 'spin_num', 'spin_name'] or line[0] in ['Num', 'Name']:
        return 'generic'

    # Sparky format.


_______________________________________________
relax (http://nmr-relax.com)

This is the relax-commits mailing list
relax-commits@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-commits

    
        
_______________________________________________
relax (http://nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel

  
      
    


  



