Hi,
Actually, it wasn't so bad. Only the bmr4096.str entry has a sequence
mismatch. Residue 65 should be either Asn or Gln, but both residue names
can be found in the entry. The same applies to residue 146. I have
assumed that the monomeric_polymer (entity) saveframe is correct and
that both should be Gln. The diff is below.
Cheers,
Edward
[edau@localhost bmr2.1_files_mod1]$ cat diff
diff -ur ./bmr4096.str ../bmr2.1_files/bmr4096.str
--- ./bmr4096.str 2011-01-28 05:56:00.000000000 +0100
+++ ../bmr2.1_files/bmr4096.str 2011-02-03 13:22:27.000000000 +0100
@@ -1031,7 +1031,7 @@
3JHNHA 60 ALA H 60 ALA HA 3.96 0.03
3JHNHA 63 LEU H 63 LEU HA 3.13 0.08
3JHNHA 64 ALA H 64 ALA HA 3.50 0.03
- 3JHNHA 65 ASN H 65 ASN HA 4.94 0.03
+ 3JHNHA 65 GLN H 65 GLN HA 4.94 0.03
3JHNHA 66 ILE H 66 ILE HA 4.18 0.05
3JHNHA 67 GLY H 67 GLY HA 4.82 0.04
3JHNHA 68 VAL H 68 VAL HA 4.36 0.03
@@ -1108,7 +1108,7 @@
3JHNHA 143 SER H 143 SER HA 3.33 0.03
3JHNHA 144 GLY H 144 GLY HA 4.40 0.06
3JHNHA 145 LEU H 145 LEU HA 3.06 0.09
- 3JHNHA 146 ASN H 146 ASN HA 6.99 0.02
+ 3JHNHA 146 GLN H 146 GLN HA 6.99 0.02
3JHNHA 147 SER H 147 SER HA 5.56 0.02
stop_
@@ -1222,7 +1222,7 @@
62 VAL N 0.88 0.03
63 LEU N 0.85 0.03
64 ALA N 0.85 0.02
- 65 ASN N 0.85 0.02
+ 65 GLN N 0.85 0.02
66 ILE N 0.88 0.03
67 GLY N 0.8 0.03
68 VAL N 0.91 0.02
@@ -1302,7 +1302,7 @@
143 SER N 0.86 0.02
144 GLY N 0.91 0.03
145 LEU N 0.88 0.04
- 146 ASN N 0.84 0.02
+ 146 GLN N 0.84 0.02
147 SER N 1 0.01
stop_
@@ -1416,7 +1416,7 @@
62 VAL N 13.3 0.25
63 LEU N 15.64 0.29
64 ALA N 16.38 0.22
- 65 ASN N 15.93 0.25
+ 65 GLN N 15.93 0.25
66 ILE N 14.28 0.29
67 GLY N 15.87 0.3
68 VAL N 16.72 0.25
@@ -1496,7 +1496,7 @@
143 SER N 17.48 0.23
144 GLY N 15.64 0.27
145 LEU N 16.42 0.48
- 146 ASN N 15.59 0.24
+ 146 GLN N 15.59 0.24
147 SER N 12.32 0.08
stop_
@@ -1605,7 +1605,7 @@
62 VAL N 14.24 0.33
63 LEU N 14.94 0.41
64 ALA N 15.47 0.26
- 65 ASN N 15 0.32
+ 65 GLN N 15 0.32
66 ILE N 14.23 0.39
67 GLY N 14.62 0.35
68 VAL N 14.26 0.28
@@ -1684,7 +1684,7 @@
143 SER N 14.75 0.28
144 GLY N 14.06 0.34
145 LEU N 15.19 0.51
- 146 ASN N 15.11 0.34
+ 146 GLN N 15.11 0.34
147 SER N 12.35 0.14
stop_
@@ -1800,7 +1800,7 @@
62 VAL N 0.85 .
63 LEU N 0.86 .
64 ALA N 0.86 .
- 65 ASN N 0.86 .
+ 65 GLN N 0.86 .
66 ILE N 0.85 .
67 GLY N 0.88 .
68 VAL N 0.85 .
@@ -1881,7 +1881,7 @@
143 SER N 0.84 .
144 GLY N 0.85 .
145 LEU N 0.87 .
- 146 ASN N 0.83 .
+ 146 GLN N 0.83 .
147 SER N 0.72 .
stop_
@@ -1981,7 +1981,7 @@
62 VAL N 0.9138 0.0186 . . . . 10.3361 S2
63 LEU N 0.9735 0.0195 . . . . 1.1268 S2
64 ALA N 0.9355 0.0394 . . 1.8352 0.9506 2.772 S2,Rex
- 65 ASN N 0.9732 0.017 . . . . 4.3623 S2
+ 65 GLN N 0.9732 0.017 . . . . 4.3623 S2
66 ILE N 0.9249 0.0197 . . . . 2.8354 S2
67 GLY N 0.9543 0.0187 . . . . 4.6845 S2
68 VAL N 0.9474 0.0158 . . . . 1.5478 S2
@@ -2060,7 +2060,7 @@
143 SER N 0.9746 0.0139 . . . . 3.6387 S2
144 GLY N 0.9384 0.0179 . . . . 0.4807 S2
145 LEU N 0.9867 0.0185 . . . . 3.3007 S2
- 146 ASN N 0.9742 0.0187 . . . . 5.23 S2
+ 146 GLN N 0.9742 0.0187 . . . . 5.23 S2
147 SER N 0.8163 0.0094 23.7407 4.1317 . . 17.4326 S2,te
stop_
@@ -2173,7 +2173,7 @@
62 VAL H 8.68E-07 . . 1.63E-03
63 LEU H . . 1.00E-08 .
64 ALA H 3.90E-07 . . 3.19E-03
- 65 ASN H 3.90E-07 . . 3.19E-03
+ 65 GLN H 3.90E-07 . . 3.19E-03
66 ILE H 1.92E-06 . . 7.71E-04
67 GLY H . . 1.00E-08 .
68 VAL H 5.83E-06 . . 1.88E-04
@@ -2240,7 +2240,7 @@
143 SER H 2.00E-05 . . 7.11E-07
144 GLY H 4.69E-06 . . 2.58E-04
145 LEU H . . 1.00E-08 .
- 146 ASN H 1.38E-03 . . 4.02E-05
+ 146 GLN H 1.38E-03 . . 4.02E-05
147 SER H 1.90E-02 . . 9.51E-04
stop_
@@ -2316,7 +2316,7 @@
61 LYS H 3.87E-02 1.47E+04 .
62 VAL H 1.15E-02 7.95E+05 .
64 ALA H 2.63E-02 4.05E+06 .
- 65 ASN H 4.86E-02 7.48E+06 .
+ 65 GLN H 4.86E-02 7.48E+06 .
66 ILE H 1.29E-02 4.02E+05 .
68 VAL H 1.29E-02 1.33E+05 .
71 SER H 7.16E-02 1.38E+04 .
@@ -2359,7 +2359,7 @@
140 ALA H 6.25E-02 4.10E+04 .
143 SER H 5.83E-02 1.75E+05 .
144 GLY H 1.56E-01 1.99E+06 .
- 146 ASN H 3.02E-02 1.32E+03 .
+ 146 GLN H 3.02E-02 1.32E+03 .
147 SER H 3.22E-03 1.02E+01 .
stop_
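For illustration, here is a minimal sketch of the kind of check that flags such mismatches. It compares every (residue number, residue name) pair in a data loop against the sequence taken from the entity (monomeric_polymer) saveframe. The function find_sequence_mismatches is hypothetical and not part of relax or bmrblib. The sequence and loop rows are a partial excerpt typed in from the diff above.

```python
# Hypothetical helper (not relax or bmrblib code): compare the residue names
# used in a data loop against a reference sequence from the entity saveframe.


def find_sequence_mismatches(sequence, rows):
    """Return (res_num, found_name, expected_name) for every inconsistent row.

    sequence -- dict mapping residue number to three-letter residue name.
    rows     -- iterable of (res_num, res_name) pairs from one data loop.
    """
    mismatches = []
    for res_num, res_name in rows:
        expected = sequence.get(res_num)  # None if the residue is unknown
        if expected != res_name:
            mismatches.append((res_num, res_name, expected))
    return mismatches


# Partial excerpt from bmr4096.str: the entity says GLN, the loops say ASN.
entity_seq = {64: "ALA", 65: "GLN", 66: "ILE", 145: "LEU", 146: "GLN", 147: "SER"}
loop_rows = [(64, "ALA"), (65, "ASN"), (66, "ILE"),
             (145, "LEU"), (146, "ASN"), (147, "SER")]
print(find_sequence_mismatches(entity_seq, loop_rows))
# prints [(65, 'ASN', 'GLN'), (146, 'ASN', 'GLN')]
```

Running the same check over every data loop of an entry would list each affected line, which is essentially what the diff above corrects by hand.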
On 3 February 2011 13:25, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
Hi Eldon,
I've now implemented support for reading the monomeric_polymer
saveframe of NMR-STAR v2.1. I have assumed that this is equivalent to
the entity saveframe in v3.x. This has uncovered some more sequence
problems in the v2.1 files. I'll send these inconsistencies as a diff
once I have them all sorted out.
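A minimal sketch of that equivalence assumption treats the v2.1 monomeric_polymer residue loop and the v3.x Entity_comp_index loop as interchangeable sources of the sequence. The tag names are assumptions based on the NMR-STAR dictionaries. The {tag: [values]} loop representation is hypothetical and stands in for whatever the lower-level STAR reader returns. This is not the bmrblib implementation.

```python
# Hypothetical sketch, not bmrblib code. Tag names are assumptions based on
# the NMR-STAR v2.1 and v3.1 dictionaries.
SEQUENCE_TAGS = {
    # v2.1 monomeric_polymer saveframe: residue loop.
    "2.1": ("_Residue_seq_code", "_Residue_label"),
    # v3.1 entity saveframe: Entity_comp_index loop.
    "3.1": ("_Entity_comp_index.ID", "_Entity_comp_index.Comp_ID"),
}


def sequence_from_loop(loop, version):
    """Return {residue number: residue name} from a parsed sequence loop.

    loop is assumed to be a {tag: [values]} dict from a lower-level reader.
    """
    num_tag, name_tag = SEQUENCE_TAGS[version]
    numbers, names = loop[num_tag], loop[name_tag]
    if len(numbers) != len(names):
        raise ValueError("residue number and name columns differ in length")
    return {int(num): name for num, name in zip(numbers, names)}


# Illustrative values only: the same three residues in both formats.
v21_loop = {"_Residue_seq_code": ["64", "65", "66"],
            "_Residue_label": ["ALA", "GLN", "ILE"]}
v31_loop = {"_Entity_comp_index.ID": ["64", "65", "66"],
            "_Entity_comp_index.Comp_ID": ["ALA", "GLN", "ILE"]}
assert sequence_from_loop(v21_loop, "2.1") == sequence_from_loop(v31_loop, "3.1")
```

Once the version-specific tag names are isolated in one table, the rest of the sequence checking can ignore the file version.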
Regards,
Edward
On 2 February 2011 14:35, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
Hi Eldon,
I don't know if this is the best channel for this information. Is
there a BMRB mailing list that would be better suited to it?
OK, this is how I found these inconsistencies. I used relax to read
in the BMRB NMR-STAR formatted files. relax uses bmrblib, which I wrote
(http://gna.org/projects/bmrblib/). This library is nearly complete for
relaxation and model-free data, and would be easy to extend to handle
the entire NMR-STAR dictionary. It can both read and write valid
NMR-STAR formatted files in versions 2.1, 3.0, and 3.1 (a little
debugging might still be required, and support for other revisions such
as 2.1.1 could also be added). This Python library is an abstraction of
the underlying file format. The very low-level reading and writing of
the STAR format is handled by Jurgen F. Doreleijers' pystarlib (jurgenfd
att gmail dott com, http://code.google.com/p/pystarlib/).
To read the entire BMRB model-free data content, I did the following.
I downloaded all of the files listed at
http://www.bmrb.wisc.edu/search/query_grid/query_1_46.html using the
link http://www.bmrb.wisc.edu/ftp/pub/bmrb/compress/query_1_46.tar.gz.
These are all in the version 2.1 or 2.1.1 format. Then, using the file
names, I downloaded all of the corresponding v3.1 files from
http://www.bmrb.wisc.edu/ftp/pub/bmrb/entry_lists/nmr-star3.1/. It
looks like roughly 30% of the old-format files have been converted to
the newer format so far. I will write two follow-up emails explaining
the problems with the version 2.1 files and with the 3.1 files
separately.
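The file-name mapping step could be scripted roughly as shown below. This sketch assumes that the v3.1 files keep the same bmrNNNN.str names as the v2.1 files, as with bmr15097.str below. The helper names and the example availability list are hypothetical, and nothing is downloaded.

```python
from urllib.parse import urljoin

# Directory from the text; the identical-file-name convention is an assumption.
V31_BASE = "http://www.bmrb.wisc.edu/ftp/pub/bmrb/entry_lists/nmr-star3.1/"


def v31_urls(v21_names):
    """Map each v2.1 file name (e.g. 'bmr4096.str') to its assumed v3.1 URL."""
    return {name: urljoin(V31_BASE, name) for name in v21_names}


def conversion_fraction(v21_names, v31_available):
    """Fraction of the v2.1 entries for which a v3.1 file exists."""
    wanted = set(v21_names)
    if not wanted:
        return 0.0
    return len(wanted & set(v31_available)) / len(wanted)


# Hypothetical inputs: three v2.1 file names, one with a v3.1 counterpart.
names = ["bmr4096.str", "bmr4970.str", "bmr15097.str"]
print(v31_urls(names)["bmr4096.str"])
print(conversion_fraction(names, ["bmr15097.str"]))
```

Comparing the local v2.1 file list against the files actually present in the v3.1 directory gives the conversion estimate directly.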
In this email, I would like to describe the general problems. The
first is that pystarlib cannot handle the semicolon-delimited text
notation in looped (non-free) tag categories, e.g.:
loop_
_Vendor.Name
_Vendor.Address
_Vendor.Electronic_address
_Vendor.Entry_ID
_Vendor.Software_ID
'J. Patrick Loria' .
;
http://xbeams.chem.yale.edu/~loria/
patrick.loria@xxxxxxxx
; 15097 1
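For reference, here is a minimal sketch of how a tokenizer could handle this construct. It assumes the usual STAR rules: a line beginning with ';' opens or closes a multi-line text field, and a quoted value ends at a matching quote followed by whitespace. It also assumes the construct is valid, meaning that further values may follow the closing ';' on the same line. This is not pystarlib code. tokenize_loop_values and loop_rows are hypothetical names, and comments, nested loops and other STAR features are ignored.

```python
import re

# A quoted value ends at the matching quote only when whitespace or the end
# of the line follows, so a value like 'd'Auvergne' keeps its inner apostrophe.
_TOKEN = re.compile(r"'(.*?)'(?=\s|$)|\"(.*?)\"(?=\s|$)|(\S+)")


def tokenize_loop_values(text):
    """Split the value section of a STAR loop into a flat list of strings.

    A line starting with ';' opens a text field and the next line starting
    with ';' closes it; anything after the closing ';' on that same line is
    tokenized as ordinary values (the construct seen in bmr15097.str).
    """
    values = []
    field = None  # lines of the currently open text field, or None
    for line in text.splitlines():
        if line.startswith(";"):
            if field is None:
                # Opening semicolon: keep any text that follows it.
                rest = line[1:]
                field = [rest] if rest.strip() else []
                continue
            # Closing semicolon: store the field, then parse the remainder.
            values.append("\n".join(field))
            field = None
            line = line[1:]
        elif field is not None:
            field.append(line)
            continue
        for single, double, bare in _TOKEN.findall(line):
            values.append(single or double or bare)
    if field is not None:
        raise ValueError("unterminated semicolon text field")
    return values


def loop_rows(tags, text):
    """Group the flat value list into one {tag: value} dict per loop row."""
    values = tokenize_loop_values(text)
    n = len(tags)
    if len(values) % n:
        raise ValueError("value count is not a multiple of the tag count")
    return [dict(zip(tags, values[i:i + n])) for i in range(0, len(values), n)]


tags = ["_Vendor.Name", "_Vendor.Address", "_Vendor.Electronic_address",
        "_Vendor.Entry_ID", "_Vendor.Software_ID"]
text = """'J. Patrick Loria' .
;
http://xbeams.chem.yale.edu/~loria/
patrick.loria@xxxxxxxx
; 15097 1"""
row = loop_rows(tags, text)[0]
```

With these rules, the five values fill the five tags of a single row, and the text field becomes the _Vendor.Electronic_address value.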
The _Vendor loop above is from the v3.1 file bmr15097.str. Assuming this
construct is valid STAR syntax, the basic pystarlib functionality
probably needs to be fixed. The second problem is that the bmr4970.str
entry cannot be parsed automatically. This file has multiple 15N
S2_parameters saveframes:
save_S2_parameters_15N_22C
save_S2_parameters_15N_35C
save_S2_parameters_15N_47C
save_S2_parameters_15N_60C
save_S2_parameters_15N_73C
But these all have:
loop_
_Sample_label
$sample_one
stop_
_Sample_conditions_label $sample_conditions_one
They might be the same sample, but the sample conditions cannot be
identical, as the temperature changes from one saveframe to the next.
This is obvious by eye, but for automatic parsing of this data the file
has to be blacklisted and skipped.
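A check along these lines could flag such entries for the blacklist automatically. Within the saveframes of a single data type (here the five 15N S2_parameters saveframes), any sample conditions label used more than once makes the data ambiguous. The function name and the {saveframe name: conditions label} input format are hypothetical.

```python
from collections import defaultdict


def shared_conditions(saveframes):
    """Return {conditions label: [saveframe names]} for every label shared by
    more than one saveframe.

    saveframes -- {saveframe name: _Sample_conditions_label value}, already
    restricted by the caller to one data type (hypothetical input format).
    """
    groups = defaultdict(list)
    for name, conditions in saveframes.items():
        groups[conditions].append(name)
    return {label: names for label, names in groups.items() if len(names) > 1}


# The five 15N S2_parameters saveframes of bmr4970.str share one label.
frames = {f"save_S2_parameters_15N_{t}C": "$sample_conditions_one"
          for t in (22, 35, 47, 60, 73)}
if shared_conditions(frames):
    print("ambiguous sample conditions: blacklist bmr4970.str")
```

Restricting the check to one data type avoids false alarms when, for example, 15N and 13C data legitimately share the same conditions.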
Cheers,
Edward